From: Will Deacon <will.deacon@arm.com>
To: Yang Shi <yang.shi@linux.alibaba.com>, peterz@infradead.org
Cc: jstancek@redhat.com, akpm@linux-foundation.org,
	stable@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush
Date: Thu, 9 May 2019 09:37:26 +0100
Message-ID: <20190509083726.GA2209@brain-police>
In-Reply-To: <1557264889-109594-1-git-send-email-yang.shi@linux.alibaba.com>

Hi all, [+Peter]

Apologies for the delay; I'm attending a conference this week so it's tricky
to keep up with email.

On Wed, May 08, 2019 at 05:34:49AM +0800, Yang Shi wrote:
> A few new fields were added to mmu_gather to make TLB flush smarter for
> huge page by telling what level of page table is changed.
> 
> __tlb_reset_range() is used to reset all these page table state to
> unchanged, which is called by TLB flush for parallel mapping changes for
> the same range under non-exclusive lock (i.e. read mmap_sem).  Before
> commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in
> munmap"), MADV_DONTNEED is the only one who may do page zapping in
> parallel and it doesn't remove page tables.  But, the forementioned commit
> may do munmap() under read mmap_sem and free page tables.  This causes a
> bug [1] reported by Jan Stancek since __tlb_reset_range() may pass the
> wrong page table state to architecture specific TLB flush operations.

Yikes. Is it actually safe to run free_pgtables() concurrently for a given
mm?

> So, removing __tlb_reset_range() sounds sane.  This may cause more TLB
> flush for MADV_DONTNEED, but it should be not called very often, hence
> the impact should be negligible.
> 
> The original proposed fix came from Jan Stancek who mainly debugged this
> issue, I just wrapped up everything together.

I'm still paging the nested flush logic back in, but I have some comments on
the patch below.

> [1] https://lore.kernel.org/linux-mm/342bf1fd-f1bf-ed62-1127-e911b5032274@linux.alibaba.com/T/#m7a2ab6c878d5a256560650e56189cfae4e73217f
> 
> Reported-by: Jan Stancek <jstancek@redhat.com>
> Tested-by: Jan Stancek <jstancek@redhat.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>
> ---
>  mm/mmu_gather.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 99740e1..9fd5272 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -249,11 +249,12 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
>  	 * flush by batching, a thread has stable TLB entry can fail to flush

Urgh, we should rewrite this comment while we're here so that it makes sense...

>  	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
>  	 * forcefully if we detect parallel PTE batching threads.
> +	 *
> +	 * munmap() may change mapping under non-excluse lock and also free
> +	 * page tables.  Do not call __tlb_reset_range() for it.
>  	 */
> -	if (mm_tlb_flush_nested(tlb->mm)) {
> -		__tlb_reset_range(tlb);
> +	if (mm_tlb_flush_nested(tlb->mm))
>  		__tlb_adjust_range(tlb, start, end - start);
> -	}

I don't think we can elide the call to __tlb_reset_range() entirely, since I
think we do want to clear the freed_pXX bits to ensure that we walk the
range with the smallest mapping granule that we have. Otherwise couldn't we
have a problem if we hit a PMD that had been cleared, but the TLB
invalidation for the PTEs that used to be linked below it was still pending?
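
To make the granule concern concrete, here is a small userspace model (not
kernel code, and untested against any real architecture). It assumes 4 KiB
pages and 2 MiB PMD mappings, and every name in it (stride_model,
model_unmap_shift() and friends) is hypothetical. It only mimics the idea
that the invalidation stride is derived from which levels were recorded as
cleared:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PMD_SHIFT  21	/* assumption: 2 MiB PMD mappings */

/* Hypothetical stand-in for the range and per-level state of a gather. */
struct stride_model {
	unsigned long start;
	unsigned long end;
	bool cleared_ptes;
	bool cleared_pmds;
};

/* Pick the stride from the smallest level recorded as cleared. */
static unsigned int model_unmap_shift(const struct stride_model *g)
{
	if (g->cleared_ptes)
		return MODEL_PAGE_SHIFT;
	if (g->cleared_pmds)
		return MODEL_PMD_SHIFT;
	/* Nothing recorded: fall back to the smallest granule. */
	return MODEL_PAGE_SHIFT;
}

/* Forget the per-level state, as a reset of the gather would. */
static void model_reset_levels(struct stride_model *g)
{
	g->cleared_ptes = false;
	g->cleared_pmds = false;
}

/* Count the invalidations needed to walk [start, end) at the chosen stride. */
static unsigned long model_invalidations(const struct stride_model *g)
{
	unsigned long stride = 1UL << model_unmap_shift(g);

	assert(g->end >= g->start);
	return (g->end - g->start + stride - 1) / stride;
}
```

With only cleared_pmds set over a 4 MiB range, the model walks at PMD stride
and issues 2 invalidations. Once the level bits are reset it falls back to
page stride and issues 1024, which is the conservative walk described above.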

Perhaps we should just set fullmm if we see that there's a concurrent
unmapper rather than do a worst-case range invalidation. Do you have a feeling
for how often mm_tlb_flush_nested() triggers in practice?
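
The two options can be compared in a rough userspace model (again not kernel
code; finish_model, the flush_kind enum and all helper names are hypothetical,
and the real flush behaviour is architecture specific). Both variants reset
the per-level state when a concurrent unmapper is detected; they differ only
in whether they then widen the range or mark the whole mm for flushing:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the gather state relevant at finish time. */
struct finish_model {
	unsigned long start;
	unsigned long end;
	bool fullmm;
	bool cleared_ptes;
	bool cleared_pmds;
};

enum flush_kind { FLUSH_NONE, FLUSH_RANGE, FLUSH_WHOLE_MM };

/* Empty the range and forget which levels were touched. */
static void model_reset_range(struct finish_model *g)
{
	g->start = ULONG_MAX;
	g->end = 0;
	g->cleared_ptes = false;
	g->cleared_pmds = false;
}

/* Grow the range so that it covers [start, end). */
static void model_adjust_range(struct finish_model *g,
			       unsigned long start, unsigned long end)
{
	assert(end >= start);
	if (start < g->start)
		g->start = start;
	if (end > g->end)
		g->end = end;
}

/* Option A: worst-case range invalidation over the whole unmapped span. */
static void model_finish_range(struct finish_model *g, bool nested,
			       unsigned long start, unsigned long end)
{
	if (nested) {
		model_reset_range(g);
		model_adjust_range(g, start, end);
	}
}

/* Option B: give up on range tracking and flush the entire mm. */
static void model_finish_fullmm(struct finish_model *g, bool nested)
{
	if (nested) {
		model_reset_range(g);
		g->fullmm = true;
	}
}

/* What kind of flush the recorded state asks for in this model. */
static enum flush_kind model_flush_kind(const struct finish_model *g)
{
	if (g->fullmm)
		return FLUSH_WHOLE_MM;
	if (g->end > g->start)
		return FLUSH_RANGE;
	return FLUSH_NONE;
}
```

In both variants the stale per-level bits are gone before the flush is issued.
Which one is cheaper depends on how often the nested case fires and on the
cost of a whole-mm flush on the architecture in question.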

Will

