Re: [RFC PATCH] mm: avoid access flag update TLB flush for retried page fault

From: Will Deacon <will@kernel.org>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: hannes@cmpxchg.org, catalin.marinas@arm.com, will.deacon@arm.com,
	akpm@linux-foundation.org, xuyu@linux.alibaba.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH] mm: avoid access flag update TLB flush for retried page fault
Date: Wed, 8 Jul 2020 09:00:00 +0100
Message-ID: <20200708075959.GA25498@willie-the-truck>
In-Reply-To: <1594148072-91273-1-git-send-email-yang.shi@linux.alibaba.com>

On Wed, Jul 08, 2020 at 02:54:32AM +0800, Yang Shi wrote:
> Recently we found a regression when running the will_it_scale/page_fault3
> test on ARM64: over 70% down for the multi-process cases and over 20%
> down for the multi-thread cases.  It turns out the regression is caused
> by commit 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem
> before calling balance_dirty_pages() in write fault").
> 
> The test mmaps a memory-sized file and then writes to the mapping.  This
> dirties all of memory and triggers dirty page throttling, at which point
> that upstream commit releases mmap_sem and retries the page fault.  The
> retried page fault sees the correct PTEs installed by the first try, then
> updates the access flags and flushes TLBs.  The regression is caused by
> the excessive TLB flushing.  It is fine on x86 since x86 doesn't need to
> flush the TLB for an access flag update.
> 
> The page fault would be retried due to:
> 1. Waiting for page readahead
> 2. Waiting for page swapped in
> 3. Waiting for dirty pages throttling
> 
> The first two cases don't have PTEs set up at all, so the retried page
> fault installs the PTEs and never reaches the access flag update.  But
> case #3 usually has PTEs installed, so the retried page fault does reach
> the access flag update.  It seems unnecessary to update access flags
> for #3, since a retried page fault is not a real "second access", so it
> sounds safe to skip the access flag update for a retried page fault.
> 
> With this fix the test results get back to normal.
> 
> Reported-by: Xu Yu <xuyu@linux.alibaba.com>
> Debugged-by: Xu Yu <xuyu@linux.alibaba.com>
> Tested-by: Xu Yu <xuyu@linux.alibaba.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> I'm not sure if this is safe for non-x86 machines.  We did some tests on
> arm64, but there may still be corner cases not covered.
> 
>  mm/memory.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 87ec87c..3d4e671 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4241,8 +4241,13 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>  	if (vmf->flags & FAULT_FLAG_WRITE) {
>  		if (!pte_write(entry))
>  			return do_wp_page(vmf);
> -		entry = pte_mkdirty(entry);
>  	}
> +
> +	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vmf->flags & FAULT_FLAG_TRIED))
> +		entry = pte_mkdirty(entry); 
> +	else if (vmf->flags & FAULT_FLAG_TRIED)
> +		goto unlock;
> +

Can you rewrite this as:

	if (vmf->flags & FAULT_FLAG_TRIED)
		goto unlock;

	if (vmf->flags & FAULT_FLAG_WRITE)
		entry = pte_mkdirty(entry);

? (I'm half-asleep this morning and there are people screaming and shouting
outside my window, so this might be rubbish)

If you _can_ make that change, then I don't understand why the existing
pte_mkdirty() line needs to move at all. Couldn't you just add:

	if (vmf->flags & FAULT_FLAG_TRIED)
		goto unlock;

after the existing "vmf->flags & FAULT_FLAG_WRITE" block?

Will

Thread overview: 5+ messages
2020-07-07 18:54 Yang Shi
2020-07-08  8:00 ` Will Deacon [this message]
2020-07-08 16:40   ` Yang Shi
2020-07-08 17:29     ` Catalin Marinas
2020-07-08 18:13       ` Yang Shi
