Re: [patch]x86: clearing access bit don't flush tlb


From: Shaohua Li <shli@kernel.org>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hughd@google.com,
	mel@csn.ul.ie
Subject: Re: [patch]x86: clearing access bit don't flush tlb
Date: Fri, 28 Mar 2014 01:12:37 +0800
Message-ID: <20140327171237.GA9490@kernel.org>
In-Reply-To: <53336907.1050105@redhat.com>

On Wed, Mar 26, 2014 at 07:55:51PM -0400, Rik van Riel wrote:
> On 03/26/2014 06:30 PM, Shaohua Li wrote:
> >
> >I posted this patch a year or so ago, but it got lost. Reposting it here to
> >check if we can make progress this time.
> 
> I believe we can make progress. However, I also
> believe the code could be enhanced to address a
> concern that Hugh raised last time this was
> proposed...
> 
> >And according to the Intel manual, the TLB has fewer than 1k entries, which
> >covers < 4M of memory. In today's systems, several gigabytes of memory is
> >normal. After page reclaim clears the pte access bit and before the cpu
> >accesses the page again, it's quite unlikely this page's pte is still in the
> >TLB. And a context switch will flush the TLB too. The chance that skipping the
> >tlb flush impacts page reclaim should be very small.
> 
> Context switch to a kernel thread does not result in a
> TLB flush, due to the lazy TLB code.
> 
> While I agree with you that clearing the TLB right at
> the moment the accessed bit is cleared in a PTE is
> not necessary, I believe it would be good to clear
> the TLB on affected CPUs relatively soon, maybe at the
> next time schedule is called?
> 
> >--- linux.orig/arch/x86/mm/pgtable.c	2014-03-27 05:22:08.572100549 +0800
> >+++ linux/arch/x86/mm/pgtable.c	2014-03-27 05:46:12.456131121 +0800
> >@@ -399,13 +399,12 @@ int pmdp_test_and_clear_young(struct vm_
> >  int ptep_clear_flush_young(struct vm_area_struct *vma,
> >  			   unsigned long address, pte_t *ptep)
> >  {
> >-	int young;
> >-
> >-	young = ptep_test_and_clear_young(vma, address, ptep);
> >-	if (young)
> >-		flush_tlb_page(vma, address);
> >-
> >-	return young;
> >+	/*
> >+	 * In X86, clearing access bit without TLB flush doesn't cause data
> >+	 * corruption. Doing this could cause wrong page aging and so hot pages
> >+	 * are reclaimed, but the chance should be very rare.
> >+	 */
> >+	return ptep_test_and_clear_young(vma, address, ptep);
> >  }
> 
> 
> At this point, we could use vma->vm_mm->cpu_vm_mask_var to
> set (or clear) some bit in the per-cpu data of each CPU that
> has active/valid tlb state for the mm in question.
> 
> I could see using cpu_tlbstate.state for this, or maybe
> another variable in cpu_tlbstate, so switch_mm will load
> both items with the same cache line.
> 
> At schedule time, the function switch_mm() can examine that
> variable (it already touches that data, anyway), and flush
> the TLB even if prev==next.
> 
> I suspect that would be both low overhead enough to get you
> the performance gains you want, and address the concern that
> we do want to flush the TLB at some point.
> 
> Does that sound reasonable?

So it looks like what you suggest is to force a tlb flush for an mm that has
had an access bit cleared, in two corner cases:
1. lazy tlb flush
2. context switch between threads of the same process
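
To make the two cases concrete, here is a rough user-space sketch of the
deferred flush being discussed. It is not kernel code: struct cpu_state,
mark_young_cleared() and sim_switch_mm() are hypothetical names, the
flush_pending flag stands in for the extra variable suggested for
cpu_tlbstate, and the loop over all CPUs stands in for walking the mm's cpu
mask. A switch to a lazy-tlb kernel thread and a switch between threads of one
process both look like prev == next here.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU state; models an extra field next to cpu_tlbstate. */
struct cpu_state {
	int  active_mm;      /* id of the mm whose translations this CPU may cache */
	bool flush_pending;  /* an access bit was cleared without a tlb flush */
	int  flush_count;    /* number of simulated tlb flushes, for inspection */
};

struct cpu_state cpus[NR_CPUS];

/* Used in place of flush_tlb_page(): only mark the CPUs running this mm. */
void mark_young_cleared(int mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus[cpu].active_mm == mm)
			cpus[cpu].flush_pending = true;
}

/*
 * Simulated switch_mm(): flush when the mm changes, or when a deferred
 * flush is pending even though prev == next.
 */
void sim_switch_mm(int cpu, int next_mm)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	struct cpu_state *c = &cpus[cpu];

	if (c->active_mm != next_mm || c->flush_pending) {
		c->flush_count++;
		c->flush_pending = false;
	}
	c->active_mm = next_mm;
}
```

In this model the cost moves from one flush per cleared access bit to at most
one flush per CPU at its next simulated switch.
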

Am I missing anything? I'm wondering whether we should care about these corner
cases. On the other hand, a thread might run for a long time without
scheduling. If the corner cases are an issue, the long-running thread is a more
severe one. My point is that context switch does provide a safeguard, but we
don't depend on it. The whole theory behind this patch is that a page whose
access bit has been cleared is unlikely to be accessed again while its pte
entry is still in the tlb.
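
The coverage estimate behind that theory is simple arithmetic, sketched below.
The figures are assumptions for illustration: 1024 entries is the upper bound
quoted from the manual, 4096 bytes is the base x86 page size (large pages would
raise the reach), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Memory covered by the tlb at once: number of entries times page size. */
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
	return entries * page_size;
}

/* Share of memory the tlb can map at once, in parts per million. */
uint64_t tlb_coverage_ppm(uint64_t entries, uint64_t page_size,
			  uint64_t mem_bytes)
{
	assert(mem_bytes > 0);
	return tlb_reach_bytes(entries, page_size) * 1000000ULL / mem_bytes;
}
```

With 1024 entries and 4096-byte pages the reach is 4 MiB; against a
hypothetical 8 GiB machine that is 1/2048 of memory.
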

Thanks,
Shaohua


