linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
Date: Thu, 7 Jul 2022 22:56:25 -0700	[thread overview]
Message-ID: <575B908D-A29B-40B0-9A80-76B7E7A9762E@gmail.com> (raw)
In-Reply-To: <904C4BCE-78E7-4FEE-BD8D-03DCE75A5B8B@gmail.com>

On Jul 7, 2022, at 9:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:

> On Jul 7, 2022, at 8:27 PM, Hugh Dickins <hughd@google.com> wrote:
> 
>> On Mon, 6 Jun 2022, Nadav Amit wrote:
>> 
>>> From: Nadav Amit <namit@vmware.com>
>>> 
>>> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
>>> contended and reading it should (arguably) be avoided as much as
>>> possible.
>>> 
>>> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
>>> even when it is not necessary (e.g., the mm was already switched).
>>> This is wasteful.
>>> 
>>> Moreover, one of the existing optimizations is to read mm's tlb_gen to
>>> see if there are additional in-flight TLB invalidations and flush the
>>> entire TLB in such a case. However, if the request's tlb_gen was already
>>> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
>>> by the overhead of the check itself.
>>> 
>>> Running will-it-scale with tlb_flush1_threads show a considerable
>>> benefit on 56-core Skylake (up to +24%):
>>> 
>>> threads		Baseline (v5.17+)	+Patch
>>> 1		159960			160202
>>> 5		310808			308378 (-0.7%)
>>> 10		479110			490728
>>> 15		526771			562528
>>> 20		534495			587316
>>> 25		547462			628296
>>> 30		579616			666313
>>> 35		594134			701814
>>> 40		612288			732967
>>> 45		617517			749727
>>> 50		637476			735497
>>> 55		614363			778913 (+24%)
>>> 
>>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Andy Lutomirski <luto@kernel.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: x86@kernel.org
>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>> 
>>> --
>>> 
>>> Note: The benchmarked kernels include Dave's revert of commit
>>> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
>>> tlb_is_not_lazy()
>>> ---
>>> arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>>> index d400b6d9d246..d9314cc8b81f 100644
>>> --- a/arch/x86/mm/tlb.c
>>> +++ b/arch/x86/mm/tlb.c
>>> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>>> 	const struct flush_tlb_info *f = info;
>>> 	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>>> 	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
>>> -	u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>> 	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>>> 	bool local = smp_processor_id() == f->initiating_cpu;
>>> 	unsigned long nr_invalidate = 0;
>>> +	u64 mm_tlb_gen;
>>> 
>>> 	/* This code cannot presently handle being reentered. */
>>> 	VM_WARN_ON(!irqs_disabled());
>>> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>>> 		return;
>>> 	}
>>> 
>>> +	if (f->new_tlb_gen <= local_tlb_gen) {
>>> +		/*
>>> +		 * The TLB is already up to date in respect to f->new_tlb_gen.
>>> +		 * While the core might be still behind mm_tlb_gen, checking
>>> +		 * mm_tlb_gen unnecessarily would have negative caching effects
>>> +		 * so avoid it.
>>> +		 */
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * Defer mm_tlb_gen reading as long as possible to avoid cache
>>> +	 * contention.
>>> +	 */
>>> +	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>>> +
>>> 	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>>> 		/*
>>> 		 * There's nothing to do: we're already up to date.  This can
>>> -- 
>>> 2.25.1
>> 
>> I'm sorry, but bisection and reversion show that this commit,
>> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
>> is responsible for the "internal compiler error: Segmentation fault"s
>> I get when running kernel builds on tmpfs in 1G memory, lots of swapping.
>> 
>> That tmpfs is using huge pages as much as it can, so splitting and
>> collapsing, compaction and page migration entailed, in case that's
>> relevant (maybe this commit is perfect, but there's a TLB flushing
>> bug over there in mm which this commit just exposes).
>> 
>> Whether those segfaults happen without the huge page element,
>> I have not done enough testing to tell - there are other bugs with
>> swapping in current linux-next, indeed, I wouldn't even have found
>> this one, if I hadn't already been on a bisection for another bug,
>> and got thrown off course by these segfaults.
>> 
>> I hope that you can work out what might be wrong with this,
>> but meantime I think it needs to be reverted.
> 
> I find it always surprising how trivial one liners fail.
> 
> As you probably know, debugging these kind of things is hard. I see two
> possible cases:
> 
> 1. The failure is directly related to this optimization. The immediate
> suspect in my mind is something to do with PCID/ASID.
> 
> 2. The failure is due to another bug that was papered by “enough” TLB
> flushes.
> 
> I will look into the code. But if it is possible, it would be helpful to
> know whether you get the failure with the “nopcid” kernel parameter. If it
> passes, it wouldn’t say much, but if it fails, I think (2) is more likely.
> 
> Not arguing about a revert, but, in some way, if the test fails, it can
> indicate that the optimization “works”…
> 
> I’ll put some time to look deeper into the code, but it would be very
> helpful if you can let me know what happens with nopcid.

Actually, only using “nopcid” would most likely make it go away if we have
PTI enabled. So to get a good indication, a check whether it reproduces with
“nopti” and “nopcid” is needed.

I don’t have a better answer yet. Still trying to see what might have gone
wrong.

  reply	other threads:[~2022-07-08  5:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20220606180123.2485171-1-namit@vmware.com>
2022-07-08  3:27 ` Hugh Dickins
2022-07-08  4:23   ` Nadav Amit
2022-07-08  5:56     ` Nadav Amit [this message]
2022-07-08  6:59       ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=575B908D-A29B-40B0-9A80-76B7E7A9762E@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox