[PATCH] x86/mm/tlb: ignore f->new_tlb

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
@ 2022-07-08  0:30 Nadav Amit
  2022-07-08 11:40 ` David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Nadav Amit @ 2022-07-08  0:30 UTC (permalink / raw)
  To: linux-kernel, Hugh Dickins, Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit,
	Dave Hansen, Peter Zijlstra, Andy Lutomirski

From: Nadav Amit <namit@vmware.com>

Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
possible") introduced an optimization of skipping the flush if the TLB
generation that is flushed (as provided in flush_tlb_info) was already
flushed.

However, arch_tlbbatch_flush() does not provide any generation in
flush_tlb_info. As a result, try_to_unmap_one() would not perform any
TLB flushes.

Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
anyhow is an invalid generation value.

In addition, add the missing unlikely() and jump to get tracing right.

Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/mm/tlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d9314cc8b81f..d81b4084bb8a 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
 		return;
 	}
 
-	if (f->new_tlb_gen <= local_tlb_gen) {
+	if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
 		/*
 		 * The TLB is already up to date in respect to f->new_tlb_gen.
 		 * While the core might be still behind mm_tlb_gen, checking
 		 * mm_tlb_gen unnecessarily would have negative caching effects
 		 * so avoid it.
 		 */
-		return;
+		goto done;
 	}
 
 	/*
-- 
2.25.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08  0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit
@ 2022-07-08 11:40 ` David Hildenbrand
  2022-07-08 15:13   ` Dave Hansen
  2022-07-08 14:49 ` Dave Hansen
  2022-07-08 19:21 ` Hugh Dickins
  2 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2022-07-08 11:40 UTC (permalink / raw)
  To: Nadav Amit, linux-kernel, Hugh Dickins, Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit,
	Dave Hansen, Peter Zijlstra, Andy Lutomirski

On 08.07.22 02:30, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
> 
> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
> possible") introduced an optimization of skipping the flush if the TLB
> generation that is flushed (as provided in flush_tlb_info) was already
> flushed.
> 
> However, arch_tlbbatch_flush() does not provide any generation in
> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
> TLB flushes.
> 
> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
> anyhow is an invalid generation value.
> 
> In addition, add the missing unlikely() and jump to get tracing right.
> 
> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
> Reported-by: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  arch/x86/mm/tlb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index d9314cc8b81f..d81b4084bb8a 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
>  		return;
>  	}
>  
> -	if (f->new_tlb_gen <= local_tlb_gen) {
> +	if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
>  		/*
>  		 * The TLB is already up to date in respect to f->new_tlb_gen.
>  		 * While the core might be still behind mm_tlb_gen, checking
>  		 * mm_tlb_gen unnecessarily would have negative caching effects
>  		 * so avoid it.
>  		 */
> -		return;
> +		goto done;

Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb:
Avoid reading mm_tlb_gen when possible")?

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 11:40 ` David Hildenbrand
@ 2022-07-08 15:13   ` Dave Hansen
  2022-07-08 16:54     ` Nadav Amit
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-07-08 15:13 UTC (permalink / raw)
  To: David Hildenbrand, Nadav Amit, linux-kernel, Hugh Dickins,
	Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit,
	Dave Hansen, Peter Zijlstra, Andy Lutomirski

On 7/8/22 04:40, David Hildenbrand wrote:
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index d9314cc8b81f..d81b4084bb8a 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
>>  		return;
>>  	}
>>  
>> -	if (f->new_tlb_gen <= local_tlb_gen) {
>> +	if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
>>  		/*
>>  		 * The TLB is already up to date in respect to f->new_tlb_gen.
>>  		 * While the core might be still behind mm_tlb_gen, checking
>>  		 * mm_tlb_gen unnecessarily would have negative caching effects
>>  		 * so avoid it.
>>  		 */
>> -		return;
>> +		goto done;
> Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb:
> Avoid reading mm_tlb_gen when possible")?

It depends on how many batched flushes that workload had.  From the
looks of it, they're all one page:

	madvise(addr + i, pgsize, MADV_DONTNEED);

so there shouldn't be *much* batching in play.  But, it wouldn't hurt to
re-run them in either case.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 15:13   ` Dave Hansen
@ 2022-07-08 16:54     ` Nadav Amit
  2022-07-08 17:01       ` Dave Hansen
  0 siblings, 1 reply; 14+ messages in thread
From: Nadav Amit @ 2022-07-08 16:54 UTC (permalink / raw)
  To: Dave Hansen
  Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Dave Hansen,
	Peter Zijlstra, Andy Lutomirski

On Jul 8, 2022, at 8:13 AM, Dave Hansen <dave.hansen@intel.com> wrote:

> ⚠ External Email
> 
> On 7/8/22 04:40, David Hildenbrand wrote:
>>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>>> index d9314cc8b81f..d81b4084bb8a 100644
>>> --- a/arch/x86/mm/tlb.c
>>> +++ b/arch/x86/mm/tlb.c
>>> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
>>>             return;
>>>     }
>>> 
>>> -    if (f->new_tlb_gen <= local_tlb_gen) {
>>> +    if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
>>>             /*
>>>              * The TLB is already up to date in respect to f->new_tlb_gen.
>>>              * While the core might be still behind mm_tlb_gen, checking
>>>              * mm_tlb_gen unnecessarily would have negative caching effects
>>>              * so avoid it.
>>>              */
>>> -            return;
>>> +            goto done;
>> Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb:
>> Avoid reading mm_tlb_gen when possible")?
> 
> It depends on how many batched flushes that workload had.  From the
> looks of it, they're all one page:
> 
>        madvise(addr + i, pgsize, MADV_DONTNEED);
> 
> so there shouldn't be *much* batching in play.  But, it wouldn't hurt to
> re-run them in either case.

Just to clarify, since these things are confusing.

There are two batching mechanisms. The common one is mmu_gather, which
MADV_DONTNEED uses. This one is *not* the one that caused the breakage.

The second one is the “unmap_batch”, which was only used by x86 until now.
(I just saw patches for ARM, but I think they just exploit the interface in
a way). The “unmap_batch” is used when you swap out. This was broken.

Since the bug was not during MADV_DONTNEED there is no reason for the
results to be any different.

Famous last words?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 16:54     ` Nadav Amit
@ 2022-07-08 17:01       ` Dave Hansen
  2022-07-08 17:09         ` Nadav Amit
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-07-08 17:01 UTC (permalink / raw)
  To: Nadav Amit
  Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Dave Hansen,
	Peter Zijlstra, Andy Lutomirski

On 7/8/22 09:54, Nadav Amit wrote:
> Since the bug was not during MADV_DONTNEED there is no reason for the
> results to be any different.
> 
> Famous last words?

Considering that your patch broke the kernel a way that surprised us
all, I think caution is warranted.  Re-running a microbenchmark that
takes five minutes and stresses things a bit is the least you can do, I
think.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 17:01       ` Dave Hansen
@ 2022-07-08 17:09         ` Nadav Amit
  2022-07-08 18:03           ` Nadav Amit
  0 siblings, 1 reply; 14+ messages in thread
From: Nadav Amit @ 2022-07-08 17:09 UTC (permalink / raw)
  To: Dave Hansen
  Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Dave Hansen,
	Peter Zijlstra, Andy Lutomirski

On Jul 8, 2022, at 10:01 AM, Dave Hansen <dave.hansen@intel.com> wrote:

> ⚠ External Email
> 
> On 7/8/22 09:54, Nadav Amit wrote:
>> Since the bug was not during MADV_DONTNEED there is no reason for the
>> results to be any different.
>> 
>> Famous last words?
> 
> Considering that your patch broke the kernel a way that surprised us
> all, I think caution is warranted.  Re-running a microbenchmark that
> takes five minutes and stresses things a bit is the least you can do, I
> think.

I will send it later today. I was just pointing that the failing code-path
is different than the one I measured.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 17:09         ` Nadav Amit
@ 2022-07-08 18:03           ` Nadav Amit
  0 siblings, 0 replies; 14+ messages in thread
From: Nadav Amit @ 2022-07-08 18:03 UTC (permalink / raw)
  To: Dave Hansen
  Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Peter Zijlstra,
	Andy Lutomirski

On Jul 8, 2022, at 10:09 AM, Nadav Amit <namit@vmware.com> wrote:

> On Jul 8, 2022, at 10:01 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
>> ⚠ External Email
>> 
>> On 7/8/22 09:54, Nadav Amit wrote:
>>> Since the bug was not during MADV_DONTNEED there is no reason for the
>>> results to be any different.
>>> 
>>> Famous last words?
>> 
>> Considering that your patch broke the kernel a way that surprised us
>> all, I think caution is warranted.  Re-running a microbenchmark that
>> takes five minutes and stresses things a bit is the least you can do, I
>> think.
> 
> I will send it later today. I was just pointing that the failing code-path
> is different than the one I measured.

It will take some more time, since 5.19 does not want to boot on my machine,
and results from VMs are meaningless for this patch. I would look into this
unrelated failure, unless you want results from 5.18.

[    6.303945] ------------[ cut here ]------------
[    6.309209] kernel BUG at arch/x86/kernel/apic/apic.c:1598!
[    6.315537] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[    6.321275] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc4TLB+ #5
[    6.328760] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.9.1 12/04/2018
[    6.337236] RIP: 0010:setup_local_APIC+0x31e/0x330
[    6.342686] Code: 01 0f 85 05 ff ff ff 85 d2 7f 2b 48 8b 05 aa 37 4f 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 37 99 fb 00 e9 04 ff ff ff <0f> 0b e8 5b 2d be 00 e9 ba 64 b8 0
[    6.363818] RSP: 0000:ffffffff82603e88 EFLAGS: 00010246
[    6.369752] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    6.377820] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020
[    6.385888] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82603da8
[    6.393956] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000031
[    6.402024] R13: 0000000000000000 R14: ffffffff82613118 R15: 0000000000000000
[    6.410091] FS:  0000000000000000(0000) GS:ffff889fff600000(0000) knlGS:0000000000000000
[    6.419250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.425765] CR2: ffff88c07ffff000 CR3: 000000000260c001 CR4: 00000000000606f0
[    6.433826] Call Trace:
[    6.436646]  <TASK>
[    6.439077]  ? _printk+0x53/0x6a
[    6.442777]  apic_intr_mode_init+0xd2/0xf1
[    6.447448]  x86_late_time_init+0x1b/0x2b
[    6.452019]  start_kernel+0x5d8/0x694
[    6.456194]  secondary_startup_64_no_verify+0xce/0xdb
[    6.461933]  </TASK>
[    6.464463] Modules linked in:
[    6.467979] ---[ end trace 0000000000000000 ]---
[    6.473243] RIP: 0010:setup_local_APIC+0x31e/0x330
[    6.478704] Code: 01 0f 85 05 ff ff ff 85 d2 7f 2b 48 8b 05 aa 37 4f 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 37 99 fb 00 e9 04 ff ff ff <0f> 0b e8 5b 2d be 00 e9 ba 64 b8 0
[    6.499865] RSP: 0000:ffffffff82603e88 EFLAGS: 00010246
[    6.505803] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    6.513887] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020
[    6.521969] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82603da8
[    6.530053] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000031
[    6.538136] R13: 0000000000000000 R14: ffffffff82613118 R15: 0000000000000000
[    6.546218] FS:  0000000000000000(0000) GS:ffff889fff600000(0000) knlGS:0000000000000000
[    6.555391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.561919] CR2: ffff88c07ffff000 CR3: 000000000260c001 CR4: 00000000000606f0
[    6.570003] Kernel panic - not syncing: Attempted to kill the idle task!
[    6.577591] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08  0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit
  2022-07-08 11:40 ` David Hildenbrand
@ 2022-07-08 14:49 ` Dave Hansen
  2022-07-08 17:04   ` Nadav Amit
  2022-07-08 19:21 ` Hugh Dickins
  2 siblings, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-07-08 14:49 UTC (permalink / raw)
  To: Nadav Amit, linux-kernel, Hugh Dickins, Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit,
	Dave Hansen, Peter Zijlstra, Andy Lutomirski

[-- Attachment #1: Type: text/plain, Size: 2485 bytes --]

On 7/7/22 17:30, Nadav Amit wrote:

You might want to fix the clock on the system from which you sent this.
 I was really scratching my head trying to figure out how you got this
patch out before Hugh's bug report.

> From: Nadav Amit <namit@vmware.com>
> 
> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
> possible") introduced an optimization of skipping the flush if the TLB
> generation that is flushed (as provided in flush_tlb_info) was already
> flushed.
> 
> However, arch_tlbbatch_flush() does not provide any generation in
> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
> TLB flushes.
> 
> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
> anyhow is an invalid generation value.

It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker
for f->new_tlb_gen being invalid.  Being consistent seems like a good
idea on this stuff.

> In addition, add the missing unlikely() and jump to get tracing right.

There are currently five routes out of flush_tlb_func():
 * Three early returns
 * One 'goto done'
 * One implicit return

The tracing code doesn't get run for any of the early returns, but
that's intentional because they don't *actually* flush the TLB.  We
don't want to record that flush_tlb_func() flushed the TLB when it
didn't.  There's another tracepoint on the TLB_REMOTE_SEND_IPI side to
tell where the flushes were requested.

That said, I think the

	if (unlikely(local_tlb_gen == mm_tlb_gen))
		goto done;

is arguably buggy, as is the 'goto done' in this hunk:

>  arch/x86/mm/tlb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index d9314cc8b81f..d81b4084bb8a 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
>  		return;
>  	}
>  
> -	if (f->new_tlb_gen <= local_tlb_gen) {
> +	if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
>  		/*
>  		 * The TLB is already up to date in respect to f->new_tlb_gen.
>  		 * While the core might be still behind mm_tlb_gen, checking
>  		 * mm_tlb_gen unnecessarily would have negative caching effects
>  		 * so avoid it.
>  		 */
> -		return;
> +		goto done;
>  	}
>  
>  	/*

We might want to (eventually) think about doing something like the
attached patch to make the skipped flushes explicit in the tracing and
make the return paths out of this function more consistent.

[-- Attachment #2: tlbtrace.patch --]
[-- Type: text/x-patch, Size: 2408 bytes --]

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d400b6d9d246..44ba73601f50 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -720,7 +720,7 @@ void initialize_tlbstate_and_flush(void)
  * because all x86 flush operations are serializing and the
  * atomic64_read operation won't be reordered by the compiler.
  */
-static void flush_tlb_func(void *info)
+static flush_tlb_func(void *info)
 {
 	/*
 	 * We have three different tlb_gen values in here.  They are:
@@ -738,6 +738,7 @@ static void flush_tlb_func(void *info)
 	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
 	bool local = smp_processor_id() == f->initiating_cpu;
 	unsigned long nr_invalidate = 0;
+	enum tlb_flush_reason;
 
 	/* This code cannot presently handle being reentered. */
 	VM_WARN_ON(!irqs_disabled());
@@ -747,12 +748,21 @@ static void flush_tlb_func(void *info)
 		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
 
 		/* Can only happen on remote CPUs */
-		if (f->mm && f->mm != loaded_mm)
-			return;
+		if (f->mm && f->mm != loaded_mm) {
+			flush_reason = TLB_FLUSH_SKIPPED;
+			goto done;
+		}
+		flush_reason = TLB_REMOTE_SHOOTDOWN;
+	} else if (f->mm == NULL) {
+		flush_reason = TLB_LOCAL_SHOOTDOWN;
+	} else {
+		flush_reason = TLB_LOCAL_MM_SHOOTDOWN;
 	}
 
-	if (unlikely(loaded_mm == &init_mm))
-		return;
+	if (unlikely(loaded_mm == &init_mm)) {
+		flush_reason = TLB_FLUSH_SKIPPED;
+		goto done;
+	}
 
 	VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) !=
 		   loaded_mm->context.ctx_id);
@@ -768,7 +778,8 @@ static void flush_tlb_func(void *info)
 		 * IPIs to lazy TLB mode CPUs.
 		 */
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
-		return;
+		flush_reason = TLB_FLUSH_SKIPPED;
+		goto done;
 	}
 
 	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
@@ -778,6 +789,7 @@ static void flush_tlb_func(void *info)
 		 * be handled can catch us all the way up, leaving no work for
 		 * the second flush.
 		 */
+		flush_reason = TLB_FLUSH_SKIPPED;
 		goto done;
 	}
 
@@ -849,10 +861,7 @@ static void flush_tlb_func(void *info)
 
 	/* Tracing is done in a unified manner to reduce the code size */
 done:
-	trace_tlb_flush(!local ? TLB_REMOTE_SHOOTDOWN :
-				(f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN :
-						  TLB_LOCAL_MM_SHOOTDOWN,
-			nr_invalidate);
+	trace_tlb_flush(flush_reason, nr_invalidate);
 }
 
 static bool tlb_is_not_lazy(int cpu, void *data)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 14:49 ` Dave Hansen
@ 2022-07-08 17:04   ` Nadav Amit
  2022-07-08 18:54     ` Dave Hansen
  0 siblings, 1 reply; 14+ messages in thread
From: Nadav Amit @ 2022-07-08 17:04 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra,
	Andy Lutomirski

On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote:

> ⚠ External Email
> 
> On 7/7/22 17:30, Nadav Amit wrote:
> 
> You might want to fix the clock on the system from which you sent this.
> I was really scratching my head trying to figure out how you got this
> patch out before Hugh's bug report.
> 
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
>> possible") introduced an optimization of skipping the flush if the TLB
>> generation that is flushed (as provided in flush_tlb_info) was already
>> flushed.
>> 
>> However, arch_tlbbatch_flush() does not provide any generation in
>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
>> TLB flushes.
>> 
>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
>> anyhow is an invalid generation value.
> 
> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker
> for f->new_tlb_gen being invalid.  Being consistent seems like a good
> idea on this stuff.

If we get a request to do a flush, regardless whether full or partial,
that logically we have already done, there is not reason to do it.

I therefore do not see a reason to look on f->end. I think that looking
at the generation is very intuitive. If you want, I can add a constant
such as TLB_GENERATION_INVALID.

> 
>> In addition, add the missing unlikely() and jump to get tracing right.
> 
> There are currently five routes out of flush_tlb_func():
> * Three early returns
> * One 'goto done'
> * One implicit return
> 
> The tracing code doesn't get run for any of the early returns, but
> that's intentional because they don't *actually* flush the TLB.  We
> don't want to record that flush_tlb_func() flushed the TLB when it
> didn't.  There's another tracepoint on the TLB_REMOTE_SEND_IPI side to
> tell where the flushes were requested.
> 
> That said, I think the
> 
>        if (unlikely(local_tlb_gen == mm_tlb_gen))
>                goto done;
> 
> is arguably buggy, as is the 'goto done' in this hunk:

I was just trying to follow it for consistency. Will remove.

> 
> We might want to (eventually) think about doing something like the
> attached patch to make the skipped flushes explicit in the tracing and
> make the return paths out of this function more consistent.

That’s fine with me. I just recommend that you have a single tracing call in
the function, since having too many ruins the generated code.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 17:04   ` Nadav Amit
@ 2022-07-08 18:54     ` Dave Hansen
  2022-07-11  5:19       ` Nadav Amit
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-07-08 18:54 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra,
	Andy Lutomirski

On 7/8/22 10:04, Nadav Amit wrote:
> On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>> On 7/7/22 17:30, Nadav Amit wrote:
>> You might want to fix the clock on the system from which you sent this.
>> I was really scratching my head trying to figure out how you got this
>> patch out before Hugh's bug report.
>>
>>> From: Nadav Amit <namit@vmware.com>
>>>
>>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
>>> possible") introduced an optimization of skipping the flush if the TLB
>>> generation that is flushed (as provided in flush_tlb_info) was already
>>> flushed.
>>>
>>> However, arch_tlbbatch_flush() does not provide any generation in
>>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
>>> TLB flushes.
>>>
>>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
>>> anyhow is an invalid generation value.
>>
>> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker
>> for f->new_tlb_gen being invalid.  Being consistent seems like a good
>> idea on this stuff.
> 
> If we get a request to do a flush, regardless whether full or partial,
> that logically we have already done, there is not reason to do it.
> 
> I therefore do not see a reason to look on f->end. I think that looking
> at the generation is very intuitive. If you want, I can add a constant
> such as TLB_GENERATION_INVALID.

That's a good point.

But, _my_ point was that there was only really one read site of
f->new_tlb_gen in flush_tlb_func().  That site is guarded by the "f->end
!= TLB_FLUSH_ALL" check which prevented it from making the same error
that your patch did.

Whatever we do, it would be nice to have a *single* way to check for
"does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?"

Using something like TLB_GENERATION_INVALID seems reasonable to me.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 18:54     ` Dave Hansen
@ 2022-07-11  5:19       ` Nadav Amit
  0 siblings, 0 replies; 14+ messages in thread
From: Nadav Amit @ 2022-07-11  5:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra,
	Andy Lutomirski

On Jul 8, 2022, at 11:54 AM, Dave Hansen <dave.hansen@intel.com> wrote:

> ⚠ External Email
> 
> On 7/8/22 10:04, Nadav Amit wrote:
>> On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>> On 7/7/22 17:30, Nadav Amit wrote:
>>> You might want to fix the clock on the system from which you sent this.
>>> I was really scratching my head trying to figure out how you got this
>>> patch out before Hugh's bug report.
>>> 
>>>> From: Nadav Amit <namit@vmware.com>
>>>> 
>>>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
>>>> possible") introduced an optimization of skipping the flush if the TLB
>>>> generation that is flushed (as provided in flush_tlb_info) was already
>>>> flushed.
>>>> 
>>>> However, arch_tlbbatch_flush() does not provide any generation in
>>>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
>>>> TLB flushes.
>>>> 
>>>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
>>>> anyhow is an invalid generation value.
>>> 
>>> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker
>>> for f->new_tlb_gen being invalid.  Being consistent seems like a good
>>> idea on this stuff.
>> 
>> If we get a request to do a flush, regardless whether full or partial,
>> that logically we have already done, there is not reason to do it.
>> 
>> I therefore do not see a reason to look on f->end. I think that looking
>> at the generation is very intuitive. If you want, I can add a constant
>> such as TLB_GENERATION_INVALID.
> 
> That's a good point.
> 
> But, _my_ point was that there was only really one read site of
> f->new_tlb_gen in flush_tlb_func().  That site is guarded by the "f->end
> != TLB_FLUSH_ALL" check which prevented it from making the same error
> that your patch did.
> 
> Whatever we do, it would be nice to have a *single* way to check for
> "does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?"
> 
> Using something like TLB_GENERATION_INVALID seems reasonable to me.

I am not sure that I fully understood what you meant. I understand you do
want TLB_GENERATION_INVALID, and I think you ask for some assertions to
regard to the expected relationship between TLB_GENERATION_INVALID and
TLB_FLUSH_ALL. I think that in order to shorten the discussion, I’ll send v2
(very soon) and you will comment on the patch itself.

I did run the tests again to measure performance as you asked for, and the
results are similar (will-it-scale tlb_flush1/threads): 11% speedup with 45
cores.

Without aa44284960d5:

tasks,processes,processes_idle,threads,threads_idle,linear

0,0,100,0,100,0
1,156782,97.89,157024,97.92,157024
5,707879,89.60,322015,89.69,785120
10,1311968,79.21,490881,79.68,1570240
15,1498762,68.82,553958,69.71,2355360
20,1483057,58.45,598939,60.00,3140480
25,1428105,48.07,626179,50.46,3925600
30,1648992,37.77,643954,41.36,4710720
35,1861301,27.50,671570,32.63,5495840
40,2038278,17.17,669470,24.03,6280960
45,2140412,6.71,673633,15.27,7066080

With aa44284960d5 + pending patch:

0,0,100,0,100,0
1,157935,97.93,155351,97.93,157935
5,710450,89.60,324981,89.70,789675
10,1291935,79.21,498496,79.57,1579350
15,1515323,68.81,575821,69.68,2369025
20,1545172,58.46,625521,60.05,3158700
25,1501015,48.09,675856,50.62,3948375
30,1682308,37.80,705242,41.55,4738050
35,1850464,27.52,717724,32.33,5527725
40,2030726,17.17,734610,23.99,6317400
45,2136401,6.71,747257,16.07,7107075



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08  0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit
  2022-07-08 11:40 ` David Hildenbrand
  2022-07-08 14:49 ` Dave Hansen
@ 2022-07-08 19:21 ` Hugh Dickins
  2022-07-08 20:02   ` Nadav Amit
  2 siblings, 1 reply; 14+ messages in thread
From: Hugh Dickins @ 2022-07-08 19:21 UTC (permalink / raw)
  To: Nadav Amit
  Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Linux MM, Nadav Amit, Dave Hansen,
	Peter Zijlstra, Andy Lutomirski, Andrew Morton

On Thu, 7 Jul 2022, Nadav Amit wrote:

> From: Nadav Amit <namit@vmware.com>
> 
> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
> possible") introduced an optimization of skipping the flush if the TLB
> generation that is flushed (as provided in flush_tlb_info) was already
> flushed.
> 
> However, arch_tlbbatch_flush() does not provide any generation in
> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
> TLB flushes.
> 
> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
> anyhow is an invalid generation value.
> 
> In addition, add the missing unlikely() and jump to get tracing right.
> 
> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
> Reported-by: Hugh Dickins <hughd@google.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Nadav Amit <namit@vmware.com>

Thanks a lot for your rapid response and thinking it through
(before I got around to any "nopcid" or "nopti" experiments).

I've been testing this one for a few hours now, and no problems seen.
I expect you'll be sending another version, maybe next week, meeting
Dave's concerns; but wanted to reassure that you have correctly
identified the issue and fixed it with this - thanks.

Hugh

> ---
>  arch/x86/mm/tlb.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index d9314cc8b81f..d81b4084bb8a 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
>  		return;
>  	}
>  
> -	if (f->new_tlb_gen <= local_tlb_gen) {
> +	if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
>  		/*
>  		 * The TLB is already up to date in respect to f->new_tlb_gen.
>  		 * While the core might be still behind mm_tlb_gen, checking
>  		 * mm_tlb_gen unnecessarily would have negative caching effects
>  		 * so avoid it.
>  		 */
> -		return;
> +		goto done;
>  	}
>  
>  	/*
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 19:21 ` Hugh Dickins
@ 2022-07-08 20:02   ` Nadav Amit
  2022-07-08 20:48     ` Hugh Dickins
  0 siblings, 1 reply; 14+ messages in thread
From: Nadav Amit @ 2022-07-08 20:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski,
	Andrew Morton

On Jul 8, 2022, at 12:21 PM, Hugh Dickins <hughd@google.com> wrote:

> ⚠ External Email
> 
> On Thu, 7 Jul 2022, Nadav Amit wrote:
> 
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
>> possible") introduced an optimization of skipping the flush if the TLB
>> generation that is flushed (as provided in flush_tlb_info) was already
>> flushed.
>> 
>> However, arch_tlbbatch_flush() does not provide any generation in
>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
>> TLB flushes.
>> 
>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
>> anyhow is an invalid generation value.
>> 
>> In addition, add the missing unlikely() and jump to get tracing right.
>> 
>> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
>> Reported-by: Hugh Dickins <hughd@google.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> Thanks a lot for your rapid response and thinking it through
> (before I got around to any "nopcid" or "nopti" experiments).
> 
> I've been testing this one for a few hours now, and no problems seen.
> I expect you'll be sending another version, maybe next week, meeting
> Dave's concerns; but wanted to reassure that you have correctly
> identified the issue and fixed it with this - thanks.

Thanks, Hugh. Sorry again for my mistake.

Can I please have your “Tested-by”?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
  2022-07-08 20:02   ` Nadav Amit
@ 2022-07-08 20:48     ` Hugh Dickins
  0 siblings, 0 replies; 14+ messages in thread
From: Hugh Dickins @ 2022-07-08 20:48 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Hugh Dickins, LKML, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra,
	Andy Lutomirski, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1837 bytes --]

On Fri, 8 Jul 2022, Nadav Amit wrote:
> On Jul 8, 2022, at 12:21 PM, Hugh Dickins <hughd@google.com> wrote:
> 
> > ⚠ External Email
> > 
> > On Thu, 7 Jul 2022, Nadav Amit wrote:
> > 
> >> From: Nadav Amit <namit@vmware.com>
> >> 
> >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
> >> possible") introduced an optimization of skipping the flush if the TLB
> >> generation that is flushed (as provided in flush_tlb_info) was already
> >> flushed.
> >> 
> >> However, arch_tlbbatch_flush() does not provide any generation in
> >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any
> >> TLB flushes.
> >> 
> >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
> >> anyhow is an invalid generation value.
> >> 
> >> In addition, add the missing unlikely() and jump to get tracing right.
> >> 
> >> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
> >> Reported-by: Hugh Dickins <hughd@google.com>
> >> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> >> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> >> Cc: Andy Lutomirski <luto@kernel.org>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> > 
> > Thanks a lot for your rapid response and thinking it through
> > (before I got around to any "nopcid" or "nopti" experiments).
> > 
> > I've been testing this one for a few hours now, and no problems seen.
> > I expect you'll be sending another version, maybe next week, meeting
> > Dave's concerns; but wanted to reassure that you have correctly
> > identified the issue and fixed it with this - thanks.
> 
> Thanks, Hugh. Sorry again for my mistake.
> 
> Can I please have your “Tested-by”?

Sure, let me scrabble around in my box of tags, yes, here's one:

Tested-by: Hugh Dickins <hughd@google.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-07-11  5:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-08  0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit
2022-07-08 11:40 ` David Hildenbrand
2022-07-08 15:13   ` Dave Hansen
2022-07-08 16:54     ` Nadav Amit
2022-07-08 17:01       ` Dave Hansen
2022-07-08 17:09         ` Nadav Amit
2022-07-08 18:03           ` Nadav Amit
2022-07-08 14:49 ` Dave Hansen
2022-07-08 17:04   ` Nadav Amit
2022-07-08 18:54     ` Dave Hansen
2022-07-11  5:19       ` Nadav Amit
2022-07-08 19:21 ` Hugh Dickins
2022-07-08 20:02   ` Nadav Amit
2022-07-08 20:48     ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox