* [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero
@ 2022-07-08 0:30 Nadav Amit
2022-07-08 11:40 ` David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Nadav Amit @ 2022-07-08 0:30 UTC (permalink / raw)
To: linux-kernel, Hugh Dickins, Thomas Gleixner
Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit,
Dave Hansen, Peter Zijlstra, Andy Lutomirski
From: Nadav Amit <namit@vmware.com>
Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when
possible") introduced an optimization of skipping the flush if the TLB
generation that is flushed (as provided in flush_tlb_info) was already
flushed.
However, arch_tlbbatch_flush() does not provide any generation in
flush_tlb_info. As a result, try_to_unmap_one() would not perform any
TLB flushes.
Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is
anyhow is an invalid generation value.
In addition, add the missing unlikely() and jump to get tracing right.
Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
arch/x86/mm/tlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d9314cc8b81f..d81b4084bb8a 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -771,14 +771,14 @@ static void flush_tlb_func(void *info)
return;
}
- if (f->new_tlb_gen <= local_tlb_gen) {
+ if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) {
/*
* The TLB is already up to date in respect to f->new_tlb_gen.
* While the core might be still behind mm_tlb_gen, checking
* mm_tlb_gen unnecessarily would have negative caching effects
* so avoid it.
*/
- return;
+ goto done;
}
/*
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit @ 2022-07-08 11:40 ` David Hildenbrand 2022-07-08 15:13 ` Dave Hansen 2022-07-08 14:49 ` Dave Hansen 2022-07-08 19:21 ` Hugh Dickins 2 siblings, 1 reply; 14+ messages in thread From: David Hildenbrand @ 2022-07-08 11:40 UTC (permalink / raw) To: Nadav Amit, linux-kernel, Hugh Dickins, Thomas Gleixner Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit, Dave Hansen, Peter Zijlstra, Andy Lutomirski On 08.07.22 02:30, Nadav Amit wrote: > From: Nadav Amit <namit@vmware.com> > > Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > possible") introduced an optimization of skipping the flush if the TLB > generation that is flushed (as provided in flush_tlb_info) was already > flushed. > > However, arch_tlbbatch_flush() does not provide any generation in > flush_tlb_info. As a result, try_to_unmap_one() would not perform any > TLB flushes. > > Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > anyhow is an invalid generation value. > > In addition, add the missing unlikely() and jump to get tracing right. > > Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") > Reported-by: Hugh Dickins <hughd@google.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Andy Lutomirski <luto@kernel.org> > Signed-off-by: Nadav Amit <namit@vmware.com> > --- > arch/x86/mm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index d9314cc8b81f..d81b4084bb8a 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) > return; > } > > - if (f->new_tlb_gen <= local_tlb_gen) { > + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { > /* > * The TLB is already up to date in respect to f->new_tlb_gen. > * While the core might be still behind mm_tlb_gen, checking > * mm_tlb_gen unnecessarily would have negative caching effects > * so avoid it. > */ > - return; > + goto done; Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 11:40 ` David Hildenbrand @ 2022-07-08 15:13 ` Dave Hansen 2022-07-08 16:54 ` Nadav Amit 0 siblings, 1 reply; 14+ messages in thread From: Dave Hansen @ 2022-07-08 15:13 UTC (permalink / raw) To: David Hildenbrand, Nadav Amit, linux-kernel, Hugh Dickins, Thomas Gleixner Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit, Dave Hansen, Peter Zijlstra, Andy Lutomirski On 7/8/22 04:40, David Hildenbrand wrote: >> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c >> index d9314cc8b81f..d81b4084bb8a 100644 >> --- a/arch/x86/mm/tlb.c >> +++ b/arch/x86/mm/tlb.c >> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) >> return; >> } >> >> - if (f->new_tlb_gen <= local_tlb_gen) { >> + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { >> /* >> * The TLB is already up to date in respect to f->new_tlb_gen. >> * While the core might be still behind mm_tlb_gen, checking >> * mm_tlb_gen unnecessarily would have negative caching effects >> * so avoid it. >> */ >> - return; >> + goto done; > Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb: > Avoid reading mm_tlb_gen when possible")? It depends on how many batched flushes that workload had. From the looks of it, they're all one page: madvise(addr + i, pgsize, MADV_DONTNEED); so there shouldn't be *much* batching in play. But, it wouldn't hurt to re-run them in either case. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 15:13 ` Dave Hansen @ 2022-07-08 16:54 ` Nadav Amit 2022-07-08 17:01 ` Dave Hansen 0 siblings, 1 reply; 14+ messages in thread From: Nadav Amit @ 2022-07-08 16:54 UTC (permalink / raw) To: Dave Hansen Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski On Jul 8, 2022, at 8:13 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/8/22 04:40, David Hildenbrand wrote: >>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c >>> index d9314cc8b81f..d81b4084bb8a 100644 >>> --- a/arch/x86/mm/tlb.c >>> +++ b/arch/x86/mm/tlb.c >>> @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) >>> return; >>> } >>> >>> - if (f->new_tlb_gen <= local_tlb_gen) { >>> + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { >>> /* >>> * The TLB is already up to date in respect to f->new_tlb_gen. >>> * While the core might be still behind mm_tlb_gen, checking >>> * mm_tlb_gen unnecessarily would have negative caching effects >>> * so avoid it. >>> */ >>> - return; >>> + goto done; >> Does this affect the performance numbers from aa44284960d5 ("x86/mm/tlb: >> Avoid reading mm_tlb_gen when possible")? > > It depends on how many batched flushes that workload had. From the > looks of it, they're all one page: > > madvise(addr + i, pgsize, MADV_DONTNEED); > > so there shouldn't be *much* batching in play. But, it wouldn't hurt to > re-run them in either case. Just to clarify, since these things are confusing. There are two batching mechanisms. The common one is mmu_gather, which MADV_DONTNEED uses. This one is *not* the one that caused the breakage. The second one is the “unmap_batch”, which was only used by x86 until now. (I just saw patches for ARM, but I think they just exploit the interface in a way). The “unmap_batch” is used when you swap out. This was broken. Since the bug was not during MADV_DONTNEED there is no reason for the results to be any different. Famous last words? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 16:54 ` Nadav Amit @ 2022-07-08 17:01 ` Dave Hansen 2022-07-08 17:09 ` Nadav Amit 0 siblings, 1 reply; 14+ messages in thread From: Dave Hansen @ 2022-07-08 17:01 UTC (permalink / raw) To: Nadav Amit Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski On 7/8/22 09:54, Nadav Amit wrote: > Since the bug was not during MADV_DONTNEED there is no reason for the > results to be any different. > > Famous last words? Considering that your patch broke the kernel a way that surprised us all, I think caution is warranted. Re-running a microbenchmark that takes five minutes and stresses things a bit is the least you can do, I think. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 17:01 ` Dave Hansen @ 2022-07-08 17:09 ` Nadav Amit 2022-07-08 18:03 ` Nadav Amit 0 siblings, 1 reply; 14+ messages in thread From: Nadav Amit @ 2022-07-08 17:09 UTC (permalink / raw) To: Dave Hansen Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski On Jul 8, 2022, at 10:01 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/8/22 09:54, Nadav Amit wrote: >> Since the bug was not during MADV_DONTNEED there is no reason for the >> results to be any different. >> >> Famous last words? > > Considering that your patch broke the kernel a way that surprised us > all, I think caution is warranted. Re-running a microbenchmark that > takes five minutes and stresses things a bit is the least you can do, I > think. I will send it later today. I was just pointing that the failing code-path is different than the one I measured. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 17:09 ` Nadav Amit @ 2022-07-08 18:03 ` Nadav Amit 0 siblings, 0 replies; 14+ messages in thread From: Nadav Amit @ 2022-07-08 18:03 UTC (permalink / raw) To: Dave Hansen Cc: David Hildenbrand, LKML, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, Linux MM, Peter Zijlstra, Andy Lutomirski On Jul 8, 2022, at 10:09 AM, Nadav Amit <namit@vmware.com> wrote: > On Jul 8, 2022, at 10:01 AM, Dave Hansen <dave.hansen@intel.com> wrote: > >> ⚠ External Email >> >> On 7/8/22 09:54, Nadav Amit wrote: >>> Since the bug was not during MADV_DONTNEED there is no reason for the >>> results to be any different. >>> >>> Famous last words? >> >> Considering that your patch broke the kernel a way that surprised us >> all, I think caution is warranted. Re-running a microbenchmark that >> takes five minutes and stresses things a bit is the least you can do, I >> think. > > I will send it later today. I was just pointing that the failing code-path > is different than the one I measured. It will take some more time, since 5.19 does not want to boot on my machine, and results from VMs are meaningless for this patch. I would look into this unrelated failure, unless you want results from 5.18. [ 6.303945] ------------[ cut here ]------------ [ 6.309209] kernel BUG at arch/x86/kernel/apic/apic.c:1598! [ 6.315537] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 6.321275] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc4TLB+ #5 [ 6.328760] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.9.1 12/04/2018 [ 6.337236] RIP: 0010:setup_local_APIC+0x31e/0x330 [ 6.342686] Code: 01 0f 85 05 ff ff ff 85 d2 7f 2b 48 8b 05 aa 37 4f 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 37 99 fb 00 e9 04 ff ff ff <0f> 0b e8 5b 2d be 00 e9 ba 64 b8 0 [ 6.363818] RSP: 0000:ffffffff82603e88 EFLAGS: 00010246 [ 6.369752] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 6.377820] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020 [ 6.385888] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82603da8 [ 6.393956] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000031 [ 6.402024] R13: 0000000000000000 R14: ffffffff82613118 R15: 0000000000000000 [ 6.410091] FS: 0000000000000000(0000) GS:ffff889fff600000(0000) knlGS:0000000000000000 [ 6.419250] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.425765] CR2: ffff88c07ffff000 CR3: 000000000260c001 CR4: 00000000000606f0 [ 6.433826] Call Trace: [ 6.436646] <TASK> [ 6.439077] ? _printk+0x53/0x6a [ 6.442777] apic_intr_mode_init+0xd2/0xf1 [ 6.447448] x86_late_time_init+0x1b/0x2b [ 6.452019] start_kernel+0x5d8/0x694 [ 6.456194] secondary_startup_64_no_verify+0xce/0xdb [ 6.461933] </TASK> [ 6.464463] Modules linked in: [ 6.467979] ---[ end trace 0000000000000000 ]--- [ 6.473243] RIP: 0010:setup_local_APIC+0x31e/0x330 [ 6.478704] Code: 01 0f 85 05 ff ff ff 85 d2 7f 2b 48 8b 05 aa 37 4f 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 37 99 fb 00 e9 04 ff ff ff <0f> 0b e8 5b 2d be 00 e9 ba 64 b8 0 [ 6.499865] RSP: 0000:ffffffff82603e88 EFLAGS: 00010246 [ 6.505803] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 6.513887] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020 [ 6.521969] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff82603da8 [ 6.530053] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000031 [ 6.538136] R13: 0000000000000000 R14: ffffffff82613118 R15: 0000000000000000 [ 6.546218] FS: 0000000000000000(0000) GS:ffff889fff600000(0000) knlGS:0000000000000000 [ 6.555391] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.561919] CR2: ffff88c07ffff000 CR3: 000000000260c001 CR4: 00000000000606f0 [ 6.570003] Kernel panic - not syncing: Attempted to kill the idle task! [ 6.577591] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit 2022-07-08 11:40 ` David Hildenbrand @ 2022-07-08 14:49 ` Dave Hansen 2022-07-08 17:04 ` Nadav Amit 2022-07-08 19:21 ` Hugh Dickins 2 siblings, 1 reply; 14+ messages in thread From: Dave Hansen @ 2022-07-08 14:49 UTC (permalink / raw) To: Nadav Amit, linux-kernel, Hugh Dickins, Thomas Gleixner Cc: Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit, Dave Hansen, Peter Zijlstra, Andy Lutomirski [-- Attachment #1: Type: text/plain, Size: 2485 bytes --] On 7/7/22 17:30, Nadav Amit wrote: You might want to fix the clock on the system from which you sent this. I was really scratching my head trying to figure out how you got this patch out before Hugh's bug report. > From: Nadav Amit <namit@vmware.com> > > Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > possible") introduced an optimization of skipping the flush if the TLB > generation that is flushed (as provided in flush_tlb_info) was already > flushed. > > However, arch_tlbbatch_flush() does not provide any generation in > flush_tlb_info. As a result, try_to_unmap_one() would not perform any > TLB flushes. > > Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > anyhow is an invalid generation value. It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker for f->new_tlb_gen being invalid. Being consistent seems like a good idea on this stuff. > In addition, add the missing unlikely() and jump to get tracing right. There are currently five routes out of flush_tlb_func(): * Three early returns * One 'goto done' * One implicit return The tracing code doesn't get run for any of the early returns, but that's intentional because they don't *actually* flush the TLB. We don't want to record that flush_tlb_func() flushed the TLB when it didn't. There's another tracepoint on the TLB_REMOTE_SEND_IPI side to tell where the flushes were requested. That said, I think the if (unlikely(local_tlb_gen == mm_tlb_gen)) goto done; is arguably buggy, as is the 'goto done' in this hunk: > arch/x86/mm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index d9314cc8b81f..d81b4084bb8a 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) > return; > } > > - if (f->new_tlb_gen <= local_tlb_gen) { > + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { > /* > * The TLB is already up to date in respect to f->new_tlb_gen. > * While the core might be still behind mm_tlb_gen, checking > * mm_tlb_gen unnecessarily would have negative caching effects > * so avoid it. > */ > - return; > + goto done; > } > > /* We might want to (eventually) think about doing something like the attached patch to make the skipped flushes explicit in the tracing and make the return paths out of this function more consistent. [-- Attachment #2: tlbtrace.patch --] [-- Type: text/x-patch, Size: 2408 bytes --] diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index d400b6d9d246..44ba73601f50 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -720,7 +720,7 @@ void initialize_tlbstate_and_flush(void) * because all x86 flush operations are serializing and the * atomic64_read operation won't be reordered by the compiler. */ -static void flush_tlb_func(void *info) +static flush_tlb_func(void *info) { /* * We have three different tlb_gen values in here. They are: @@ -738,6 +738,7 @@ static void flush_tlb_func(void *info) u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen); bool local = smp_processor_id() == f->initiating_cpu; unsigned long nr_invalidate = 0; + enum tlb_flush_reason; /* This code cannot presently handle being reentered. */ VM_WARN_ON(!irqs_disabled()); @@ -747,12 +748,21 @@ static void flush_tlb_func(void *info) count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED); /* Can only happen on remote CPUs */ - if (f->mm && f->mm != loaded_mm) - return; + if (f->mm && f->mm != loaded_mm) { + flush_reason = TLB_FLUSH_SKIPPED; + goto done; + } + flush_reason = TLB_REMOTE_SHOOTDOWN; + } else if (f->mm == NULL) { + flush_reason = TLB_LOCAL_SHOOTDOWN; + } else { + flush_reason = TLB_LOCAL_MM_SHOOTDOWN; } - if (unlikely(loaded_mm == &init_mm)) - return; + if (unlikely(loaded_mm == &init_mm)) { + flush_reason = TLB_FLUSH_SKIPPED; + goto done; + } VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].ctx_id) != loaded_mm->context.ctx_id); @@ -768,7 +778,8 @@ static void flush_tlb_func(void *info) * IPIs to lazy TLB mode CPUs. */ switch_mm_irqs_off(NULL, &init_mm, NULL); - return; + flush_reason = TLB_FLUSH_SKIPPED; + goto done; } if (unlikely(local_tlb_gen == mm_tlb_gen)) { @@ -778,6 +789,7 @@ static void flush_tlb_func(void *info) * be handled can catch us all the way up, leaving no work for * the second flush. */ + flush_reason = TLB_FLUSH_SKIPPED; goto done; } @@ -849,10 +861,7 @@ static void flush_tlb_func(void *info) /* Tracing is done in a unified manner to reduce the code size */ done: - trace_tlb_flush(!local ? TLB_REMOTE_SHOOTDOWN : - (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : - TLB_LOCAL_MM_SHOOTDOWN, - nr_invalidate); + trace_tlb_flush(flush_reason, nr_invalidate); } static bool tlb_is_not_lazy(int cpu, void *data) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 14:49 ` Dave Hansen @ 2022-07-08 17:04 ` Nadav Amit 2022-07-08 18:54 ` Dave Hansen 0 siblings, 1 reply; 14+ messages in thread From: Nadav Amit @ 2022-07-08 17:04 UTC (permalink / raw) To: Dave Hansen Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/7/22 17:30, Nadav Amit wrote: > > You might want to fix the clock on the system from which you sent this. > I was really scratching my head trying to figure out how you got this > patch out before Hugh's bug report. > >> From: Nadav Amit <namit@vmware.com> >> >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >> possible") introduced an optimization of skipping the flush if the TLB >> generation that is flushed (as provided in flush_tlb_info) was already >> flushed. >> >> However, arch_tlbbatch_flush() does not provide any generation in >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >> TLB flushes. >> >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >> anyhow is an invalid generation value. > > It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker > for f->new_tlb_gen being invalid. Being consistent seems like a good > idea on this stuff. If we get a request to do a flush, regardless whether full or partial, that logically we have already done, there is not reason to do it. I therefore do not see a reason to look on f->end. I think that looking at the generation is very intuitive. If you want, I can add a constant such as TLB_GENERATION_INVALID. > >> In addition, add the missing unlikely() and jump to get tracing right. > > There are currently five routes out of flush_tlb_func(): > * Three early returns > * One 'goto done' > * One implicit return > > The tracing code doesn't get run for any of the early returns, but > that's intentional because they don't *actually* flush the TLB. We > don't want to record that flush_tlb_func() flushed the TLB when it > didn't. There's another tracepoint on the TLB_REMOTE_SEND_IPI side to > tell where the flushes were requested. > > That said, I think the > > if (unlikely(local_tlb_gen == mm_tlb_gen)) > goto done; > > is arguably buggy, as is the 'goto done' in this hunk: I was just trying to follow it for consistency. Will remove. > > We might want to (eventually) think about doing something like the > attached patch to make the skipped flushes explicit in the tracing and > make the return paths out of this function more consistent. That’s fine with me. I just recommend that you have a single tracing call in the function, since having too many ruins the generated code. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 17:04 ` Nadav Amit @ 2022-07-08 18:54 ` Dave Hansen 2022-07-11 5:19 ` Nadav Amit 0 siblings, 1 reply; 14+ messages in thread From: Dave Hansen @ 2022-07-08 18:54 UTC (permalink / raw) To: Nadav Amit Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski On 7/8/22 10:04, Nadav Amit wrote: > On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote: >> On 7/7/22 17:30, Nadav Amit wrote: >> You might want to fix the clock on the system from which you sent this. >> I was really scratching my head trying to figure out how you got this >> patch out before Hugh's bug report. >> >>> From: Nadav Amit <namit@vmware.com> >>> >>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >>> possible") introduced an optimization of skipping the flush if the TLB >>> generation that is flushed (as provided in flush_tlb_info) was already >>> flushed. >>> >>> However, arch_tlbbatch_flush() does not provide any generation in >>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >>> TLB flushes. >>> >>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >>> anyhow is an invalid generation value. >> >> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker >> for f->new_tlb_gen being invalid. Being consistent seems like a good >> idea on this stuff. > > If we get a request to do a flush, regardless whether full or partial, > that logically we have already done, there is not reason to do it. > > I therefore do not see a reason to look on f->end. I think that looking > at the generation is very intuitive. If you want, I can add a constant > such as TLB_GENERATION_INVALID. That's a good point. But, _my_ point was that there was only really one read site of f->new_tlb_gen in flush_tlb_func(). That site is guarded by the "f->end != TLB_FLUSH_ALL" check which prevented it from making the same error that your patch did. Whatever we do, it would be nice to have a *single* way to check for "does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?" Using something like TLB_GENERATION_INVALID seems reasonable to me. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 18:54 ` Dave Hansen @ 2022-07-11 5:19 ` Nadav Amit 0 siblings, 0 replies; 14+ messages in thread From: Nadav Amit @ 2022-07-11 5:19 UTC (permalink / raw) To: Dave Hansen Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski On Jul 8, 2022, at 11:54 AM, Dave Hansen <dave.hansen@intel.com> wrote: > ⚠ External Email > > On 7/8/22 10:04, Nadav Amit wrote: >> On Jul 8, 2022, at 7:49 AM, Dave Hansen <dave.hansen@intel.com> wrote: >>> On 7/7/22 17:30, Nadav Amit wrote: >>> You might want to fix the clock on the system from which you sent this. >>> I was really scratching my head trying to figure out how you got this >>> patch out before Hugh's bug report. >>> >>>> From: Nadav Amit <namit@vmware.com> >>>> >>>> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >>>> possible") introduced an optimization of skipping the flush if the TLB >>>> generation that is flushed (as provided in flush_tlb_info) was already >>>> flushed. >>>> >>>> However, arch_tlbbatch_flush() does not provide any generation in >>>> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >>>> TLB flushes. >>>> >>>> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >>>> anyhow is an invalid generation value. >>> >>> It is, but the check below uses 'f->end == TLB_FLUSH_ALL' as the marker >>> for f->new_tlb_gen being invalid. Being consistent seems like a good >>> idea on this stuff. >> >> If we get a request to do a flush, regardless whether full or partial, >> that logically we have already done, there is not reason to do it. >> >> I therefore do not see a reason to look on f->end. I think that looking >> at the generation is very intuitive. If you want, I can add a constant >> such as TLB_GENERATION_INVALID. > > That's a good point. > > But, _my_ point was that there was only really one read site of > f->new_tlb_gen in flush_tlb_func(). That site is guarded by the "f->end > != TLB_FLUSH_ALL" check which prevented it from making the same error > that your patch did. > > Whatever we do, it would be nice to have a *single* way to check for > "does f->new_tlb_gen have an actual, valid bit of tlb gen data in it?" > > Using something like TLB_GENERATION_INVALID seems reasonable to me. I am not sure that I fully understood what you meant. I understand you do want TLB_GENERATION_INVALID, and I think you ask for some assertions to regard to the expected relationship between TLB_GENERATION_INVALID and TLB_FLUSH_ALL. I think that in order to shorten the discussion, I’ll send v2 (very soon) and you will comment on the patch itself. I did run the tests again to measure performance as you asked for, and the results are similar (will-it-scale tlb_flush1/threads): 11% speedup with 45 cores. Without aa44284960d5: tasks,processes,processes_idle,threads,threads_idle,linear 0,0,100,0,100,0 1,156782,97.89,157024,97.92,157024 5,707879,89.60,322015,89.69,785120 10,1311968,79.21,490881,79.68,1570240 15,1498762,68.82,553958,69.71,2355360 20,1483057,58.45,598939,60.00,3140480 25,1428105,48.07,626179,50.46,3925600 30,1648992,37.77,643954,41.36,4710720 35,1861301,27.50,671570,32.63,5495840 40,2038278,17.17,669470,24.03,6280960 45,2140412,6.71,673633,15.27,7066080 With aa44284960d5 + pending patch: 0,0,100,0,100,0 1,157935,97.93,155351,97.93,157935 5,710450,89.60,324981,89.70,789675 10,1291935,79.21,498496,79.57,1579350 15,1515323,68.81,575821,69.68,2369025 20,1545172,58.46,625521,60.05,3158700 25,1501015,48.09,675856,50.62,3948375 30,1682308,37.80,705242,41.55,4738050 35,1850464,27.52,717724,32.33,5527725 40,2030726,17.17,734610,23.99,6317400 45,2136401,6.71,747257,16.07,7107075 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit 2022-07-08 11:40 ` David Hildenbrand 2022-07-08 14:49 ` Dave Hansen @ 2022-07-08 19:21 ` Hugh Dickins 2022-07-08 20:02 ` Nadav Amit 2 siblings, 1 reply; 14+ messages in thread From: Hugh Dickins @ 2022-07-08 19:21 UTC (permalink / raw) To: Nadav Amit Cc: linux-kernel, Hugh Dickins, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, Linux MM, Nadav Amit, Dave Hansen, Peter Zijlstra, Andy Lutomirski, Andrew Morton On Thu, 7 Jul 2022, Nadav Amit wrote: > From: Nadav Amit <namit@vmware.com> > > Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > possible") introduced an optimization of skipping the flush if the TLB > generation that is flushed (as provided in flush_tlb_info) was already > flushed. > > However, arch_tlbbatch_flush() does not provide any generation in > flush_tlb_info. As a result, try_to_unmap_one() would not perform any > TLB flushes. > > Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > anyhow is an invalid generation value. > > In addition, add the missing unlikely() and jump to get tracing right. > > Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") > Reported-by: Hugh Dickins <hughd@google.com> > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > Cc: Andy Lutomirski <luto@kernel.org> > Signed-off-by: Nadav Amit <namit@vmware.com> Thanks a lot for your rapid response and thinking it through (before I got around to any "nopcid" or "nopti" experiments). I've been testing this one for a few hours now, and no problems seen. I expect you'll be sending another version, maybe next week, meeting Dave's concerns; but wanted to reassure that you have correctly identified the issue and fixed it with this - thanks. Hugh > --- > arch/x86/mm/tlb.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c > index d9314cc8b81f..d81b4084bb8a 100644 > --- a/arch/x86/mm/tlb.c > +++ b/arch/x86/mm/tlb.c > @@ -771,14 +771,14 @@ static void flush_tlb_func(void *info) > return; > } > > - if (f->new_tlb_gen <= local_tlb_gen) { > + if (unlikely(f->new_tlb_gen != 0 && f->new_tlb_gen <= local_tlb_gen)) { > /* > * The TLB is already up to date in respect to f->new_tlb_gen. > * While the core might be still behind mm_tlb_gen, checking > * mm_tlb_gen unnecessarily would have negative caching effects > * so avoid it. > */ > - return; > + goto done; > } > > /* > -- > 2.25.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 19:21 ` Hugh Dickins @ 2022-07-08 20:02 ` Nadav Amit 2022-07-08 20:48 ` Hugh Dickins 0 siblings, 1 reply; 14+ messages in thread From: Nadav Amit @ 2022-07-08 20:02 UTC (permalink / raw) To: Hugh Dickins Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski, Andrew Morton On Jul 8, 2022, at 12:21 PM, Hugh Dickins <hughd@google.com> wrote: > ⚠ External Email > > On Thu, 7 Jul 2022, Nadav Amit wrote: > >> From: Nadav Amit <namit@vmware.com> >> >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when >> possible") introduced an optimization of skipping the flush if the TLB >> generation that is flushed (as provided in flush_tlb_info) was already >> flushed. >> >> However, arch_tlbbatch_flush() does not provide any generation in >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any >> TLB flushes. >> >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is >> anyhow is an invalid generation value. >> >> In addition, add the missing unlikely() and jump to get tracing right. >> >> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") >> Reported-by: Hugh Dickins <hughd@google.com> >> Cc: Dave Hansen <dave.hansen@linux.intel.com> >> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> >> Cc: Andy Lutomirski <luto@kernel.org> >> Signed-off-by: Nadav Amit <namit@vmware.com> > > Thanks a lot for your rapid response and thinking it through > (before I got around to any "nopcid" or "nopti" experiments). > > I've been testing this one for a few hours now, and no problems seen. > I expect you'll be sending another version, maybe next week, meeting > Dave's concerns; but wanted to reassure that you have correctly > identified the issue and fixed it with this - thanks. Thanks, Hugh. Sorry again for my mistake. Can I please have your “Tested-by”? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero 2022-07-08 20:02 ` Nadav Amit @ 2022-07-08 20:48 ` Hugh Dickins 0 siblings, 0 replies; 14+ messages in thread From: Hugh Dickins @ 2022-07-08 20:48 UTC (permalink / raw) To: Nadav Amit Cc: Hugh Dickins, LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, Linux MM, Dave Hansen, Peter Zijlstra, Andy Lutomirski, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1837 bytes --] On Fri, 8 Jul 2022, Nadav Amit wrote: > On Jul 8, 2022, at 12:21 PM, Hugh Dickins <hughd@google.com> wrote: > > > ⚠ External Email > > > > On Thu, 7 Jul 2022, Nadav Amit wrote: > > > >> From: Nadav Amit <namit@vmware.com> > >> > >> Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when > >> possible") introduced an optimization of skipping the flush if the TLB > >> generation that is flushed (as provided in flush_tlb_info) was already > >> flushed. > >> > >> However, arch_tlbbatch_flush() does not provide any generation in > >> flush_tlb_info. As a result, try_to_unmap_one() would not perform any > >> TLB flushes. > >> > >> Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is > >> anyhow is an invalid generation value. > >> > >> In addition, add the missing unlikely() and jump to get tracing right. > >> > >> Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") > >> Reported-by: Hugh Dickins <hughd@google.com> > >> Cc: Dave Hansen <dave.hansen@linux.intel.com> > >> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> > >> Cc: Andy Lutomirski <luto@kernel.org> > >> Signed-off-by: Nadav Amit <namit@vmware.com> > > > > Thanks a lot for your rapid response and thinking it through > > (before I got around to any "nopcid" or "nopti" experiments). > > > > I've been testing this one for a few hours now, and no problems seen. > > I expect you'll be sending another version, maybe next week, meeting > > Dave's concerns; but wanted to reassure that you have correctly > > identified the issue and fixed it with this - thanks. > > Thanks, Hugh. Sorry again for my mistake. > > Can I please have your “Tested-by”? Sure, let me scrabble around in my box of tags, yes, here's one: Tested-by: Hugh Dickins <hughd@google.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-07-11 5:19 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-07-08 0:30 [PATCH] x86/mm/tlb: ignore f->new_tlb_gen when zero Nadav Amit 2022-07-08 11:40 ` David Hildenbrand 2022-07-08 15:13 ` Dave Hansen 2022-07-08 16:54 ` Nadav Amit 2022-07-08 17:01 ` Dave Hansen 2022-07-08 17:09 ` Nadav Amit 2022-07-08 18:03 ` Nadav Amit 2022-07-08 14:49 ` Dave Hansen 2022-07-08 17:04 ` Nadav Amit 2022-07-08 18:54 ` Dave Hansen 2022-07-11 5:19 ` Nadav Amit 2022-07-08 19:21 ` Hugh Dickins 2022-07-08 20:02 ` Nadav Amit 2022-07-08 20:48 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox