* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Hugh Dickins @ 2022-07-08  3:27 UTC
  To: Nadav Amit
  Cc: Andrew Morton, Dave Hansen, LKML, Nadav Amit, Peter Zijlstra,
	Ingo Molnar, Andy Lutomirski, Thomas Gleixner, x86, linux-mm

On Mon, 6 Jun 2022, Nadav Amit wrote:

> From: Nadav Amit <namit@vmware.com>
> 
> On extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
> contended and reading it should (arguably) be avoided as much as
> possible.
> 
> Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
> even when it is not necessary (e.g., the mm was already switched).
> This is wasteful.
> 
> Moreover, one of the existing optimizations is to read mm's tlb_gen to
> see if there are additional in-flight TLB invalidations and flush the
> entire TLB in such a case. However, if the request's tlb_gen was already
> flushed, the benefit of checking the mm's tlb_gen is likely to be offset
> by the overhead of the check itself.
> 
> Running will-it-scale with tlb_flush1_threads shows a considerable
> benefit on 56-core Skylake (up to +24%):
> 
> threads		Baseline (v5.17+)	+Patch
> 1		159960			160202
> 5		310808			308378 (-0.7%)
> 10		479110			490728
> 15		526771			562528
> 20		534495			587316
> 25		547462			628296
> 30		579616			666313
> 35		594134			701814
> 40		612288			732967
> 45		617517			749727
> 50		637476			735497
> 55		614363			778913 (+24%)
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: x86@kernel.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
> 
> --
> 
> Note: The benchmarked kernels include Dave's revert of commit
> 6035152d8eeb ("x86/mm/tlb: Open-code on_each_cpu_cond_mask() for
> tlb_is_not_lazy()")
> ---
>  arch/x86/mm/tlb.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index d400b6d9d246..d9314cc8b81f 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -734,10 +734,10 @@ static void flush_tlb_func(void *info)
>  	const struct flush_tlb_info *f = info;
>  	struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
>  	u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);
> -	u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
>  	u64 local_tlb_gen = this_cpu_read(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen);
>  	bool local = smp_processor_id() == f->initiating_cpu;
>  	unsigned long nr_invalidate = 0;
> +	u64 mm_tlb_gen;
>  
>  	/* This code cannot presently handle being reentered. */
>  	VM_WARN_ON(!irqs_disabled());
> @@ -771,6 +771,22 @@ static void flush_tlb_func(void *info)
>  		return;
>  	}
>  
> +	if (f->new_tlb_gen <= local_tlb_gen) {
> +		/*
> +		 * The TLB is already up to date in respect to f->new_tlb_gen.
> +		 * While the core might be still behind mm_tlb_gen, checking
> +		 * mm_tlb_gen unnecessarily would have negative caching effects
> +		 * so avoid it.
> +		 */
> +		return;
> +	}
> +
> +	/*
> +	 * Defer mm_tlb_gen reading as long as possible to avoid cache
> +	 * contention.
> +	 */
> +	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
> +
>  	if (unlikely(local_tlb_gen == mm_tlb_gen)) {
>  		/*
>  		 * There's nothing to do: we're already up to date.  This can
> -- 
> 2.25.1

I'm sorry, but bisection and reversion show that this commit,
aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
is responsible for the "internal compiler error: Segmentation fault"
errors I get when running kernel builds on tmpfs in 1G of memory,
with lots of swapping.

That tmpfs is using huge pages as much as it can, so splitting and
collapsing, compaction and page migration are all entailed, in case
that's relevant (maybe this commit is perfect, but there's a TLB
flushing bug over there in mm which this commit just exposes).

Whether those segfaults happen without the huge page element, I have
not done enough testing to tell - there are other bugs with swapping
in current linux-next; indeed, I wouldn't even have found this one if
I hadn't already been bisecting another bug and been thrown off course
by these segfaults.

I hope that you can work out what might be wrong with this,
but meantime I think it needs to be reverted.

Thanks,
Hugh



* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Nadav Amit @ 2022-07-08  4:23 UTC
  To: Hugh Dickins
  Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, x86, linux-mm

On Jul 7, 2022, at 8:27 PM, Hugh Dickins <hughd@google.com> wrote:

> On Mon, 6 Jun 2022, Nadav Amit wrote:
> 
>> [...]
> 
> I'm sorry, but bisection and reversion show that this commit,
> aa44284960d550eb4d8614afdffebc68a432a9b4 in current linux-next,
> is responsible for the "internal compiler error: Segmentation fault"
> errors I get when running kernel builds on tmpfs in 1G of memory,
> with lots of swapping.
> 
> That tmpfs is using huge pages as much as it can, so splitting and
> collapsing, compaction and page migration are all entailed, in case
> that's relevant (maybe this commit is perfect, but there's a TLB
> flushing bug over there in mm which this commit just exposes).
> 
> Whether those segfaults happen without the huge page element, I have
> not done enough testing to tell - there are other bugs with swapping
> in current linux-next; indeed, I wouldn't even have found this one if
> I hadn't already been bisecting another bug and been thrown off course
> by these segfaults.
> 
> I hope that you can work out what might be wrong with this,
> but meantime I think it needs to be reverted.

I always find it surprising how trivial one-liners fail.

As you probably know, debugging these kinds of things is hard. I see two
possible cases:

1. The failure is directly related to this optimization. The immediate
suspect in my mind is something to do with PCID/ASID.

2. The failure is due to another bug that was papered over by “enough” TLB
flushes.

I will look into the code. But if possible, it would be helpful to
know whether you get the failure with the “nopcid” kernel parameter. If it
passes, that wouldn’t say much, but if it fails, I think (2) is more likely.

I'm not arguing against a revert, but, in some way, if the test fails, it
can indicate that the optimization “works”…

I’ll put in some time to look deeper into the code, but it would be very
helpful if you could let me know what happens with nopcid.




* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Nadav Amit @ 2022-07-08  5:56 UTC
  To: Hugh Dickins
  Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, x86, linux-mm

On Jul 7, 2022, at 9:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:

> On Jul 7, 2022, at 8:27 PM, Hugh Dickins <hughd@google.com> wrote:
> 
>> [...]
> 
> I always find it surprising how trivial one-liners fail.
> 
> As you probably know, debugging these kinds of things is hard. I see two
> possible cases:
> 
> 1. The failure is directly related to this optimization. The immediate
> suspect in my mind is something to do with PCID/ASID.
> 
> 2. The failure is due to another bug that was papered over by “enough” TLB
> flushes.
> 
> I will look into the code. But if possible, it would be helpful to
> know whether you get the failure with the “nopcid” kernel parameter. If it
> passes, that wouldn’t say much, but if it fails, I think (2) is more likely.
> 
> I'm not arguing against a revert, but, in some way, if the test fails, it
> can indicate that the optimization “works”…
> 
> I’ll put in some time to look deeper into the code, but it would be very
> helpful if you could let me know what happens with nopcid.

Actually, using only “nopcid” would most likely make the problem go away if
we have PTI enabled. So to get a good indication, we need to check whether
it reproduces with both “nopti” and “nopcid”.
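
(Concretely - as an illustrative example only, since the exact file and
update command depend on the distro and bootloader - that means appending
both parameters to the kernel command line, e.g.:)

	# /etc/default/grub - hypothetical example; then run update-grub
	GRUB_CMDLINE_LINUX_DEFAULT="quiet nopti nopcid"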

I don’t have a better answer yet. Still trying to see what might have gone
wrong.


* Re: [PATCH v2] x86/mm/tlb: avoid reading mm_tlb_gen when possible
From: Nadav Amit @ 2022-07-08  6:59 UTC
  To: Hugh Dickins
  Cc: Andrew Morton, Dave Hansen, LKML, Peter Zijlstra, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, x86, linux-mm

On Jul 7, 2022, at 10:56 PM, Nadav Amit <nadav.amit@gmail.com> wrote:

> On Jul 7, 2022, at 9:23 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
>> [...]
> 
> Actually, using only “nopcid” would most likely make the problem go away if
> we have PTI enabled. So to get a good indication, we need to check whether
> it reproduces with both “nopti” and “nopcid”.
> 
> I don’t have a better answer yet. Still trying to see what might have gone
> wrong.

Ok. My bad. Sorry. arch_tlbbatch_flush() does not set any generation in
flush_tlb_info. Bad.

Should be fixed by something like the following - I’ll send a patch tomorrow:

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d9314cc8b81f..9f19894c322f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -771,7 +771,7 @@ static void flush_tlb_func(void *info)
                return;
        }
 
-       if (f->new_tlb_gen <= local_tlb_gen) {
+       if (unlikely(f->mm && f->new_tlb_gen <= local_tlb_gen)) {
                /*
                 * The TLB is already up to date in respect to f->new_tlb_gen.
                 * While the core might be still behind mm_tlb_gen, checking
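
To spell out the failure mode, here is a simplified sketch (hypothetical
code for illustration, not the actual kernel source): a batched reclaim
flush carries no mm and never sets new_tlb_gen, so the unguarded early
return always fires and the flush is silently skipped.

	#include <stdbool.h>

	typedef unsigned long long u64;
	struct mm_struct;	/* opaque here; defined in the kernel */

	/* Simplified stand-in for the relevant flush_tlb_info fields. */
	struct flush_tlb_info_sketch {
		struct mm_struct *mm;	/* NULL for arch_tlbbatch_flush() */
		u64 new_tlb_gen;	/* left at 0 when there is no mm */
	};

	/*
	 * Buggy check: for a batched flush, new_tlb_gen == 0, and
	 * 0 <= local_tlb_gen always holds, so the flush is skipped
	 * and stale TLB entries survive - hence the segfaults.
	 */
	static bool skip_flush_buggy(const struct flush_tlb_info_sketch *f,
				     u64 local_tlb_gen)
	{
		return f->new_tlb_gen <= local_tlb_gen;
	}

	/*
	 * Fixed check: the generation is meaningful only when an mm is
	 * attached, so mm-less (batched) flushes are never skipped.
	 */
	static bool skip_flush_fixed(const struct flush_tlb_info_sketch *f,
				     u64 local_tlb_gen)
	{
		return f->mm && f->new_tlb_gen <= local_tlb_gen;
	}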





