* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 14:49 ` Alan Stern
@ 2024-09-28 15:30 ` Mathieu Desnoyers
2024-09-28 15:32 ` Mathieu Desnoyers
2024-09-30 11:26 ` Jonas Oberhauser
2 siblings, 0 replies; 42+ messages in thread
From: Mathieu Desnoyers @ 2024-09-28 15:30 UTC (permalink / raw)
To: Alan Stern, Linus Torvalds
Cc: linux-kernel, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Paul E. McKenney, Will Deacon, Peter Zijlstra, Boqun Feng,
John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
Thomas Gleixner, Vlastimil Babka, maged.michael, Mateusz Guzik,
Gary Guo, Jonas Oberhauser, rcu, linux-mm, lkmm
On 2024-09-28 16:49, Alan Stern wrote:
> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
[...]
>> +/*
>> + * Compare two addresses while preserving the address dependencies for
>> + * later use of the address. It should be used when comparing an address
>> + * returned by rcu_dereference().
>> + *
>> + * This is needed to prevent the compiler CSE and SSA GVN optimizations
>> + * from replacing the registers holding @a or @b based on their
>> + * equality, which does not preserve address dependencies and allows the
>> + * following misordering speculations:
>> + *
>> + * - If @b is a constant, the compiler can issue the loads which depend
>> + * on @a before loading @a.
>> + * - If @b is a register populated by a prior load, weakly-ordered
>> + * CPUs can speculate loads which depend on @a before loading @a.
>> + *
>> + * The same logic applies with @a and @b swapped.
[...]
>> + *
>> + * Return value: true if pointers are equal, false otherwise.
>> + *
>> + * The compiler barrier() is ineffective at fixing this issue. It does
>> + * not prevent the compiler CSE from losing the address dependency:
>> + *
>> + * int fct_2_volatile_barriers(void)
>> + * {
>> + * int *a, *b;
>> + *
>> + * do {
>> + * a = READ_ONCE(p);
>> + * asm volatile ("" : : : "memory");
>> + * b = READ_ONCE(p);
>> + * } while (a != b);
>> + * asm volatile ("" : : : "memory"); <-- barrier()
>> + * return *b;
>> + * }
>> + *
>> + * With gcc 14.2 (arm64):
>> + *
>> + * fct_2_volatile_barriers:
>> + * adrp x0, .LANCHOR0
>> + * add x0, x0, :lo12:.LANCHOR0
>> + * .L2:
>> + * ldr x1, [x0] <-- x1 populated by first load.
>> + * ldr x2, [x0]
>> + * cmp x1, x2
>> + * bne .L2
>> + * ldr w0, [x1] <-- x1 is used for access which should depend on b.
>> + * ret
>> + *
>> + * On weakly-ordered architectures, this lets CPU speculation use the
>> + * result from the first load to speculate "ldr w0, [x1]" before
>> + * "ldr x2, [x0]".
>> + * Based on the RCU documentation, the control dependency does not
>> + * prevent the CPU from speculating loads.
>
> IMO, this lengthy explanation is not needed in the source code. Just
> refer interested readers to the commit description. You're repeating
> the same text verbatim, after all.
>
> (Or if you firmly believe that this explanation _does_ belong in the
> code, then omit it from the commit description. There's no need to say
> everything twice.)
>
Linus asked for this code/asm example to be in the comment:
https://lore.kernel.org/lkml/CAHk-=wgBgh5U+dyNaN=+XCdcm2OmgSRbcH4Vbtk8i5ZDGwStSA@mail.gmail.com/
I agree that it may be a bit verbose for the comment.
Linus, do you want the explanation wrt compiler barriers not being
enough (with C/asm example) in the comment above ptr_eq() or in
the commit message ?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 14:49 ` Alan Stern
2024-09-28 15:30 ` Mathieu Desnoyers
@ 2024-09-28 15:32 ` Mathieu Desnoyers
2024-09-28 15:49 ` Alan Stern
2024-09-30 11:26 ` Jonas Oberhauser
2 siblings, 1 reply; 42+ messages in thread
From: Mathieu Desnoyers @ 2024-09-28 15:32 UTC (permalink / raw)
To: Alan Stern
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo,
Jonas Oberhauser, rcu, linux-mm, lkmm
On 2024-09-28 16:49, Alan Stern wrote:
> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>> Compiler CSE and SSA GVN optimizations can cause the address dependency
>> of addresses returned by rcu_dereference to be lost when comparing those
>> pointers with either constants or previously loaded pointers.
>>
>> Introduce ptr_eq() to compare two addresses while preserving the address
>> dependencies for later use of the address. It should be used when
>> comparing an address returned by rcu_dereference().
>>
>> This is needed to prevent the compiler CSE and SSA GVN optimizations
>> from replacing the registers holding @a or @b based on their
>
> "Replacing" isn't the right word. What the compiler does is use one
> rather than the other. Furthermore, the compiler can play these games
> even with values that aren't in registers.
>
> You should just say: "... from using @a (or @b) in places where the
> source refers to @b (or @a) (based on the fact that after the
> comparison, the two are known to be equal), which does not ..."
OK.
>
>> equality, which does not preserve address dependencies and allows the
>> following misordering speculations:
>>
>> - If @b is a constant, the compiler can issue the loads which depend
>> on @a before loading @a.
>> - If @b is a register populated by a prior load, weakly-ordered
>> CPUs can speculate loads which depend on @a before loading @a.
>
> It shouldn't matter whether @a and @b are constants, registers, or
> anything else. All that matters is that the compiler uses the wrong
> one, which allows weakly ordered CPUs to speculate loads you wouldn't
> expect it to, based on the source code alone.
I only partially agree here.
On weakly-ordered architectures, indeed we don't care whether the
issue is caused by the compiler reordering the code (constant)
or the CPU speculating the load (registers).
However, on strongly-ordered architectures, AFAIU, only the constant
case is problematic (compiler reordering the dependent load), because
CPU speculating the loads across the control dependency is not an
issue.
So am I tempted to keep examples that clearly state whether
the issue is caused by compiler reordering instructions, or by
CPU speculation.
>
>> The same logic applies with @a and @b swapped.
>>
>> The compiler barrier() is ineffective at fixing this issue.
>> It does not prevent the compiler CSE from losing the address dependency:
>>
>> int fct_2_volatile_barriers(void)
>> {
>> int *a, *b;
>>
>> do {
>> a = READ_ONCE(p);
>> asm volatile ("" : : : "memory");
>> b = READ_ONCE(p);
>> } while (a != b);
>> asm volatile ("" : : : "memory"); <----- barrier()
>> return *b;
>> }
>>
>> With gcc 14.2 (arm64):
>>
>> fct_2_volatile_barriers:
>> adrp x0, .LANCHOR0
>> add x0, x0, :lo12:.LANCHOR0
>> .L2:
>> ldr x1, [x0] <------ x1 populated by first load.
>> ldr x2, [x0]
>> cmp x1, x2
>> bne .L2
>> ldr w0, [x1] <------ x1 is used for access which should depend on b.
>> ret
>>
>> On weakly-ordered architectures, this lets CPU speculation use the
>> result from the first load to speculate "ldr w0, [x1]" before
>> "ldr x2, [x0]".
>> Based on the RCU documentation, the control dependency does not prevent
>> the CPU from speculating loads.
>>
[...]
>> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
>> index 2df665fa2964..f26705c267e8 100644
>> --- a/include/linux/compiler.h
>> +++ b/include/linux/compiler.h
>> @@ -186,6 +186,68 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
>> __asm__ ("" : "=r" (var) : "0" (var))
>> #endif
>>
>> +/*
>> + * Compare two addresses while preserving the address dependencies for
>> + * later use of the address. It should be used when comparing an address
>> + * returned by rcu_dereference().
>> + *
>> + * This is needed to prevent the compiler CSE and SSA GVN optimizations
>> + * from replacing the registers holding @a or @b based on their
>> + * equality, which does not preserve address dependencies and allows the
Replacing with:
* This is needed to prevent the compiler CSE and SSA GVN optimizations
* from using @a (or @b) in places where the source refers to @b (or @a)
* based on the fact that after the comparison, the two are known to be
* equal, which does not preserve address dependencies and allows the
* following misordering speculations:
>> + * following misordering speculations:
>> + *
>> + * - If @b is a constant, the compiler can issue the loads which depend
>> + * on @a before loading @a.
>> + * - If @b is a register populated by a prior load, weakly-ordered
>> + * CPUs can speculate loads which depend on @a before loading @a.
>> + *
>> + * The same logic applies with @a and @b swapped.
>
> This could be more concise, and it should be more general (along the
> same lines as the description above).
As per my earlier comment, I would prefer to keep the examples specific
rather than general so it is clear which scenarios are problematic on
weakly vs strongly ordered architectures.
[...]
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 15:32 ` Mathieu Desnoyers
@ 2024-09-28 15:49 ` Alan Stern
2024-09-28 15:55 ` Mathieu Desnoyers
0 siblings, 1 reply; 42+ messages in thread
From: Alan Stern @ 2024-09-28 15:49 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo,
Jonas Oberhauser, rcu, linux-mm, lkmm
On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
> On 2024-09-28 16:49, Alan Stern wrote:
> > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
> > > equality, which does not preserve address dependencies and allows the
> > > following misordering speculations:
> > >
> > > - If @b is a constant, the compiler can issue the loads which depend
> > > on @a before loading @a.
> > > - If @b is a register populated by a prior load, weakly-ordered
> > > CPUs can speculate loads which depend on @a before loading @a.
> >
> > It shouldn't matter whether @a and @b are constants, registers, or
> > anything else. All that matters is that the compiler uses the wrong
> > one, which allows weakly ordered CPUs to speculate loads you wouldn't
> > expect it to, based on the source code alone.
>
> I only partially agree here.
>
> On weakly-ordered architectures, indeed we don't care whether the
> issue is caused by the compiler reordering the code (constant)
> or the CPU speculating the load (registers).
>
> However, on strongly-ordered architectures, AFAIU, only the constant
> case is problematic (compiler reordering the dependent load), because
I thought you were trying to prevent the compiler from using one pointer
instead of the other, not trying to prevent it from reordering anything.
Isn't this the point the documentation wants to get across when it says
that comparing pointers can be dangerous?
> CPU speculating the loads across the control dependency is not an
> issue.
>
> So am I tempted to keep examples that clearly state whether
> the issue is caused by compiler reordering instructions, or by
> CPU speculation.
Isn't it true that on strongly ordered CPUs, a compiler barrier is
sufficient to prevent the rcu_dereference() problem? So the whole idea
behind ptr_eq() is that it prevents the problem on all CPUs.
You can make your examples as specific as you like, but the fact remains
that ptr_eq() is meant to prevent situations where both:
The compiler uses the wrong pointer for a load, and
The CPU performs the load earlier than you want.
If either one of those doesn't hold then the problem won't arise.
Alan Stern
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 15:49 ` Alan Stern
@ 2024-09-28 15:55 ` Mathieu Desnoyers
2024-09-28 21:15 ` Alan Stern
2024-09-28 22:26 ` Alan Huang
0 siblings, 2 replies; 42+ messages in thread
From: Mathieu Desnoyers @ 2024-09-28 15:55 UTC (permalink / raw)
To: Alan Stern
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo,
Jonas Oberhauser, rcu, linux-mm, lkmm
On 2024-09-28 17:49, Alan Stern wrote:
> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>> On 2024-09-28 16:49, Alan Stern wrote:
>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>> equality, which does not preserve address dependencies and allows the
>>>> following misordering speculations:
>>>>
>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>> on @a before loading @a.
>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>
>>> It shouldn't matter whether @a and @b are constants, registers, or
>>> anything else. All that matters is that the compiler uses the wrong
>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>> expect it to, based on the source code alone.
>>
>> I only partially agree here.
>>
>> On weakly-ordered architectures, indeed we don't care whether the
>> issue is caused by the compiler reordering the code (constant)
>> or the CPU speculating the load (registers).
>>
>> However, on strongly-ordered architectures, AFAIU, only the constant
>> case is problematic (compiler reordering the dependent load), because
>
> I thought you were trying to prevent the compiler from using one pointer
> instead of the other, not trying to prevent it from reordering anything.
> Isn't this the point the documentation wants to get across when it says
> that comparing pointers can be dangerous?
The motivation for introducing ptr_eq() is indeed because the
compiler barrier is not sufficient to prevent the compiler from
using one pointer instead of the other.
But it turns out that ptr_eq() is also a good tool to prevent the
compiler from reordering loads in case where the comparison is
done against a constant.
>
>> CPU speculating the loads across the control dependency is not an
>> issue.
>>
>> So am I tempted to keep examples that clearly state whether
>> the issue is caused by compiler reordering instructions, or by
>> CPU speculation.
>
> Isn't it true that on strongly ordered CPUs, a compiler barrier is
> sufficient to prevent the rcu_dereference() problem? So the whole idea
> behind ptr_eq() is that it prevents the problem on all CPUs.
Correct. But given that we have ptr_eq(), it's good to show how it
equally prevents the compiler from reordering address-dependent loads
(comparison with constant) *and* prevents the compiler from using
one pointer rather than the other (comparison between two non-constant
pointers) which affects speculation on weakly-ordered CPUs.
> You can make your examples as specific as you like, but the fact remains
> that ptr_eq() is meant to prevent situations where both:
>
> The compiler uses the wrong pointer for a load, and
>
> The CPU performs the load earlier than you want.
>
> If either one of those doesn't hold then the problem won't arise.
Correct.
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 15:55 ` Mathieu Desnoyers
@ 2024-09-28 21:15 ` Alan Stern
2024-09-30 9:42 ` Jonas Oberhauser
2024-09-28 22:26 ` Alan Huang
1 sibling, 1 reply; 42+ messages in thread
From: Alan Stern @ 2024-09-28 21:15 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo,
Jonas Oberhauser, rcu, linux-mm, lkmm
On Sat, Sep 28, 2024 at 11:55:22AM -0400, Mathieu Desnoyers wrote:
> On 2024-09-28 17:49, Alan Stern wrote:
> > On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
> > > On 2024-09-28 16:49, Alan Stern wrote:
> > > > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
> > > > > equality, which does not preserve address dependencies and allows the
> > > > > following misordering speculations:
> > > > >
> > > > > - If @b is a constant, the compiler can issue the loads which depend
> > > > > on @a before loading @a.
> > > > > - If @b is a register populated by a prior load, weakly-ordered
> > > > > CPUs can speculate loads which depend on @a before loading @a.
> > > >
> > > > It shouldn't matter whether @a and @b are constants, registers, or
> > > > anything else. All that matters is that the compiler uses the wrong
> > > > one, which allows weakly ordered CPUs to speculate loads you wouldn't
> > > > expect it to, based on the source code alone.
> > >
> > > I only partially agree here.
> > >
> > > On weakly-ordered architectures, indeed we don't care whether the
> > > issue is caused by the compiler reordering the code (constant)
> > > or the CPU speculating the load (registers).
> > >
> > > However, on strongly-ordered architectures, AFAIU, only the constant
> > > case is problematic (compiler reordering the dependent load), because
> >
> > I thought you were trying to prevent the compiler from using one pointer
> > instead of the other, not trying to prevent it from reordering anything.
> > Isn't this the point the documentation wants to get across when it says
> > that comparing pointers can be dangerous?
>
> The motivation for introducing ptr_eq() is indeed because the
> compiler barrier is not sufficient to prevent the compiler from
> using one pointer instead of the other.
>
> But it turns out that ptr_eq() is also a good tool to prevent the
> compiler from reordering loads in case where the comparison is
> done against a constant.
Isn't that the same thing? A constant pointer like &x is still a
pointer. What you want to do is compare p with &x without allowing the
compiler to then replace *p with *&x (or just x).
> > Isn't it true that on strongly ordered CPUs, a compiler barrier is
> > sufficient to prevent the rcu_dereference() problem? So the whole idea
> > behind ptr_eq() is that it prevents the problem on all CPUs.
>
> Correct. But given that we have ptr_eq(), it's good to show how it
> equally prevents the compiler from reordering address-dependent loads
> (comparison with constant) *and* prevents the compiler from using
> one pointer rather than the other (comparison between two non-constant
> pointers) which affects speculation on weakly-ordered CPUs.
I don't see how these two things differ from each other. In the
comparison-with-a-constant case, how is the compiler reordering
anything? Isn't it just using the constant address rather than the
loaded pointer and thereby breaking the address dependency?
Alan stern
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 21:15 ` Alan Stern
@ 2024-09-30 9:42 ` Jonas Oberhauser
2024-09-30 11:04 ` Paul E. McKenney
0 siblings, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-09-30 9:42 UTC (permalink / raw)
To: Alan Stern, Mathieu Desnoyers
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
Am 9/28/2024 um 11:15 PM schrieb Alan Stern:
> On Sat, Sep 28, 2024 at 11:55:22AM -0400, Mathieu Desnoyers wrote:
>> On 2024-09-28 17:49, Alan Stern wrote:
>>> Isn't it true that on strongly ordered CPUs, a compiler barrier is
>>> sufficient to prevent the rcu_dereference() problem? So the whole idea
>>> behind ptr_eq() is that it prevents the problem on all CPUs.
>>
>> Correct. But given that we have ptr_eq(), it's good to show how it
>> equally prevents the compiler from reordering address-dependent loads
>> (comparison with constant) *and* prevents the compiler from using
>> one pointer rather than the other (comparison between two non-constant
>> pointers) which affects speculation on weakly-ordered CPUs.
>
> I don't see how these two things differ from each other. In the
> comparison-with-a-constant case, how is the compiler reordering
> anything? Isn't it just using the constant address rather than the
> loaded pointer and thereby breaking the address dependency?
I also currently don't see any major difference between the constant and
register case. The point is that the address is known before loading
into b, and hence the compiler + hardware can speculatively load *b
before loading into b.
The only difference is how far before loading into b the address is known.
Best wishes,
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 9:42 ` Jonas Oberhauser
@ 2024-09-30 11:04 ` Paul E. McKenney
2024-09-30 12:06 ` Jonas Oberhauser
0 siblings, 1 reply; 42+ messages in thread
From: Paul E. McKenney @ 2024-09-30 11:04 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Alan Stern, Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On Mon, Sep 30, 2024 at 11:42:11AM +0200, Jonas Oberhauser wrote:
>
>
> Am 9/28/2024 um 11:15 PM schrieb Alan Stern:
> > On Sat, Sep 28, 2024 at 11:55:22AM -0400, Mathieu Desnoyers wrote:
> > > On 2024-09-28 17:49, Alan Stern wrote:
> > > > Isn't it true that on strongly ordered CPUs, a compiler barrier is
> > > > sufficient to prevent the rcu_dereference() problem? So the whole idea
> > > > behind ptr_eq() is that it prevents the problem on all CPUs.
> > >
> > > Correct. But given that we have ptr_eq(), it's good to show how it
> > > equally prevents the compiler from reordering address-dependent loads
> > > (comparison with constant) *and* prevents the compiler from using
> > > one pointer rather than the other (comparison between two non-constant
> > > pointers) which affects speculation on weakly-ordered CPUs.
> >
> > I don't see how these two things differ from each other. In the
> > comparison-with-a-constant case, how is the compiler reordering
> > anything? Isn't it just using the constant address rather than the
> > loaded pointer and thereby breaking the address dependency?
>
> I also currently don't see any major difference between the constant and
> register case. The point is that the address is known before loading into b,
> and hence the compiler + hardware can speculatively load *b before loading
> into b.
>
> The only difference is how far before loading into b the address is known.
In theory, true. In practice, in the register case, you need a little
more bad luck for the compiler to be able to exploit your mistake.
Still, it is indeed far better to attain a state of bliss by keeping
the compiler ignorant.
Thanx, Paul
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 11:04 ` Paul E. McKenney
@ 2024-09-30 12:06 ` Jonas Oberhauser
2024-09-30 13:54 ` Paul E. McKenney
0 siblings, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-09-30 12:06 UTC (permalink / raw)
To: paulmck
Cc: Alan Stern, Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
Am 9/30/2024 um 1:04 PM schrieb Paul E. McKenney:
> On Mon, Sep 30, 2024 at 11:42:11AM +0200, Jonas Oberhauser wrote:
>>
>>
>> I also currently don't see any major difference between the constant and
>> register case. The point is that the address is known before loading into b,
>> and hence the compiler + hardware can speculatively load *b before loading
>> into b.
>
> In theory, true. In practice, in the register case, you need a little
> more bad luck for the compiler to be able to exploit your mistake.
If there's one thing I've never run out of, then it is bad luck with
technology.
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 12:06 ` Jonas Oberhauser
@ 2024-09-30 13:54 ` Paul E. McKenney
0 siblings, 0 replies; 42+ messages in thread
From: Paul E. McKenney @ 2024-09-30 13:54 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Alan Stern, Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On Mon, Sep 30, 2024 at 02:06:12PM +0200, Jonas Oberhauser wrote:
>
>
> Am 9/30/2024 um 1:04 PM schrieb Paul E. McKenney:
> > On Mon, Sep 30, 2024 at 11:42:11AM +0200, Jonas Oberhauser wrote:
> > >
> > >
> > > I also currently don't see any major difference between the constant and
> > > register case. The point is that the address is known before loading into b,
> > > and hence the compiler + hardware can speculatively load *b before loading
> > > into b.
> >
> > In theory, true. In practice, in the register case, you need a little
> > more bad luck for the compiler to be able to exploit your mistake.
>
> If there's one thing I've never run out of, then it is bad luck with
> technology.
Careful! Someone might try to recruit you as tester. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 15:55 ` Mathieu Desnoyers
2024-09-28 21:15 ` Alan Stern
@ 2024-09-28 22:26 ` Alan Huang
2024-09-28 23:55 ` Boqun Feng
2024-09-30 8:57 ` Jonas Oberhauser
1 sibling, 2 replies; 42+ messages in thread
From: Alan Huang @ 2024-09-28 22:26 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Alan Stern, Linus Torvalds, LKML, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki (Sony),
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, Jonas Oberhauser, RCU, linux-mm, lkmm
2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> On 2024-09-28 17:49, Alan Stern wrote:
>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>>> On 2024-09-28 16:49, Alan Stern wrote:
>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>>> equality, which does not preserve address dependencies and allows the
>>>>> following misordering speculations:
>>>>>
>>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>>> on @a before loading @a.
>>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>>
>>>> It shouldn't matter whether @a and @b are constants, registers, or
>>>> anything else. All that matters is that the compiler uses the wrong
>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>>> expect it to, based on the source code alone.
>>>
>>> I only partially agree here.
>>>
>>> On weakly-ordered architectures, indeed we don't care whether the
>>> issue is caused by the compiler reordering the code (constant)
>>> or the CPU speculating the load (registers).
>>>
>>> However, on strongly-ordered architectures, AFAIU, only the constant
>>> case is problematic (compiler reordering the dependent load), because
>> I thought you were trying to prevent the compiler from using one pointer
>> instead of the other, not trying to prevent it from reordering anything.
>> Isn't this the point the documentation wants to get across when it says
>> that comparing pointers can be dangerous?
>
> The motivation for introducing ptr_eq() is indeed because the
> compiler barrier is not sufficient to prevent the compiler from
> using one pointer instead of the other.
barrier_data(&b) prevents that.
>
> But it turns out that ptr_eq() is also a good tool to prevent the
> compiler from reordering loads in case where the comparison is
> done against a constant.
>
>>> CPU speculating the loads across the control dependency is not an
>>> issue.
>>>
>>> So am I tempted to keep examples that clearly state whether
>>> the issue is caused by compiler reordering instructions, or by
>>> CPU speculation.
>> Isn't it true that on strongly ordered CPUs, a compiler barrier is
>> sufficient to prevent the rcu_dereference() problem? So the whole idea
>> behind ptr_eq() is that it prevents the problem on all CPUs.
>
> Correct. But given that we have ptr_eq(), it's good to show how it
> equally prevents the compiler from reordering address-dependent loads
> (comparison with constant) *and* prevents the compiler from using
> one pointer rather than the other (comparison between two non-constant
> pointers) which affects speculation on weakly-ordered CPUs.
>
>> You can make your examples as specific as you like, but the fact remains
>> that ptr_eq() is meant to prevent situations where both:
>> The compiler uses the wrong pointer for a load, and
>> The CPU performs the load earlier than you want.
>> If either one of those doesn't hold then the problem won't arise.
>
> Correct.
>
> Thanks,
>
> Mathieu
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 22:26 ` Alan Huang
@ 2024-09-28 23:55 ` Boqun Feng
2024-09-29 0:20 ` Alan Huang
2024-09-30 8:57 ` Jonas Oberhauser
1 sibling, 1 reply; 42+ messages in thread
From: Boqun Feng @ 2024-09-28 23:55 UTC (permalink / raw)
To: Alan Huang, Mathieu Desnoyers
Cc: Alan Stern, Linus Torvalds, LKML, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki (Sony),
rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, Jonas Oberhauser, RCU, linux-mm, lkmm
On Sun, Sep 29, 2024, at 6:26 AM, Alan Huang wrote:
> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2024-09-28 17:49, Alan Stern wrote:
>>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>>>> On 2024-09-28 16:49, Alan Stern wrote:
>>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>>>> equality, which does not preserve address dependencies and allows the
>>>>>> following misordering speculations:
>>>>>>
>>>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>>>> on @a before loading @a.
>>>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>>>
>>>>> It shouldn't matter whether @a and @b are constants, registers, or
>>>>> anything else. All that matters is that the compiler uses the wrong
>>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>>>> expect it to, based on the source code alone.
>>>>
>>>> I only partially agree here.
>>>>
>>>> On weakly-ordered architectures, indeed we don't care whether the
>>>> issue is caused by the compiler reordering the code (constant)
>>>> or the CPU speculating the load (registers).
>>>>
>>>> However, on strongly-ordered architectures, AFAIU, only the constant
>>>> case is problematic (compiler reordering the dependent load), because
>>> I thought you were trying to prevent the compiler from using one pointer
>>> instead of the other, not trying to prevent it from reordering anything.
>>> Isn't this the point the documentation wants to get across when it says
>>> that comparing pointers can be dangerous?
>>
>> The motivation for introducing ptr_eq() is indeed because the
>> compiler barrier is not sufficient to prevent the compiler from
>> using one pointer instead of the other.
>
> barrier_data(&b) prevents that.
>
It prevents that because it acts as barrier() + OPTIMIZER_HIDE_VAR(b).
I don’t see much value of using that since we can resolve the problem
with OPTIMIZER_HIDE_VAR() alone.
Regards,
Boqun
>>
>> But it turns out that ptr_eq() is also a good tool to prevent the
>> compiler from reordering loads in case where the comparison is
>> done against a constant.
>>
>>>> CPU speculating the loads across the control dependency is not an
>>>> issue.
>>>>
>>>> So am I tempted to keep examples that clearly state whether
>>>> the issue is caused by compiler reordering instructions, or by
>>>> CPU speculation.
>>> Isn't it true that on strongly ordered CPUs, a compiler barrier is
>>> sufficient to prevent the rcu_dereference() problem? So the whole idea
>>> behind ptr_eq() is that it prevents the problem on all CPUs.
>>
>> Correct. But given that we have ptr_eq(), it's good to show how it
>> equally prevents the compiler from reordering address-dependent loads
>> (comparison with constant) *and* prevents the compiler from using
>> one pointer rather than the other (comparison between two non-constant
>> pointers) which affects speculation on weakly-ordered CPUs.
>>
>>> You can make your examples as specific as you like, but the fact remains
>>> that ptr_eq() is meant to prevent situations where both:
>>> The compiler uses the wrong pointer for a load, and
>>> The CPU performs the load earlier than you want.
>>> If either one of those doesn't hold then the problem won't arise.
>>
>> Correct.
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
>>
>>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 23:55 ` Boqun Feng
@ 2024-09-29 0:20 ` Alan Huang
0 siblings, 0 replies; 42+ messages in thread
From: Alan Huang @ 2024-09-29 0:20 UTC (permalink / raw)
To: Boqun Feng
Cc: Mathieu Desnoyers, Alan Stern, Linus Torvalds, LKML,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, John Stultz, Neeraj upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki (Sony),
rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, Jonas Oberhauser, RCU, linux-mm, lkmm
2024年9月29日 07:55,Boqun Feng <boqun.feng@gmail.com> wrote:
>
>
>
> On Sun, Sep 29, 2024, at 6:26 AM, Alan Huang wrote:
>> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>
>>> On 2024-09-28 17:49, Alan Stern wrote:
>>>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>>>>> On 2024-09-28 16:49, Alan Stern wrote:
>>>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>>>>> equality, which does not preserve address dependencies and allows the
>>>>>>> following misordering speculations:
>>>>>>>
>>>>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>>>>> on @a before loading @a.
>>>>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>>>>
>>>>>> It shouldn't matter whether @a and @b are constants, registers, or
>>>>>> anything else. All that matters is that the compiler uses the wrong
>>>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>>>>> expect it to, based on the source code alone.
>>>>>
>>>>> I only partially agree here.
>>>>>
>>>>> On weakly-ordered architectures, indeed we don't care whether the
>>>>> issue is caused by the compiler reordering the code (constant)
>>>>> or the CPU speculating the load (registers).
>>>>>
>>>>> However, on strongly-ordered architectures, AFAIU, only the constant
>>>>> case is problematic (compiler reordering the dependent load), because
>>>> I thought you were trying to prevent the compiler from using one pointer
>>>> instead of the other, not trying to prevent it from reordering anything.
>>>> Isn't this the point the documentation wants to get across when it says
>>>> that comparing pointers can be dangerous?
>>>
>>> The motivation for introducing ptr_eq() is indeed because the
>>> compiler barrier is not sufficient to prevent the compiler from
>>> using one pointer instead of the other.
>>
>> barrier_data(&b) prevents that.
>>
>
> It prevents that because it acts as barrier() + OPTIMIZER_HIDE_VAR(b).
> I don’t see much value of using that since we can resolve the problem
> with OPTIMIZER_HIDE_VAR() alone.
Yeah, barrier_data generates more instructions.
>
> Regards,
> Boqun
>
>>>
>>> But it turns out that ptr_eq() is also a good tool to prevent the
>>> compiler from reordering loads in case where the comparison is
>>> done against a constant.
>>>
>>>>> CPU speculating the loads across the control dependency is not an
>>>>> issue.
>>>>>
>>>>> So am I tempted to keep examples that clearly state whether
>>>>> the issue is caused by compiler reordering instructions, or by
>>>>> CPU speculation.
>>>> Isn't it true that on strongly ordered CPUs, a compiler barrier is
>>>> sufficient to prevent the rcu_dereference() problem? So the whole idea
>>>> behind ptr_eq() is that it prevents the problem on all CPUs.
>>>
>>> Correct. But given that we have ptr_eq(), it's good to show how it
>>> equally prevents the compiler from reordering address-dependent loads
>>> (comparison with constant) *and* prevents the compiler from using
>>> one pointer rather than the other (comparison between two non-constant
>>> pointers) which affects speculation on weakly-ordered CPUs.
>>>
>>>> You can make your examples as specific as you like, but the fact remains
>>>> that ptr_eq() is meant to prevent situations where both:
>>>> The compiler uses the wrong pointer for a load, and
>>>> The CPU performs the load earlier than you want.
>>>> If either one of those doesn't hold then the problem won't arise.
>>>
>>> Correct.
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 22:26 ` Alan Huang
2024-09-28 23:55 ` Boqun Feng
@ 2024-09-30 8:57 ` Jonas Oberhauser
2024-09-30 9:15 ` Alan Huang
1 sibling, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-09-30 8:57 UTC (permalink / raw)
To: Alan Huang, Mathieu Desnoyers
Cc: Alan Stern, Linus Torvalds, LKML, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki (Sony),
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, RCU, linux-mm, lkmm
Am 9/29/2024 um 12:26 AM schrieb Alan Huang:
> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>
>> On 2024-09-28 17:49, Alan Stern wrote:
>>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>>>> On 2024-09-28 16:49, Alan Stern wrote:
>>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>>>> equality, which does not preserve address dependencies and allows the
>>>>>> following misordering speculations:
>>>>>>
>>>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>>>> on @a before loading @a.
>>>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>>>
>>>>> It shouldn't matter whether @a and @b are constants, registers, or
>>>>> anything else. All that matters is that the compiler uses the wrong
>>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>>>> expect it to, based on the source code alone.
>>>>
>>>> I only partially agree here.
>>>>
>>>> On weakly-ordered architectures, indeed we don't care whether the
>>>> issue is caused by the compiler reordering the code (constant)
>>>> or the CPU speculating the load (registers).
>>>>
>>>> However, on strongly-ordered architectures, AFAIU, only the constant
>>>> case is problematic (compiler reordering the dependent load), because
>>> I thought you were trying to prevent the compiler from using one pointer
>>> instead of the other, not trying to prevent it from reordering anything.
>>> Isn't this the point the documentation wants to get across when it says
>>> that comparing pointers can be dangerous?
>>
>> The motivation for introducing ptr_eq() is indeed because the
>> compiler barrier is not sufficient to prevent the compiler from
>> using one pointer instead of the other.
>
> barrier_data(&b) prevents that.
I don't think one barrier_data can garantuee preventing this, because
right after doing the comparison, the compiler still could do b=a.
In that case you would be guaranteed to use the value in b, but that
value is not the value loaded into b originally but rather the value
loaded into a, and hence your address dependency goes to the wrong load
still.
However, doing
barrier_data(&b);
if (a == b) {
barrier();
foo(*b);
}
might maybe prevent it, because after the address of b is escaped, the
compiler might no longer be allowed to just do b=a;, but I'm not sure if
that is completely correct, since the compiler knows b==a and no other
thread can be concurrently modifying a or b. Therefore, given that the
compiler knows the hardware, it might know that assigning b=a would not
cause any race-related issues even if another thread was reading b
concurrently.
Finally, it may be only a combination of barrier_data and making b
volatile could be guaranteed to solve the issue, but the code will be
very obscure compared to using ptr_eq.
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 8:57 ` Jonas Oberhauser
@ 2024-09-30 9:15 ` Alan Huang
2024-09-30 9:27 ` Alan Huang
0 siblings, 1 reply; 42+ messages in thread
From: Alan Huang @ 2024-09-30 9:15 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Mathieu Desnoyers, Alan Stern, Linus Torvalds, LKML,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki (Sony),
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, RCU, linux-mm, lkmm
2024年9月30日 16:57,Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> 写道:
>
>
>
> Am 9/29/2024 um 12:26 AM schrieb Alan Huang:
>> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>
>>> On 2024-09-28 17:49, Alan Stern wrote:
>>>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>>>>> On 2024-09-28 16:49, Alan Stern wrote:
>>>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>>>>> equality, which does not preserve address dependencies and allows the
>>>>>>> following misordering speculations:
>>>>>>>
>>>>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>>>>> on @a before loading @a.
>>>>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>>>>
>>>>>> It shouldn't matter whether @a and @b are constants, registers, or
>>>>>> anything else. All that matters is that the compiler uses the wrong
>>>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>>>>> expect it to, based on the source code alone.
>>>>>
>>>>> I only partially agree here.
>>>>>
>>>>> On weakly-ordered architectures, indeed we don't care whether the
>>>>> issue is caused by the compiler reordering the code (constant)
>>>>> or the CPU speculating the load (registers).
>>>>>
>>>>> However, on strongly-ordered architectures, AFAIU, only the constant
>>>>> case is problematic (compiler reordering the dependent load), because
>>>> I thought you were trying to prevent the compiler from using one pointer
>>>> instead of the other, not trying to prevent it from reordering anything.
>>>> Isn't this the point the documentation wants to get across when it says
>>>> that comparing pointers can be dangerous?
>>>
>>> The motivation for introducing ptr_eq() is indeed because the
>>> compiler barrier is not sufficient to prevent the compiler from
>>> using one pointer instead of the other.
>> barrier_data(&b) prevents that.
>
> I don't think one barrier_data can garantuee preventing this, because right after doing the comparison, the compiler still could do b=a.
>
> In that case you would be guaranteed to use the value in b, but that value is not the value loaded into b originally but rather the value loaded into a, and hence your address dependency goes to the wrong load still.
After barrier_data(&b), *b will be loaded from memory, you mean even if *b is loaded from memory, the address dependency goes to the wrong load still?
>
> However, doing
>
> barrier_data(&b);
> if (a == b) {
> barrier();
> foo(*b);
> }
>
> might maybe prevent it, because after the address of b is escaped, the compiler might no longer be allowed to just do b=a;, but I'm not sure if that is completely correct, since the compiler knows b==a and no other thread can be concurrently modifying a or b. Therefore, given that the compiler knows the hardware, it might know that assigning b=a would not cause any race-related issues even if another thread was reading b concurrently.
>
> Finally, it may be only a combination of barrier_data and making b volatile could be guaranteed to solve the issue, but the code will be very obscure compared to using ptr_eq.
>
> jonas
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 9:15 ` Alan Huang
@ 2024-09-30 9:27 ` Alan Huang
2024-09-30 9:33 ` Jonas Oberhauser
0 siblings, 1 reply; 42+ messages in thread
From: Alan Huang @ 2024-09-30 9:27 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Mathieu Desnoyers, Alan Stern, Linus Torvalds, LKML,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki (Sony),
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, RCU, linux-mm, lkmm
2024年9月30日 17:15,Alan Huang <mmpgouride@gmail.com> 写道:
>
> 2024年9月30日 16:57,Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> 写道:
>>
>>
>>
>> Am 9/29/2024 um 12:26 AM schrieb Alan Huang:
>>> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>
>>>> On 2024-09-28 17:49, Alan Stern wrote:
>>>>> On Sat, Sep 28, 2024 at 11:32:18AM -0400, Mathieu Desnoyers wrote:
>>>>>> On 2024-09-28 16:49, Alan Stern wrote:
>>>>>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>>>>>> equality, which does not preserve address dependencies and allows the
>>>>>>>> following misordering speculations:
>>>>>>>>
>>>>>>>> - If @b is a constant, the compiler can issue the loads which depend
>>>>>>>> on @a before loading @a.
>>>>>>>> - If @b is a register populated by a prior load, weakly-ordered
>>>>>>>> CPUs can speculate loads which depend on @a before loading @a.
>>>>>>>
>>>>>>> It shouldn't matter whether @a and @b are constants, registers, or
>>>>>>> anything else. All that matters is that the compiler uses the wrong
>>>>>>> one, which allows weakly ordered CPUs to speculate loads you wouldn't
>>>>>>> expect it to, based on the source code alone.
>>>>>>
>>>>>> I only partially agree here.
>>>>>>
>>>>>> On weakly-ordered architectures, indeed we don't care whether the
>>>>>> issue is caused by the compiler reordering the code (constant)
>>>>>> or the CPU speculating the load (registers).
>>>>>>
>>>>>> However, on strongly-ordered architectures, AFAIU, only the constant
>>>>>> case is problematic (compiler reordering the dependent load), because
>>>>> I thought you were trying to prevent the compiler from using one pointer
>>>>> instead of the other, not trying to prevent it from reordering anything.
>>>>> Isn't this the point the documentation wants to get across when it says
>>>>> that comparing pointers can be dangerous?
>>>>
>>>> The motivation for introducing ptr_eq() is indeed because the
>>>> compiler barrier is not sufficient to prevent the compiler from
>>>> using one pointer instead of the other.
>>> barrier_data(&b) prevents that.
>>
>> I don't think one barrier_data can garantuee preventing this, because right after doing the comparison, the compiler still could do b=a.
>>
>> In that case you would be guaranteed to use the value in b, but that value is not the value loaded into b originally but rather the value loaded into a, and hence your address dependency goes to the wrong load still.
>
> After barrier_data(&b), *b will be loaded from memory, you mean even if *b is loaded from memory, the address dependency goes to the wrong load still?
Sorry, *b should b.
>
>>
>> However, doing
>>
>> barrier_data(&b);
>> if (a == b) {
>> barrier();
>> foo(*b);
>> }
>>
>> might maybe prevent it, because after the address of b is escaped, the compiler might no longer be allowed to just do b=a;, but I'm not sure if that is completely correct, since the compiler knows b==a and no other thread can be concurrently modifying a or b. Therefore, given that the compiler knows the hardware, it might know that assigning b=a would not cause any race-related issues even if another thread was reading b concurrently.
>>
>> Finally, it may be only a combination of barrier_data and making b volatile could be guaranteed to solve the issue, but the code will be very obscure compared to using ptr_eq.
>>
>> jonas
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 9:27 ` Alan Huang
@ 2024-09-30 9:33 ` Jonas Oberhauser
2024-09-30 10:12 ` Alan Huang
0 siblings, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-09-30 9:33 UTC (permalink / raw)
To: Alan Huang
Cc: Mathieu Desnoyers, Alan Stern, Linus Torvalds, LKML,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki (Sony),
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, RCU, linux-mm, lkmm
Am 9/30/2024 um 11:27 AM schrieb Alan Huang:
> 2024年9月30日 17:15,Alan Huang <mmpgouride@gmail.com> 写道:
>>
>> 2024年9月30日 16:57,Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> 写道:
>>>
>>>
>>>
>>> Am 9/29/2024 um 12:26 AM schrieb Alan Huang:
>>>> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>>
>>>>> The motivation for introducing ptr_eq() is indeed because the
>>>>> compiler barrier is not sufficient to prevent the compiler from
>>>>> using one pointer instead of the other.
>>>> barrier_data(&b) prevents that.
>>>
>>> I don't think one barrier_data can garantuee preventing this, because right after doing the comparison, the compiler still could do b=a.
>>>
>>> In that case you would be guaranteed to use the value in b, but that value is not the value loaded into b originally but rather the value loaded into a, and hence your address dependency goes to the wrong load still.
>>
>> After barrier_data(&b), *b will be loaded from memory, you mean even if *b is loaded from memory, the address dependency goes to the wrong load still?
>
> Sorry, *b should b.
That's exactly what I meant to say. In my understanding, it can happen
like this:
a = READ_ONCE(*p);
...
b = READ_ONCE(*p);
if (a == b) {
b = a; // inserted by compiler
barrier_data(&b);
foo(*b); // compiler definitely use the current value in b
}
In the end, the address dependency is from the first load, and the CPU
can speculatively (with register renaming, forwarding etc) execute
a = READ_ONCE(*p);
b2 = a; // speculatively
tmp = load *b2 // speculatively
b1 = READ_ONCE(*p);
if (a == b1) { // confirmed
foo(tmp);
}
best wishes,
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 9:33 ` Jonas Oberhauser
@ 2024-09-30 10:12 ` Alan Huang
0 siblings, 0 replies; 42+ messages in thread
From: Alan Huang @ 2024-09-30 10:12 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Mathieu Desnoyers, Alan Stern, Linus Torvalds, LKML,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki (Sony),
Steven Rostedt, Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long,
Mark Rutland, Thomas Gleixner, Vlastimil Babka, maged.michael,
Mateusz Guzik, Gary Guo, RCU, linux-mm, lkmm
On Sep 30, 2024, at 17:33, Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> wrote:
>
>
>
> Am 9/30/2024 um 11:27 AM schrieb Alan Huang:
>> 2024年9月30日 17:15,Alan Huang <mmpgouride@gmail.com> 写道:
>>>
>>> 2024年9月30日 16:57,Jonas Oberhauser <jonas.oberhauser@huaweicloud.com> 写道:
>>>>
>>>>
>>>>
>>>> Am 9/29/2024 um 12:26 AM schrieb Alan Huang:
>>>>> 2024年9月28日 23:55,Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>>>>>>
>>>>>> The motivation for introducing ptr_eq() is indeed because the
>>>>>> compiler barrier is not sufficient to prevent the compiler from
>>>>>> using one pointer instead of the other.
>>>>> barrier_data(&b) prevents that.
>>>>
>>>> I don't think one barrier_data can garantuee preventing this, because right after doing the comparison, the compiler still could do b=a.
>>>>
>>>> In that case you would be guaranteed to use the value in b, but that value is not the value loaded into b originally but rather the value loaded into a, and hence your address dependency goes to the wrong load still.
>>>
>>> After barrier_data(&b), *b will be loaded from memory, you mean even if *b is loaded from memory, the address dependency goes to the wrong load still?
>> Sorry, *b should b.
>
> That's exactly what I meant to say. In my understanding, it can happen like this:
>
> a = READ_ONCE(*p);
> ...
> b = READ_ONCE(*p);
> if (a == b) {
> b = a; // inserted by compiler
>
> barrier_data(&b);
>
> foo(*b); // compiler definitely use the current value in b
> }
>
>
>
> In the end, the address dependency is from the first load, and the CPU can speculatively (with register renaming, forwarding etc) execute
>
> a = READ_ONCE(*p);
> b2 = a; // speculatively
> tmp = load *b2 // speculatively
> b1 = READ_ONCE(*p);
> if (a == b1) { // confirmed
> foo(tmp);
> }
I get it now, thanks for the explanation.
>
>
> best wishes,
> jonas
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-28 14:49 ` Alan Stern
2024-09-28 15:30 ` Mathieu Desnoyers
2024-09-28 15:32 ` Mathieu Desnoyers
@ 2024-09-30 11:26 ` Jonas Oberhauser
2024-09-30 16:43 ` Alan Stern
2 siblings, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-09-30 11:26 UTC (permalink / raw)
To: Alan Stern, Mathieu Desnoyers
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>> Compiler CSE and SSA GVN optimizations can cause the address dependency
>> of addresses returned by rcu_dereference to be lost when comparing those
>> pointers with either constants or previously loaded pointers.
>>
>> Introduce ptr_eq() to compare two addresses while preserving the address
>> dependencies for later use of the address. It should be used when
>> comparing an address returned by rcu_dereference().
>>
>> This is needed to prevent the compiler CSE and SSA GVN optimizations
>> from replacing the registers holding @a or @b based on their
>
> "Replacing" isn't the right word. What the compiler does is use one
> rather than the other. Furthermore, the compiler can play these games
> even with values that aren't in registers.
>
> You should just say: "... from using @a (or @b) in places where the
> source refers to @b (or @a) (based on the fact that after the
> comparison, the two are known to be equal), which does not ..."
I should also point out that it is not enough to prevent the compiler
from using @a instead of @b.
It must also be prevented from assigning @b=@a, which it is often
allowed to do after finding @a==@b.
Best wishes,
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 11:26 ` Jonas Oberhauser
@ 2024-09-30 16:43 ` Alan Stern
2024-09-30 17:05 ` Jonas Oberhauser
0 siblings, 1 reply; 42+ messages in thread
From: Alan Stern @ 2024-09-30 16:43 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
>
>
> Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
> > > Compiler CSE and SSA GVN optimizations can cause the address dependency
> > > of addresses returned by rcu_dereference to be lost when comparing those
> > > pointers with either constants or previously loaded pointers.
> > >
> > > Introduce ptr_eq() to compare two addresses while preserving the address
> > > dependencies for later use of the address. It should be used when
> > > comparing an address returned by rcu_dereference().
> > >
> > > This is needed to prevent the compiler CSE and SSA GVN optimizations
> > > from replacing the registers holding @a or @b based on their
> >
> > "Replacing" isn't the right word. What the compiler does is use one
> > rather than the other. Furthermore, the compiler can play these games
> > even with values that aren't in registers.
> >
> > You should just say: "... from using @a (or @b) in places where the
> > source refers to @b (or @a) (based on the fact that after the
> > comparison, the two are known to be equal), which does not ..."
>
> I should also point out that it is not enough to prevent the compiler from
> using @a instead of @b.
>
> It must also be prevented from assigning @b=@a, which it is often allowed to
> do after finding @a==@b.
Wouldn't that be a bug? Consider this litmus test:
int x = 0;
int y = 45;
int z = 0;
void P0(int *x, int *y, int *z) {
int r1, r2;
r1 = READ_ONCE(*x);
r2 = READ_ONCE(*y);
if (r1 == r2) {
WRITE_ONCE(*z, 1);
// L1: WRITE_ONCE(*y, r1);
}
}
void P1(int *x, int *y) {
int r3;
WRITE_ONCE(*x, 45);
WRITE_ONCE(*y, 56);
r3 = READ_ONCE(*y);
}
exists (z=1 /\ 1:r3=45) (* Not allowed *)
If the compiler were to make the extra assignment (basically,
uncommenting the line marked L1) then the exists clause could be
satisfied. That would indicate there's a bug in the compiler.
Alan
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 16:43 ` Alan Stern
@ 2024-09-30 17:05 ` Jonas Oberhauser
2024-09-30 18:53 ` Alan Stern
0 siblings, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-09-30 17:05 UTC (permalink / raw)
To: Alan Stern
Cc: Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
>>
>>
>> Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
>>> On Sat, Sep 28, 2024 at 09:51:27AM -0400, Mathieu Desnoyers wrote:
>>>> Compiler CSE and SSA GVN optimizations can cause the address dependency
>>>> of addresses returned by rcu_dereference to be lost when comparing those
>>>> pointers with either constants or previously loaded pointers.
>>>>
>>>> Introduce ptr_eq() to compare two addresses while preserving the address
>>>> dependencies for later use of the address. It should be used when
>>>> comparing an address returned by rcu_dereference().
>>>>
>>>> This is needed to prevent the compiler CSE and SSA GVN optimizations
>>>> from replacing the registers holding @a or @b based on their
>>>
>>> "Replacing" isn't the right word. What the compiler does is use one
>>> rather than the other. Furthermore, the compiler can play these games
>>> even with values that aren't in registers.
>>>
>>> You should just say: "... from using @a (or @b) in places where the
>>> source refers to @b (or @a) (based on the fact that after the
>>> comparison, the two are known to be equal), which does not ..."
>>
>> I should also point out that it is not enough to prevent the compiler from
>> using @a instead of @b.
>>
>> It must also be prevented from assigning @b=@a, which it is often allowed to
>> do after finding @a==@b.
>
> Wouldn't that be a bug?
That's why I said that it is often allowed to do it. In your case it
wouldn't, but it is often possible when a and b are non-atomic &
non-volatile (and haven't escaped, and I believe sometimes even then).
It happens for example here with GCC 14.1.0 -O3:
int fct_hide(void)
{
int *a, *b;
do {
a = READ_ONCE(p);
asm volatile ("" : : : "memory");
b = READ_ONCE(p);
} while (a != b);
OPTIMIZER_HIDE_VAR(b);
return *b;
}
ldr r1, [r2]
ldr r3, [r2]
cmp r1, r3
bne .L6
mov r3, r1 // nay...
ldr r0, [r3] // yay!
bx lr
Have fun,
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 17:05 ` Jonas Oberhauser
@ 2024-09-30 18:53 ` Alan Stern
2024-10-01 17:11 ` David Laight
0 siblings, 1 reply; 42+ messages in thread
From: Alan Stern @ 2024-09-30 18:53 UTC (permalink / raw)
To: Jonas Oberhauser
Cc: Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On Mon, Sep 30, 2024 at 07:05:06PM +0200, Jonas Oberhauser wrote:
>
>
> Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> > On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
> > >
> > >
> > > Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > >
> > > I should also point out that it is not enough to prevent the compiler from
> > > using @a instead of @b.
> > >
> > > It must also be prevented from assigning @b=@a, which it is often allowed to
> > > do after finding @a==@b.
> >
> > Wouldn't that be a bug?
>
> That's why I said that it is often allowed to do it. In your case it
> wouldn't, but it is often possible when a and b are non-atomic &
> non-volatile (and haven't escaped, and I believe sometimes even then).
>
> It happens for example here with GCC 14.1.0 -O3:
>
> int fct_hide(void)
> {
> int *a, *b;
>
> do {
> a = READ_ONCE(p);
> asm volatile ("" : : : "memory");
> b = READ_ONCE(p);
> } while (a != b);
> OPTIMIZER_HIDE_VAR(b);
> return *b;
> }
>
>
>
> ldr r1, [r2]
> ldr r3, [r2]
> cmp r1, r3
> bne .L6
> mov r3, r1 // nay...
A totally unnecessary instruction, which accomplishes nothing other than
to waste time, space, and energy. But nonetheless, allowed -- I agree.
The people in charge of GCC's optimizer might like to hear about this,
if they're not already aware of it...
> ldr r0, [r3] // yay!
> bx lr
One could argue that in this example the compiler _has_ used *a instead
of *b. However, such an argument would have more force if we had
described what we are talking about more precisely.
Yes, we do want to prevent compilers from doing this. I'm not sure that
it really needs to be mentioned in the comments or commit description,
though.
Alan
^ permalink raw reply [flat|nested] 42+ messages in thread* RE: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-09-30 18:53 ` Alan Stern
@ 2024-10-01 17:11 ` David Laight
2024-10-01 22:57 ` 'Alan Stern'
0 siblings, 1 reply; 42+ messages in thread
From: David Laight @ 2024-10-01 17:11 UTC (permalink / raw)
To: 'Alan Stern', Jonas Oberhauser
Cc: Mathieu Desnoyers, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
From: Alan Stern
> Sent: 30 September 2024 19:53
>
> On Mon, Sep 30, 2024 at 07:05:06PM +0200, Jonas Oberhauser wrote:
> >
> >
> > Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> > > On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
> > > >
> > > >
> > > > Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > > >
> > > > I should also point out that it is not enough to prevent the compiler from
> > > > using @a instead of @b.
> > > >
> > > > It must also be prevented from assigning @b=@a, which it is often allowed to
> > > > do after finding @a==@b.
> > >
> > > Wouldn't that be a bug?
> >
> > That's why I said that it is often allowed to do it. In your case it
> > wouldn't, but it is often possible when a and b are non-atomic &
> > non-volatile (and haven't escaped, and I believe sometimes even then).
> >
> > It happens for example here with GCC 14.1.0 -O3:
> >
> > int fct_hide(void)
> > {
> > int *a, *b;
> >
> > do {
> > a = READ_ONCE(p);
> > asm volatile ("" : : : "memory");
> > b = READ_ONCE(p);
> > } while (a != b);
> > OPTIMIZER_HIDE_VAR(b);
> > return *b;
> > }
> >
> >
> >
> > ldr r1, [r2]
> > ldr r3, [r2]
> > cmp r1, r3
> > bne .L6
> > mov r3, r1 // nay...
>
> A totally unnecessary instruction, which accomplishes nothing other than
> to waste time, space, and energy. But nonetheless, allowed -- I agree.
>
> The people in charge of GCC's optimizer might like to hear about this,
> if they're not already aware of it...
>
> > ldr r0, [r3] // yay!
> > bx lr
>
> One could argue that in this example the compiler _has_ used *a instead
> of *b. However, such an argument would have more force if we had
> described what we are talking about more precisely.
The 'mov r3, r1' has nothing to do with 'a'.
It is a more general problem that OPTIMISER_HIDE_VAR() pretty much
always ends up allocating a different internal 'register' for the
output and then allocating a separate physical rehgister.
There doesn't seem to be a later optimisation path to remove
'pointless' register moves.
David
>
> Yes, we do want to prevent compilers from doing this. I'm not sure that
> it really needs to be mentioned in the comments or commit description,
> though.
>
> Alan
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-01 17:11 ` David Laight
@ 2024-10-01 22:57 ` 'Alan Stern'
2024-10-02 8:13 ` David Laight
0 siblings, 1 reply; 42+ messages in thread
From: 'Alan Stern' @ 2024-10-01 22:57 UTC (permalink / raw)
To: David Laight
Cc: Jonas Oberhauser, Mathieu Desnoyers, Linus Torvalds,
linux-kernel, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Paul E. McKenney, Will Deacon, Peter Zijlstra, Boqun Feng,
John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
Thomas Gleixner, Vlastimil Babka, maged.michael, Mateusz Guzik,
Gary Guo, rcu, linux-mm, lkmm
On Tue, Oct 01, 2024 at 05:11:05PM +0000, David Laight wrote:
> From: Alan Stern
> > Sent: 30 September 2024 19:53
> >
> > On Mon, Sep 30, 2024 at 07:05:06PM +0200, Jonas Oberhauser wrote:
> > >
> > >
> > > Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> > > > On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
> > > > >
> > > > >
> > > > > Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > > > >
> > > > > I should also point out that it is not enough to prevent the compiler from
> > > > > using @a instead of @b.
> > > > >
> > > > > It must also be prevented from assigning @b=@a, which it is often allowed to
> > > > > do after finding @a==@b.
> > > >
> > > > Wouldn't that be a bug?
> > >
> > > That's why I said that it is often allowed to do it. In your case it
> > > wouldn't, but it is often possible when a and b are non-atomic &
> > > non-volatile (and haven't escaped, and I believe sometimes even then).
> > >
> > > It happens for example here with GCC 14.1.0 -O3:
> > >
> > > int fct_hide(void)
> > > {
> > > int *a, *b;
> > >
> > > do {
> > > a = READ_ONCE(p);
> > > asm volatile ("" : : : "memory");
> > > b = READ_ONCE(p);
> > > } while (a != b);
> > > OPTIMIZER_HIDE_VAR(b);
> > > return *b;
> > > }
> > >
> > >
> > >
> > > ldr r1, [r2]
> > > ldr r3, [r2]
> > > cmp r1, r3
> > > bne .L6
> > > mov r3, r1 // nay...
> >
> > A totally unnecessary instruction, which accomplishes nothing other than
> > to waste time, space, and energy. But nonetheless, allowed -- I agree.
> >
> > The people in charge of GCC's optimizer might like to hear about this,
> > if they're not already aware of it...
> >
> > > ldr r0, [r3] // yay!
> > > bx lr
> >
> > One could argue that in this example the compiler _has_ used *a instead
> > of *b. However, such an argument would have more force if we had
> > described what we are talking about more precisely.
>
> The 'mov r3, r1' has nothing to do with 'a'.
What do you mean by that? At this point in the program, a is the
variable whose value is stored in r1 and b is the variable whose value
is stored in r3. "mov r3, r1" copies the value from r1 into r3 and is
therefore equivalent to executing "b = a". (That is why I said one
could argue that the "return *b" statement uses the value of *a.) Thus
it very much does have something to do with "a".
> It is a more general problem that OPTIMISER_HIDE_VAR() pretty much
> always ends up allocating a different internal 'register' for the
> output and then allocating a separate physical rehgister.
What output are you referring to? Does OPTIMISER_HIDE_VAR() have an
output? If it does, the source program above ignores it, discarding any
returned value.
> There doesn't seem to be a later optimisation path to remove
> 'pointless' register moves.
That would be a good thing to add, then.
Alan
^ permalink raw reply [flat|nested] 42+ messages in thread* RE: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-01 22:57 ` 'Alan Stern'
@ 2024-10-02 8:13 ` David Laight
2024-10-02 14:14 ` 'Alan Stern'
0 siblings, 1 reply; 42+ messages in thread
From: David Laight @ 2024-10-02 8:13 UTC (permalink / raw)
To: 'Alan Stern'
Cc: Jonas Oberhauser, Mathieu Desnoyers, Linus Torvalds,
linux-kernel, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Paul E. McKenney, Will Deacon, Peter Zijlstra, Boqun Feng,
John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
Thomas Gleixner, Vlastimil Babka, maged.michael, Mateusz Guzik,
Gary Guo, rcu, linux-mm, lkmm
From: 'Alan Stern'
> Sent: 01 October 2024 23:57
>
> On Tue, Oct 01, 2024 at 05:11:05PM +0000, David Laight wrote:
> > From: Alan Stern
> > > Sent: 30 September 2024 19:53
> > >
> > > On Mon, Sep 30, 2024 at 07:05:06PM +0200, Jonas Oberhauser wrote:
> > > >
> > > >
> > > > Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> > > > > On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
> > > > > >
> > > > > >
> > > > > > Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > > > > >
> > > > > > I should also point out that it is not enough to prevent the compiler from
> > > > > > using @a instead of @b.
> > > > > >
> > > > > > It must also be prevented from assigning @b=@a, which it is often allowed to
> > > > > > do after finding @a==@b.
> > > > >
> > > > > Wouldn't that be a bug?
> > > >
> > > > That's why I said that it is often allowed to do it. In your case it
> > > > wouldn't, but it is often possible when a and b are non-atomic &
> > > > non-volatile (and haven't escaped, and I believe sometimes even then).
> > > >
> > > > It happens for example here with GCC 14.1.0 -O3:
> > > >
> > > > int fct_hide(void)
> > > > {
> > > > int *a, *b;
> > > >
> > > > do {
> > > > a = READ_ONCE(p);
> > > > asm volatile ("" : : : "memory");
> > > > b = READ_ONCE(p);
> > > > } while (a != b);
> > > > OPTIMIZER_HIDE_VAR(b);
> > > > return *b;
> > > > }
> > > >
> > > >
> > > >
> > > > ldr r1, [r2]
> > > > ldr r3, [r2]
> > > > cmp r1, r3
> > > > bne .L6
> > > > mov r3, r1 // nay...
> > >
> > > A totally unnecessary instruction, which accomplishes nothing other than
> > > to waste time, space, and energy. But nonetheless, allowed -- I agree.
> > >
> > > The people in charge of GCC's optimizer might like to hear about this,
> > > if they're not already aware of it...
> > >
> > > > ldr r0, [r3] // yay!
> > > > bx lr
> > >
> > > One could argue that in this example the compiler _has_ used *a instead
> > > of *b. However, such an argument would have more force if we had
> > > described what we are talking about more precisely.
> >
> > The 'mov r3, r1' has nothing to do with 'a'.
>
> What do you mean by that? At this point in the program, a is the
> variable whose value is stored in r1 and b is the variable whose value
> is stored in r3. "mov r3, r1" copies the value from r1 into r3 and is
> therefore equivalent to executing "b = a". (That is why I said one
> could argue that the "return *b" statement uses the value of *a.) Thus
> it very much does have something to do with "a".
After the cmp and bne r1 and r3 have the same value.
The compiler tracks that and will use either register later.
That can never matter.
Remember the compiler tracks values (in pseudo/internal registers)
not variables.
> > It is a more general problem that OPTIMISER_HIDE_VAR() pretty much
> > always ends up allocating a different internal 'register' for the
> > output and then allocating a separate physical rehgister.
>
> What output are you referring to? Does OPTIMISER_HIDE_VAR() have an
> output? If it does, the source program above ignores it, discarding any
> returned value.
Look up OPTIMISER_HIDE_VAR(x) it basically x = f(x) where f() is
the identity operation:
asm ("" : "+r"(x))
I'll bet that gcc allocates a separate internal/pseudo register
for the result so wants to do y = f(x).
Probably generating y = x; y = f(y);
(The 'mov' might be after the asm, but I think that would get
optimised away - the listing file might help.)
So here the compiler has just decided to reuse the register that
held the other of a/b for the extra temporary.
David
>
> > There doesn't seem to be a later optimisation path to remove
> > 'pointless' register moves.
>
> That would be a good thing to add, then.
>
> Alan
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-02 8:13 ` David Laight
@ 2024-10-02 14:14 ` 'Alan Stern'
2024-10-02 15:24 ` David Laight
0 siblings, 1 reply; 42+ messages in thread
From: 'Alan Stern' @ 2024-10-02 14:14 UTC (permalink / raw)
To: David Laight
Cc: Jonas Oberhauser, Mathieu Desnoyers, Linus Torvalds,
linux-kernel, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Paul E. McKenney, Will Deacon, Peter Zijlstra, Boqun Feng,
John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
Thomas Gleixner, Vlastimil Babka, maged.michael, Mateusz Guzik,
Gary Guo, rcu, linux-mm, lkmm
On Wed, Oct 02, 2024 at 08:13:15AM +0000, David Laight wrote:
> From: 'Alan Stern'
> > Sent: 01 October 2024 23:57
> >
> > On Tue, Oct 01, 2024 at 05:11:05PM +0000, David Laight wrote:
> > > From: Alan Stern
> > > > Sent: 30 September 2024 19:53
> > > >
> > > > On Mon, Sep 30, 2024 at 07:05:06PM +0200, Jonas Oberhauser wrote:
> > > > >
> > > > >
> > > > > Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> > > > > > On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
> > > > > > >
> > > > > > >
> > > > > > > Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > > > > > >
> > > > > > > I should also point out that it is not enough to prevent the compiler from
> > > > > > > using @a instead of @b.
> > > > > > >
> > > > > > > It must also be prevented from assigning @b=@a, which it is often allowed to
> > > > > > > do after finding @a==@b.
> > > > > >
> > > > > > Wouldn't that be a bug?
> > > > >
> > > > > That's why I said that it is often allowed to do it. In your case it
> > > > > wouldn't, but it is often possible when a and b are non-atomic &
> > > > > non-volatile (and haven't escaped, and I believe sometimes even then).
> > > > >
> > > > > It happens for example here with GCC 14.1.0 -O3:
> > > > >
> > > > > int fct_hide(void)
> > > > > {
> > > > > int *a, *b;
> > > > >
> > > > > do {
> > > > > a = READ_ONCE(p);
> > > > > asm volatile ("" : : : "memory");
> > > > > b = READ_ONCE(p);
> > > > > } while (a != b);
> > > > > OPTIMIZER_HIDE_VAR(b);
> > > > > return *b;
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > ldr r1, [r2]
> > > > > ldr r3, [r2]
> > > > > cmp r1, r3
> > > > > bne .L6
> > > > > mov r3, r1 // nay...
> > > >
> > > > A totally unnecessary instruction, which accomplishes nothing other than
> > > > to waste time, space, and energy. But nonetheless, allowed -- I agree.
> > > >
> > > > The people in charge of GCC's optimizer might like to hear about this,
> > > > if they're not already aware of it...
> > > >
> > > > > ldr r0, [r3] // yay!
> > > > > bx lr
> > > >
> > > > One could argue that in this example the compiler _has_ used *a instead
> > > > of *b. However, such an argument would have more force if we had
> > > > described what we are talking about more precisely.
> > >
> > > The 'mov r3, r1' has nothing to do with 'a'.
> >
> > What do you mean by that? At this point in the program, a is the
> > variable whose value is stored in r1 and b is the variable whose value
> > is stored in r3. "mov r3, r1" copies the value from r1 into r3 and is
> > therefore equivalent to executing "b = a". (That is why I said one
> > could argue that the "return *b" statement uses the value of *a.) Thus
> > it very much does have something to do with "a".
>
> After the cmp and bne r1 and r3 have the same value.
> The compiler tracks that and will use either register later.
> That can never matter.
The whole point of this thread is that sometimes it _does_ matter. Not
on x86, but on weakly ordered architectures where using the wrong
register will bypass a dependency and allow the CPU to speculatively
load values earlier than the programmer wants it to.
> Remember the compiler tracks values (in pseudo/internal registers)
> not variables.
>
> > > It is a more general problem that OPTIMISER_HIDE_VAR() pretty much
> > > always ends up allocating a different internal 'register' for the
> > > output and then allocating a separate physical rehgister.
> >
> > What output are you referring to? Does OPTIMISER_HIDE_VAR() have an
> > output? If it does, the source program above ignores it, discarding any
> > returned value.
>
> Look up OPTIMISER_HIDE_VAR(x) it basically x = f(x) where f() is
> the identity operation:
> asm ("" : "+r"(x))
> I'll bet that gcc allocates a separate internal/pseudo register
> for the result so wants to do y = f(x).
> Probably generating y = x; y = f(y);
> (The 'mov' might be after the asm, but I think that would get
> optimised away - the listing file might help.)
>
> So here the compiler has just decided to reuse the register that
> held the other of a/b for the extra temporary.
I think you've got this backward. As mentioned above, a is originally
in r1 and b is in r3. The source says OPTIMIZER_HIDE_VAR(b), so you're
saying that gcc should be copying r3 into a separate internal/pseudo
register. But instead it's copying r1.
Alan
^ permalink raw reply [flat|nested] 42+ messages in thread* RE: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-02 14:14 ` 'Alan Stern'
@ 2024-10-02 15:24 ` David Laight
2024-10-03 1:50 ` 'Alan Stern'
0 siblings, 1 reply; 42+ messages in thread
From: David Laight @ 2024-10-02 15:24 UTC (permalink / raw)
To: 'Alan Stern'
Cc: Jonas Oberhauser, Mathieu Desnoyers, Linus Torvalds,
linux-kernel, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Paul E. McKenney, Will Deacon, Peter Zijlstra, Boqun Feng,
John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
Thomas Gleixner, Vlastimil Babka, maged.michael, Mateusz Guzik,
Gary Guo, rcu, linux-mm, lkmm
From: 'Alan Stern'
> Sent: 02 October 2024 15:15
>
> On Wed, Oct 02, 2024 at 08:13:15AM +0000, David Laight wrote:
> > From: 'Alan Stern'
> > > Sent: 01 October 2024 23:57
> > >
> > > On Tue, Oct 01, 2024 at 05:11:05PM +0000, David Laight wrote:
> > > > From: Alan Stern
> > > > > Sent: 30 September 2024 19:53
> > > > >
> > > > > On Mon, Sep 30, 2024 at 07:05:06PM +0200, Jonas Oberhauser wrote:
> > > > > >
> > > > > >
> > > > > > Am 9/30/2024 um 6:43 PM schrieb Alan Stern:
> > > > > > > On Mon, Sep 30, 2024 at 01:26:53PM +0200, Jonas Oberhauser wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Am 9/28/2024 um 4:49 PM schrieb Alan Stern:
> > > > > > > >
> > > > > > > > I should also point out that it is not enough to prevent the compiler from
> > > > > > > > using @a instead of @b.
> > > > > > > >
> > > > > > > > It must also be prevented from assigning @b=@a, which it is often allowed to
> > > > > > > > do after finding @a==@b.
> > > > > > >
> > > > > > > Wouldn't that be a bug?
> > > > > >
> > > > > > That's why I said that it is often allowed to do it. In your case it
> > > > > > wouldn't, but it is often possible when a and b are non-atomic &
> > > > > > non-volatile (and haven't escaped, and I believe sometimes even then).
> > > > > >
> > > > > > It happens for example here with GCC 14.1.0 -O3:
> > > > > >
> > > > > > int fct_hide(void)
> > > > > > {
> > > > > > int *a, *b;
> > > > > >
> > > > > > do {
> > > > > > a = READ_ONCE(p);
> > > > > > asm volatile ("" : : : "memory");
> > > > > > b = READ_ONCE(p);
> > > > > > } while (a != b);
> > > > > > OPTIMIZER_HIDE_VAR(b);
> > > > > > return *b;
> > > > > > }
> > > > > >
> > > > > >
> > > > > >
> > > > > > ldr r1, [r2]
> > > > > > ldr r3, [r2]
> > > > > > cmp r1, r3
> > > > > > bne .L6
> > > > > > mov r3, r1 // nay...
> > > > >
> > > > > A totally unnecessary instruction, which accomplishes nothing other than
> > > > > to waste time, space, and energy. But nonetheless, allowed -- I agree.
> > > > >
> > > > > The people in charge of GCC's optimizer might like to hear about this,
> > > > > if they're not already aware of it...
> > > > >
> > > > > > ldr r0, [r3] // yay!
> > > > > > bx lr
> > > > >
> > > > > One could argue that in this example the compiler _has_ used *a instead
> > > > > of *b. However, such an argument would have more force if we had
> > > > > described what we are talking about more precisely.
> > > >
> > > > The 'mov r3, r1' has nothing to do with 'a'.
> > >
> > > What do you mean by that? At this point in the program, a is the
> > > variable whose value is stored in r1 and b is the variable whose value
> > > is stored in r3. "mov r3, r1" copies the value from r1 into r3 and is
> > > therefore equivalent to executing "b = a". (That is why I said one
> > > could argue that the "return *b" statement uses the value of *a.) Thus
> > > it very much does have something to do with "a".
> >
> > After the cmp and bne r1 and r3 have the same value.
> > The compiler tracks that and will use either register later.
> > That can never matter.
>
> The whole point of this thread is that sometimes it _does_ matter. Not
> on x86, but on weakly ordered architectures where using the wrong
> register will bypass a dependency and allow the CPU to speculatively
> load values earlier than the programmer wants it to.
>
> > Remember the compiler tracks values (in pseudo/internal registers)
> > not variables.
> >
> > > > It is a more general problem that OPTIMISER_HIDE_VAR() pretty much
> > > > always ends up allocating a different internal 'register' for the
> > > > output and then allocating a separate physical rehgister.
> > >
> > > What output are you referring to? Does OPTIMISER_HIDE_VAR() have an
> > > output? If it does, the source program above ignores it, discarding any
> > > returned value.
> >
> > Look up OPTIMISER_HIDE_VAR(x) it basically x = f(x) where f() is
> > the identity operation:
> > asm ("" : "+r"(x))
> > I'll bet that gcc allocates a separate internal/pseudo register
> > for the result so wants to do y = f(x).
> > Probably generating y = x; y = f(y);
> > (The 'mov' might be after the asm, but I think that would get
> > optimised away - the listing file might help.)
> >
> > So here the compiler has just decided to reuse the register that
> > held the other of a/b for the extra temporary.
>
> I think you've got this backward. As mentioned above, a is originally
> in r1 and b is in r3. The source says OPTIMIZER_HIDE_VAR(b), so you're
> saying that gcc should be copying r3 into a separate internal/pseudo
> register. But instead it's copying r1.
I think I know what you are trying to do, and you just fail.
Whether something can work is another matter, but that code
can't ever work.
Inside if (a == b) the compiler will always use the same register
for references to a and b - because it knows they have the same value.
Possibly something like:
c = b;
OPTIMISER_HIDE_VAR(c);
if (a == c) {
*b
will ensure that there isn't a speculative load from *a.
You'll get at least one register-register move - but they are safe.
Otherwise you'll need to put the condition inside an asm block.
David
>
> Alan
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-02 15:24 ` David Laight
@ 2024-10-03 1:50 ` 'Alan Stern'
2024-10-03 13:23 ` Mathieu Desnoyers
0 siblings, 1 reply; 42+ messages in thread
From: 'Alan Stern' @ 2024-10-03 1:50 UTC (permalink / raw)
To: David Laight
Cc: Jonas Oberhauser, Mathieu Desnoyers, Linus Torvalds,
linux-kernel, Greg Kroah-Hartman, Sebastian Andrzej Siewior,
Paul E. McKenney, Will Deacon, Peter Zijlstra, Boqun Feng,
John Stultz, Neeraj Upadhyay, Frederic Weisbecker,
Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
Thomas Gleixner, Vlastimil Babka, maged.michael, Mateusz Guzik,
Gary Guo, rcu, linux-mm, lkmm
On Wed, Oct 02, 2024 at 03:24:45PM +0000, David Laight wrote:
> I think I know what you are trying to do, and you just fail.
> Whether something can work is another matter, but that code
> can't ever work.
>
> Inside if (a == b) the compiler will always use the same register
> for references to a and b - because it knows they have the same value.
According to the other people in this discussion who have actually tried
using this code, it _does_ work (at least some of the time).
However, I'm not one of those people and so I leave it up to them to
decide how to respond to this critique.
Alan
> Possibly something like:
> c = b;
> OPTIMISER_HIDE_VAR(c);
> if (a == c) {
> *b
> will ensure that there isn't a speculative load from *a.
> You'll get at least one register-register move - but they are safe.
> Otherwise you'll need to put the condition inside an asm block.
>
> David
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-03 1:50 ` 'Alan Stern'
@ 2024-10-03 13:23 ` Mathieu Desnoyers
2024-10-03 17:07 ` David Laight
2024-10-07 11:54 ` Jonas Oberhauser
0 siblings, 2 replies; 42+ messages in thread
From: Mathieu Desnoyers @ 2024-10-03 13:23 UTC (permalink / raw)
To: 'Alan Stern', David Laight
Cc: Jonas Oberhauser, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On 2024-10-03 03:50, 'Alan Stern' wrote:
> On Wed, Oct 02, 2024 at 03:24:45PM +0000, David Laight wrote:
>> I think I know what you are trying to do, and you just fail.
>> Whether something can work is another matter, but that code
>> can't ever work.
>>
>> Inside if (a == b) the compiler will always use the same register
>> for references to a and b - because it knows they have the same value.
>
> According to the other people in this discussion who have actually tried
> using this code, it _does_ work (at least some of the time).
>
> However, I'm not one of those people and so I leave it up to them to
> decide how to respond to this critique.
I suspect that David's comment is about this specific example that
was given in this leg of the email thread:
https://lore.kernel.org/lkml/5d7d8a59-57f5-4125-95bb-fda9c193b9cf@huaweicloud.com/
> > > > > int fct_hide(void)
> > > > > > {
> > > > > > int *a, *b;
> > > > > >
> > > > > > do {
> > > > > > a = READ_ONCE(p);
> > > > > > asm volatile ("" : : : "memory");
> > > > > > b = READ_ONCE(p);
> > > > > > } while (a != b);
> > > > > > OPTIMIZER_HIDE_VAR(b);
> > > > > > return *b;
> > > > > > }
This indeed cannot work because the hide var is done
on @b after it was compared with @a, so after the compiler
was free to use any of the registers due to the equality.
Another example that does *not* work is if we try to hide
vars on the inputs of the equality, and then proceed to do the
comparison on the resulting temporaries, e.g.:
int fct_hide(void)
{
int *a, *b;
do {
a = READ_ONCE(p);
asm volatile ("" : : : "memory");
b = READ_ONCE(p);
} while (OPTIMIZER_HIDE_VAR(a) != OPTIMIZER_HIDE_VAR(b));
return *b;
}
The reason why this does *not* work is because the compiler is
free to use either temporaries for *b at the end, because they
were deemed identical.
What _does_ work however are the following two approaches:
1) Perform the equality check on the original variables, creating
new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
rest of their use, therefore making sure the pointer dereference
are not derived from versions of the variables which were compared
with another pointer. (as suggested by Boqun)
2) Perform the equality check on the versions resulting of hiding
both variables, making sure those versions of the variables are
not dereferenced afterwards. (as suggested by Linus)
Thanks,
Mathieu
>
> Alan
>
>> Possibly something like:
>> c = b;
>> OPTIMISER_HIDE_VAR(c);
>> if (a == c) {
>> *b
>> will ensure that there isn't a speculative load from *a.
>> You'll get at least one register-register move - but they are safe.
>> Otherwise you'll need to put the condition inside an asm block.
>>
>> David
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread* RE: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-03 13:23 ` Mathieu Desnoyers
@ 2024-10-03 17:07 ` David Laight
2024-10-03 18:00 ` Mathieu Desnoyers
2024-10-07 11:54 ` Jonas Oberhauser
1 sibling, 1 reply; 42+ messages in thread
From: David Laight @ 2024-10-03 17:07 UTC (permalink / raw)
To: 'Mathieu Desnoyers', 'Alan Stern'
Cc: Jonas Oberhauser, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
...
> What _does_ work however are the following two approaches:
>
> 1) Perform the equality check on the original variables, creating
> new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
> rest of their use, therefore making sure the pointer dereference
> are not derived from versions of the variables which were compared
> with another pointer. (as suggested by Boqun)
If that is
a1 = a; OPTIMISER_HIDE_VAR(a1);
b1 = b; OPTIMISER_HIDE_BAR(b1);
if (a != b}
return;
// code using a1 and b1
then can't the compiler first flip it to:
if (a != b)
return;
a1 = a; OPTIMISER_HIDE_VAR(a1);
b1 = b; OPTIMISER_HIDE_VAR(b1);
and then replace the last line with:
b1 = a; OPTIMISER_HIDE_VAR(b1);
which isn't intended at all.
OTOH if you do:
a1 = a; OPTIMISER_HIDE_VAR(a1);
b1 = b; OPTIMISER_HIDE_VAR(b1);
if (a1 != b1)
return;
// code using a and b
(which I think is)
> 2) Perform the equality check on the versions resulting of hiding
> both variables, making sure those versions of the variables are
> not dereferenced afterwards. (as suggested by Linus)
then the compiler can't possibly reverse the asm blocks.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-03 17:07 ` David Laight
@ 2024-10-03 18:00 ` Mathieu Desnoyers
0 siblings, 0 replies; 42+ messages in thread
From: Mathieu Desnoyers @ 2024-10-03 18:00 UTC (permalink / raw)
To: David Laight, 'Alan Stern'
Cc: Jonas Oberhauser, Linus Torvalds, linux-kernel,
Greg Kroah-Hartman, Sebastian Andrzej Siewior, Paul E. McKenney,
Will Deacon, Peter Zijlstra, Boqun Feng, John Stultz,
Neeraj Upadhyay, Frederic Weisbecker, Joel Fernandes,
Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan,
Zqiang, Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On 2024-10-03 19:07, David Laight wrote:
> ...
>> What _does_ work however are the following two approaches:
>>
>> 1) Perform the equality check on the original variables, creating
>> new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
>> rest of their use, therefore making sure the pointer dereference
>> are not derived from versions of the variables which were compared
>> with another pointer. (as suggested by Boqun)
>
> If that is
> a1 = a; OPTIMISER_HIDE_VAR(a1);
> b1 = b; OPTIMISER_HIDE_BAR(b1);
> if (a != b}
> return;
> // code using a1 and b1
> then can't the compiler first flip it to:
> if (a != b)
> return;
> a1 = a; OPTIMISER_HIDE_VAR(a1);
> b1 = b; OPTIMISER_HIDE_VAR(b1);
> and then replace the last line with:
> b1 = a; OPTIMISER_HIDE_VAR(b1);
> which isn't intended at all.
Good point, so I suspect Boqun's ADDRESS_EQ() suggestion did not work:
https://lore.kernel.org/lkml/ZvX12_1mK8983cXm@boqun-archlinux/
>
>
> OTOH if you do:
> a1 = a; OPTIMISER_HIDE_VAR(a1);
> b1 = b; OPTIMISER_HIDE_VAR(b1);
> if (a1 != b1)
> return;
> // code using a and b
> (which I think is)
This is in line with Linus' suggestion, which is the approach I
retained.
>
>> 2) Perform the equality check on the versions resulting of hiding
>> both variables, making sure those versions of the variables are
>> not dereferenced afterwards. (as suggested by Linus)
>
> then the compiler can't possibly reverse the asm blocks.
Indeed.
Thanks,
Mathieu
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-03 13:23 ` Mathieu Desnoyers
2024-10-03 17:07 ` David Laight
@ 2024-10-07 11:54 ` Jonas Oberhauser
2024-10-07 13:18 ` David Laight
1 sibling, 1 reply; 42+ messages in thread
From: Jonas Oberhauser @ 2024-10-07 11:54 UTC (permalink / raw)
To: Mathieu Desnoyers, 'Alan Stern', David Laight
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
Am 10/3/2024 um 3:23 PM schrieb Mathieu Desnoyers:
> What _does_ work however are the following two approaches:
>
> 1) Perform the equality check on the original variables, creating
> new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
> rest of their use, therefore making sure the pointer dereference
> are not derived from versions of the variables which were compared
> with another pointer. (as suggested by Boqun)
This should not be guaranteed to work, because right after the
comparison the compiler can do b=a, then it doesn't matter how much you
hide afterwards.
However it might work if you escape the addresses of a and b first, in
which case the compiler will not do b=a anymore, but it might force the
compiler to put a and b on the stack, which has some performance impact.
>
> 2) Perform the equality check on the versions resulting of hiding
> both variables, making sure those versions of the variables are
> not dereferenced afterwards. (as suggested by Linus)
>
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-07 11:54 ` Jonas Oberhauser
@ 2024-10-07 13:18 ` David Laight
2024-10-07 13:21 ` Mathieu Desnoyers
2024-10-07 14:59 ` Jonas Oberhauser
0 siblings, 2 replies; 42+ messages in thread
From: David Laight @ 2024-10-07 13:18 UTC (permalink / raw)
To: 'Jonas Oberhauser', Mathieu Desnoyers, 'Alan Stern'
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
From: Jonas Oberhauser
> Sent: 07 October 2024 12:55
>
> Am 10/3/2024 um 3:23 PM schrieb Mathieu Desnoyers:
> > What _does_ work however are the following two approaches:
> >
> > 1) Perform the equality check on the original variables, creating
> > new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
> > rest of their use, therefore making sure the pointer dereference
> > are not derived from versions of the variables which were compared
> > with another pointer. (as suggested by Boqun)
>
> This should not be guaranteed to work, because right after the
> comparison the compiler can do b=a, then it doesn't matter how much you
> hide afterwards.
>
> However it might work if you escape the addresses of a and b first, in
> which case the compiler will not do b=a anymore, but it might force the
> compiler to put a and b on the stack, which has some performance impact.
Nope, as pointed out last week, the compiler can move the 'a == b'
check to before the OPTIMISER_HID_VAR() and then use the same register
for both of them.
> > 2) Perform the equality check on the versions resulting of hiding
> > both variables, making sure those versions of the variables are
> > not dereferenced afterwards. (as suggested by Linus)
That (and other things) could usefully use:
#define OPTIMISER_HIDE_VALUE(x) \
({ __auto_type _x = x; OPTIMISER_HIDE_VAR(_x); _x; })
You'll almost certainly end up with a register-register move
even if 'x' isn't used afterwards.
The calling could just become:
if (a == OPTIMISER_HIDE_VALUE(b) ...
since it is likely that you only care about one of the pointers.
(Actually isn't hiding one of them always enough?)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-07 13:18 ` David Laight
@ 2024-10-07 13:21 ` Mathieu Desnoyers
2024-10-07 14:59 ` Jonas Oberhauser
1 sibling, 0 replies; 42+ messages in thread
From: Mathieu Desnoyers @ 2024-10-07 13:21 UTC (permalink / raw)
To: David Laight, 'Jonas Oberhauser', 'Alan Stern'
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
On 2024-10-07 15:18, David Laight wrote:
> From: Jonas Oberhauser
>> Sent: 07 October 2024 12:55
>>
>> Am 10/3/2024 um 3:23 PM schrieb Mathieu Desnoyers:
>>> What _does_ work however are the following two approaches:
>>>
>>> 1) Perform the equality check on the original variables, creating
>>> new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
>>> rest of their use, therefore making sure the pointer dereference
>>> are not derived from versions of the variables which were compared
>>> with another pointer. (as suggested by Boqun)
>>
>> This should not be guaranteed to work, because right after the
>> comparison the compiler can do b=a, then it doesn't matter how much you
>> hide afterwards.
>>
>> However it might work if you escape the addresses of a and b first, in
>> which case the compiler will not do b=a anymore, but it might force the
>> compiler to put a and b on the stack, which has some performance impact.
>
> Nope, as pointed out last week, the compiler can move the 'a == b'
> check to before the OPTIMISER_HID_VAR() and then use the same register
> for both of them.
Yes.
>
>>> 2) Perform the equality check on the versions resulting of hiding
>>> both variables, making sure those versions of the variables are
>>> not dereferenced afterwards. (as suggested by Linus)
>
> That (and other things) could usefully use:
> #define OPTIMISER_HIDE_VALUE(x) \
> ({ __auto_type _x = x; OPTIMISER_HIDE_VAR(_x); _x; })
> You'll almost certainly end up with a register-register move
> even if 'x' isn't used afterwards.
Yes.
>
> The calling could just become:
> if (a == OPTIMISER_HIDE_VALUE(b) ...
> since it is likely that you only care about one of the pointers.
> (Actually isn't hiding one of them always enough?)
Linus asked that we hide both inputs, otherwise it's really asking
for trouble in terms of API misuse.
Thanks,
Mathieu
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH 1/2] compiler.h: Introduce ptr_eq() to preserve address dependency
2024-10-07 13:18 ` David Laight
2024-10-07 13:21 ` Mathieu Desnoyers
@ 2024-10-07 14:59 ` Jonas Oberhauser
1 sibling, 0 replies; 42+ messages in thread
From: Jonas Oberhauser @ 2024-10-07 14:59 UTC (permalink / raw)
To: David Laight, Mathieu Desnoyers, 'Alan Stern'
Cc: Linus Torvalds, linux-kernel, Greg Kroah-Hartman,
Sebastian Andrzej Siewior, Paul E. McKenney, Will Deacon,
Peter Zijlstra, Boqun Feng, John Stultz, Neeraj Upadhyay,
Frederic Weisbecker, Joel Fernandes, Josh Triplett,
Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner,
Vlastimil Babka, maged.michael, Mateusz Guzik, Gary Guo, rcu,
linux-mm, lkmm
Am 10/7/2024 um 3:18 PM schrieb David Laight:
> From: Jonas Oberhauser
>> Sent: 07 October 2024 12:55
>>
>> Am 10/3/2024 um 3:23 PM schrieb Mathieu Desnoyers:
>>> What _does_ work however are the following two approaches:
>>>
>>> 1) Perform the equality check on the original variables, creating
>>> new versions (with OPTIMIZER_HIDE_VAR) of both variables for the
>>> rest of their use, therefore making sure the pointer dereference
>>> are not derived from versions of the variables which were compared
>>> with another pointer. (as suggested by Boqun)
>>
>> This should not be guaranteed to work, because right after the
>> comparison the compiler can do b=a, then it doesn't matter how much you
>> hide afterwards.
>>
>> However it might work if you escape the addresses of a and b first, in
>> which case the compiler will not do b=a anymore, but it might force the
>> compiler to put a and b on the stack, which has some performance impact.
>
> Nope, as pointed out last week, the compiler can move the 'a == b'
> check to before the OPTIMISER_HID_VAR() and then use the same register
> for both of them.
>
Since the addresses of a and b have escaped, I don't think it can still
put them into the same register (or memory location).
Other threads could be temporarily modifying (inside the escape code) or
concurrently reading (after the escape code) the two variables
independently.
Something in the direction of
a = ...;
...
b = ...;
escape(&a);
escape(&b);
if (a == b) {
OPTIMIZER_HIDE_VAR(b);
*b;
}
Here doing b=a after a==b would (from the point of view of compiler)
potentially introduce a data race.
As I pointed out earlier, the compiler might be able to prove that it is
a benign data race though and theoretically still do b=a. But if you
declare b as volatile on top...
Anyways, the ptr_eq way is much more obvious.
jonas
^ permalink raw reply [flat|nested] 42+ messages in thread