* [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
@ 2024-09-05 17:03 Andy Shevchenko
2024-10-16 13:37 ` Andy Shevchenko
2024-10-16 15:44 ` Dave Hansen
0 siblings, 2 replies; 16+ messages in thread
From: Andy Shevchenko @ 2024-09-05 17:03 UTC (permalink / raw)
To: Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt,
Andy Shevchenko
When percpu_add_op() is used with unsigned argument, it prevents kernel builds
with clang, `make W=1` and CONFIG_WERROR=y:
net/ipv4/tcp_output.c:187:3: error: result of comparison of constant -1 with expression of type 'u8' (aka 'unsigned char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
187 | NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
188 | tp->compressed_ack);
| ~~~~~~~~~~~~~~~~~~~
...
arch/x86/include/asm/percpu.h:238:31: note: expanded from macro 'percpu_add_op'
238 | ((val) == 1 || (val) == -1)) ? \
| ~~~~~ ^ ~~
Fix this by casting -1 to the type of the parameter and then compare.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
arch/x86/include/asm/percpu.h | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index c55a79d5feae..e525cd85f999 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -234,9 +234,10 @@ do { \
*/
#define percpu_add_op(size, qual, var, val) \
do { \
- const int pao_ID__ = (__builtin_constant_p(val) && \
- ((val) == 1 || (val) == -1)) ? \
- (int)(val) : 0; \
+ const int pao_ID__ = \
+ (__builtin_constant_p(val) && \
+ ((val) == 1 || \
+ (val) == (typeof(val))-1)) ? (int)(val) : 0; \
\
if (0) { \
typeof(var) pao_tmp__; \
--
2.43.0.rc1.1336.g36b5255a03ac
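A minimal userspace sketch of the two comparisons in this patch, assuming 'val' is a u8. The helper names are hypothetical and exist only for this illustration; they are not kernel code, and __builtin_constant_p() is left out.

```c
#include <stdint.h>

/*
 * The original "(val) == -1" with a u8 'val': the u8 is promoted to int,
 * so it holds 0..255 and can never equal -1.  The explicit (int) cast
 * only spells out the implicit promotion.
 */
static int original_cmp_u8(uint8_t val)
{
	return (int)val == -1;
}

/*
 * The v1 "(val) == (typeof(val))-1" with a u8 'val': (u8)-1 is 255, so
 * the comparison is true exactly when val == 255.
 */
static int v1_cmp_u8(uint8_t val)
{
	return val == (uint8_t)-1;
}
```

For val == 255 the first helper returns 0 and the second returns 1, which is the difference between the old and new comparison.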
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Andy Shevchenko @ 2024-10-16 13:37 UTC (permalink / raw)
To: Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On Thu, Sep 05, 2024 at 08:03:56PM +0300, Andy Shevchenko wrote:
> When percpu_add_op() is used with unsigned argument, it prevents kernel builds
> with clang, `make W=1` and CONFIG_WERROR=y:
>
> net/ipv4/tcp_output.c:187:3: error: result of comparison of constant -1 with expression of type 'u8' (aka 'unsigned char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
> 187 | NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 188 | tp->compressed_ack);
> | ~~~~~~~~~~~~~~~~~~~
> ...
> arch/x86/include/asm/percpu.h:238:31: note: expanded from macro 'percpu_add_op'
> 238 | ((val) == 1 || (val) == -1)) ? \
> | ~~~~~ ^ ~~
>
> Fix this by casting -1 to the type of the parameter and then compare.

Any comments? Or can it be taken in?

--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Dave Hansen @ 2024-10-16 15:44 UTC (permalink / raw)
To: Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

Andy,

The subject here is not very informative. It explains the "what" of the
patch, but not the "why".

A better subject might have been:

    x86/percpu: Fix clang warning when dealing with unsigned types

> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -234,9 +234,10 @@ do { \
> */
> #define percpu_add_op(size, qual, var, val) \
> do { \
> - const int pao_ID__ = (__builtin_constant_p(val) && \
> - ((val) == 1 || (val) == -1)) ? \
> - (int)(val) : 0; \
> + const int pao_ID__ = \
> + (__builtin_constant_p(val) && \
> + ((val) == 1 || \
> + (val) == (typeof(val))-1)) ? (int)(val) : 0; \

This doesn't _look_ right.

Let's assume 'val' is a u8. (u8)-1 is 255, right? So casting the -1
over to a u8 actually changed its value. So the comparison that you
added would actually trigger for 255:

    (val) == (typeof(val))-1))

    255 == (u8)-1
    255 == 255

That's not the end of the world because the pao_ID__ still ends up at
255 and the lower if() falls into the "add" bucket, but it isn't great
for reading the macro. It seems like it basically works on accident.

Wouldn't casting 'val' over to an int be shorter, more readable, not
have that logical false match *and* line up with the cast later on in
the expression?

    const int pao_ID__ = (__builtin_constant_p(val) &&
                          ((val) == 1 || (int)(val) == -1)) ?
                          (int)(val) : 0;

Other suggestions to make it more readable would be welcome.

Since I'm making comments, I would have really appreciated some extra
info here like why you are hitting this and nobody else is. This is bog
standard code that everybody compiles. Is clang use _that_ unusual? Or
do most clang users just ignore all the warnings? Or are you using a
bleeding edge version of clang that spits out new warnings that other
clang users aren't seeing?

Another nice thing would have been to say that this produces the exact
same code with and without the patch. Or that you had tested it in
*some* way. It took me a couple of minutes to convince myself that your
version works and doesn't do something silly like a "dec" if you hand in
val==255.
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Nick Desaulniers @ 2024-10-16 17:03 UTC (permalink / raw)
To: Dave Hansen
Cc: Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel,
llvm, Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Bill Wendling, Justin Stitt

On Wed, Oct 16, 2024 at 8:45 AM Dave Hansen <dave.hansen@intel.com> wrote:
> Since I'm making comments, I would have really appreciated some extra
> info here like why you are hitting this and nobody else is. This is bog
> standard code that everybody compiles. Is clang use _that_ unusual? Or
> do most clang users just ignore all the warnings? Or are you using a
> bleeding edge version of clang that spits out new warnings that other
> clang users aren't seeing?

Note the W=1 part in the commit message. That's the part people
generally don't test with, but the bots do.

On Thu, Sep 5, 2024 at 10:04 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> When percpu_add_op() is used with unsigned argument, it prevents kernel builds
> with clang, `make W=1` and CONFIG_WERROR=y:

--
Thanks,
~Nick Desaulniers
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Andy Shevchenko @ 2024-10-16 18:06 UTC (permalink / raw)
To: Dave Hansen
Cc: Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm, Dennis Zhou,
Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt

On Wed, Oct 16, 2024 at 08:44:56AM -0700, Dave Hansen wrote:
> Andy,
>
> The subject here is not very informative. It explains the "what" of the
> patch, but not the "why".
>
> A better subject might have been:
>
> x86/percpu: Fix clang warning when dealing with unsigned types

Thanks, makes sense!

> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -234,9 +234,10 @@ do { \
> > */
> > #define percpu_add_op(size, qual, var, val) \
> > do { \
> > - const int pao_ID__ = (__builtin_constant_p(val) && \
> > - ((val) == 1 || (val) == -1)) ? \
> > - (int)(val) : 0; \
> > + const int pao_ID__ = \
> > + (__builtin_constant_p(val) && \
> > + ((val) == 1 || \
> > + (val) == (typeof(val))-1)) ? (int)(val) : 0; \
>
> This doesn't _look_ right.

But it feels right if we really want to supply unsigned types here.
Maybe some more magic is needed (like in min() case).

> Let's assume 'val' is a u8. (u8)-1 is 255, right? So casting the -1
> over to a u8 actually changed its value. So the comparison that you
> added would actually trigger for 255:
>
> (val) == (typeof(val))-1))
>
> 255 == (u8)-1
> 255 == 255
>
> That's not the end of the world because the pao_ID__ still ends up at
> 255 and the lower if() falls into the "add" bucket, but it isn't great
> for reading the macro. It seems like it basically works on accident.
>
> Wouldn't casting 'val' over to an int be shorter, more readable, not
> have that logical false match *and* line up with the cast later on in
> the expression?

Maybe more readable, but wouldn't it be theoretically buggy for u64?
I'm talking about the case when u64 == UINT_MAX, which will be true
in your case and false in mine.

> const int pao_ID__ = (__builtin_constant_p(val) &&
> ((val) == 1 || (int)(val) == -1)) ?
>
> (int)(val) : 0;
>
> Other suggestions to make it more readable would be welcome.
>
> Since I'm making comments, I would have really appreciated some extra
> info here like why you are hitting this and nobody else is. This is bog
> standard code that everybody compiles. Is clang use _that_ unusual?

Why are you asking me about this? I don't know...

> Or do most clang users just ignore all the warnings?

Same here. I don't know... Both Qs sound rhetorical to me.

> Or are you using a bleeding edge version of clang that spits out new warnings
> that other clang users aren't seeing?

AFAICT it's *not* even close to the bleeding edge. It's the standard Debian
supply.

> Another nice thing would have been to say that this produces the exact
> same code with and without the patch. Or that you had tested it in
> *some* way.

I have run percpu_test in both cases and also checked the code with
`bloat-o-meter` and `cmp -b`. Everything is the same. I even added a test case
for the above mentioned situation.

> It took me a couple of minutes to convince myself that your
> version works and doesn't do something silly like a "dec" if you hand in
> val==255.

It took me much more to find the best solution, which it appears not everyone
likes :-)

P.S. And as Nick pointed out, it's a simple `make W=1`; what additional
information do you want to see here? Care to provide a template?

--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Andy Shevchenko @ 2024-10-16 18:20 UTC (permalink / raw)
To: Dave Hansen
Cc: Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm, Dennis Zhou,
Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt

On Wed, Oct 16, 2024 at 09:06:13PM +0300, Andy Shevchenko wrote:
> On Wed, Oct 16, 2024 at 08:44:56AM -0700, Dave Hansen wrote:

...

> > This doesn't _look_ right.

See below.

...

> Maybe more readable, but wouldn't it be theoretically buggy for u64?
> I'm talking about the case when u64 == UINT_MAX, which will be true
> in your case and false in mine.
>
> > const int pao_ID__ = (__builtin_constant_p(val) &&
> > ((val) == 1 || (int)(val) == -1)) ?
> >
> > (int)(val) : 0;

This code _is_ buggy, thanks to my new test case.

[ 66.161375] pcp -1 (0xffffffffffffffff) != expected 4294967295 (0xffffffff)

Hence, I'll send a v2 with the test case and updated Subject.

--
With Best Regards,
Andy Shevchenko
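The u64 failure reported above can be sketched in userspace as follows. This is only an illustration: the helper names are hypothetical, __builtin_constant_p() is omitted, and the narrowing conversion to int is implementation-defined in C (gcc and clang wrap modulo 2^32, which is what the sketch assumes).

```c
#include <stdint.h>

/* Selector using the "(int)(val) == -1" form, with a u64 'val'. */
static int selector_int_cast(uint64_t val)
{
	return (val == 1 || (int)val == -1) ? (int)val : 0;
}

/* Selector using the v1 "(val) == (typeof(val))-1" form, with a u64 'val'. */
static int selector_v1_u64(uint64_t val)
{
	return (val == 1 || val == (uint64_t)-1) ? (int)val : 0;
}
```

With val == 0xffffffff the first selector yields -1, so a "dec" would be picked where an add of 4294967295 was requested, matching the shape of the log line above. The second selector yields 0 for that value and -1 only for the all-ones u64.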
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Dave Hansen @ 2024-10-16 19:43 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm, Dennis Zhou,
Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt

On 10/16/24 11:20, Andy Shevchenko wrote:
>> Maybe more readable, but wouldn't it be theoretically buggy for u64?
>> I'm talking about the case when u64 == UINT_MAX, which will be true
>> in your case and false in mine.
>>
>>> const int pao_ID__ = (__builtin_constant_p(val) &&
>>> ((val) == 1 || (int)(val) == -1)) ?
>>>
>>> (int)(val) : 0;
> This code _is_ buggy, thanks to my new test case.
>
> [ 66.161375] pcp -1 (0xffffffffffffffff) != expected 4294967295 (0xffffffff)

Thanks for pointing that out Andy (and Peter too)!
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Peter Zijlstra @ 2024-10-16 19:20 UTC (permalink / raw)
To: Dave Hansen
Cc: Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel,
llvm, Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On Wed, Oct 16, 2024 at 08:44:56AM -0700, Dave Hansen wrote:
> Andy,
>
> The subject here is not very informative. It explains the "what" of the
> patch, but not the "why".
>
> A better subject might have been:
>
> x86/percpu: Fix clang warning when dealing with unsigned types
>
> > --- a/arch/x86/include/asm/percpu.h
> > +++ b/arch/x86/include/asm/percpu.h
> > @@ -234,9 +234,10 @@ do { \
> > */
> > #define percpu_add_op(size, qual, var, val) \
> > do { \
> > - const int pao_ID__ = (__builtin_constant_p(val) && \
> > - ((val) == 1 || (val) == -1)) ? \
> > - (int)(val) : 0; \
> > + const int pao_ID__ = \
> > + (__builtin_constant_p(val) && \
> > + ((val) == 1 || \
> > + (val) == (typeof(val))-1)) ? (int)(val) : 0; \
>
> This doesn't _look_ right.
>
> Let's assume 'val' is a u8. (u8)-1 is 255, right? So casting the -1
> over to a u8 actually changed its value. So the comparison that you
> added would actually trigger for 255:
>
> (val) == (typeof(val))-1))
>
> 255 == (u8)-1
> 255 == 255

Which is correct, no? Add of 255 to an u8 is the same as decrement one.

> That's not the end of the world because the pao_ID__ still ends up at
> 255 and the lower if() falls into the "add" bucket, but it isn't great
> for reading the macro. It seems like it basically works on accident.

You're correct in that it does not achieve the desired result (in all
cases). But this is because (int)(val) will never turn into -1 when val
== 255.

> Wouldn't casting 'val' over to an int be shorter, more readable, not
> have that logical false match *and* line up with the cast later on in
> the expression?
>
> const int pao_ID__ = (__builtin_constant_p(val) &&
> ((val) == 1 || (int)(val) == -1)) ?
>
> (int)(val) : 0;
>
> Other suggestions to make it more readable would be welcome.

This is very very wrong. No u8 value when cast to int will ever equal
-1. Notably (int)(u8)255 == 255.

> Since I'm making comments, I would have really appreciated some extra
> info here like why you are hitting this and nobody else is. This is bog
> standard code that everybody compiles. Is clang use _that_ unusual? Or
> do most clang users just ignore all the warnings? Or are you using a
> bleeding edge version of clang that spits out new warnings that other
> clang users aren't seeing?

The code as is, is wrong, I don't think we'll ever end up in the dec
case for 'short' unsigned types. Clang is just clever enough to realize
this and issues a warning.

Something like so might work:

    const int pao_ID__ = __builtin_constant_p(val) ?
        ((typeof(var))(val) == 1 ? 1 :
         ((typeof(var))(val) == (typeof(var))-1 ? -1 : 0 )) : 0;

This should get, assuming typeof(var) is u8, a dec for both 255 and -1.
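The selector proposed above can be tried out in userspace with typeof(var) fixed to u8. This is a rough sketch under that assumption; the function names are hypothetical and the __builtin_constant_p() guard is dropped so the logic can be called with runtime values.

```c
#include <stdint.h>

/* The "(int)(val) == -1" test for a u8 'val': (int)(u8)255 is 255, never -1. */
static int int_cast_is_dec_u8(uint8_t val)
{
	return (int)val == -1;
}

/*
 * The proposed selector with typeof(var) fixed to u8.  'val' is taken as
 * a long long so that both 255 and -1 can be passed in unchanged; the
 * conversion to uint8_t reduces either one to 255.
 */
static int pao_id_u8(long long val)
{
	return (uint8_t)val == 1 ? 1 :
	       ((uint8_t)val == (uint8_t)-1 ? -1 : 0);
}
```

Both pao_id_u8(255) and pao_id_u8(-1) give -1 (the "dec" case), while int_cast_is_dec_u8(255) is 0.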
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Dave Hansen @ 2024-10-16 19:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel,
llvm, Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On 10/16/24 12:20, Peter Zijlstra wrote:
> The code as is, is wrong, I don't think we'll ever end up in the dec
> case for 'short' unsigned types. Clang is just clever enough to realize
> this and issues a warning.

Ahhh, that's the key to it. Thanks, Peter.

> Something like so might work:
>
> const int pao_ID__ = __builtin_constant_p(val) ?
> ((typeof(var))(val) == 1 ? 1 :
> ((typeof(var))(val) == (typeof(var))-1 ? -1 : 0 )) : 0;

Would anybody hate if we broke this up a bit, like:

    const typeof(var) _val = val;
    const int paoconst = __builtin_constant_p(val);
    const int paoinc = paoconst && ((_val) == 1);
    const int paodec = paoconst && ((_val) == (typeof(var))-1);

and then did

    if (paoinc)
        percpu_unary_op(size, qual, "inc", var);
    ...

Or even:

    #define PAOINC 1234

    const int pao_ID__ = __builtin_constant_p(val) ?
        ((typeof(var))(val) == 1 ? PAOINC :
    ...
    if (PAOINC)
        percpu_unary_op(size, qual, "inc", var);

Since the 1 and -1 ternary results end up just being magic numbers anyway.

Otherwise that pao_ID__ expression is pretty gnarly.
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Peter Zijlstra @ 2024-10-17 18:18 UTC (permalink / raw)
To: Dave Hansen
Cc: Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel,
llvm, Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On Wed, Oct 16, 2024 at 12:44:18PM -0700, Dave Hansen wrote:

> Would anybody hate if we broke this up a bit, like:
>
> const typeof(var) _val = val;
> const int paoconst = __builtin_constant_p(val);
> const int paoinc = paoconst && ((_val) == 1);
> const int paodec = paoconst && ((_val) == (typeof(var))-1);
>
> and then did
>
> if (paoinc)
> percpu_unary_op(size, qual, "inc", var);
> ...

I think that is an overall improvement. Proceed! :-)
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Andy Shevchenko @ 2024-10-18 12:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Dave Hansen, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel, llvm,
Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On Thu, Oct 17, 2024 at 08:18:59PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 16, 2024 at 12:44:18PM -0700, Dave Hansen wrote:
>
> > Would anybody hate if we broke this up a bit, like:
> >
> > const typeof(var) _val = val;
> > const int paoconst = __builtin_constant_p(val);
> > const int paoinc = paoconst && ((_val) == 1);
> > const int paodec = paoconst && ((_val) == (typeof(var))-1);
> >
> > and then did
> >
> > if (paoinc)
> > percpu_unary_op(size, qual, "inc", var);
> > ...
>
> I think that is an overall improvement. Proceed! :-)

Wouldn't typeof(var) be a regression? The val can be wider (in terms of bits)
than var, and cutting it like this might bring a different result depending on
the signedness.

TL;DR: Whatever is done, please add more (corner) test cases to the
percpu_test.c.

--
With Best Regards,
Andy Shevchenko
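One way to probe the width question is a small userspace model, sketched here for an unsigned 8-bit 'var' only. It assumes that the eventual add is performed at the width of 'var', which this thread does not show, and all helper names are hypothetical.

```c
#include <stdint.h>

/* The paodec test from the quoted snippet, with typeof(var) fixed to u8. */
static int paodec_u8(unsigned int val)
{
	const uint8_t _val = (uint8_t)val;	/* const typeof(var) _val = val; */

	return _val == (uint8_t)-1;
}

/* A model of an 8-bit wide add: only the low 8 bits of 'val' matter. */
static uint8_t add_u8(uint8_t var, unsigned int val)
{
	return (uint8_t)(var + val);
}

/* A model of an 8-bit wide decrement. */
static uint8_t dec_u8(uint8_t var)
{
	return (uint8_t)(var - 1);
}
```

In this model a wider value such as 0x1ff is classified as a decrement, and add_u8(x, 0x1ff) gives the same result as dec_u8(x). Signed and wider 'var' types are not covered here, which is where extra corner cases in percpu_test.c would help.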
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Dave Hansen @ 2024-10-22 19:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel,
llvm, Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

On 10/17/24 11:18, Peter Zijlstra wrote:
> On Wed, Oct 16, 2024 at 12:44:18PM -0700, Dave Hansen wrote:
>
>> Would anybody hate if we broke this up a bit, like:
>>
>> const typeof(var) _val = val;
>> const int paoconst = __builtin_constant_p(val);
>> const int paoinc = paoconst && ((_val) == 1);
>> const int paodec = paoconst && ((_val) == (typeof(var))-1);
>>
>> and then did
>>
>> if (paoinc)
>> percpu_unary_op(size, qual, "inc", var);
>> ...
> I think that is an overall improvement. Proceed! 🙂

I poked at this a bit:

> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=30e0899c6ab7fe1134e4b96db963f0be89b1dd5a

I believe it functions fine. But it surprised me with a few things.
Here's one. I assumed that doing an add((unsigned)-1) would be rare.
It's not. It's actually pretty common because this:

    #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))

ends up causing problems when 'pcp' is an unsigned type. For example,
in this chain:

    mem_cgroup_exit ->
    obj_cgroup_put ->
    percpu_ref_put ->
    percpu_ref_put_many(ref, 1) ->
    this_cpu_sub

the compiler can see the '1' constant. It effectively expands to:

    this_cpu_add(pcp, -(unsigned long)(1))

With the old code, gcc manages to generate a 'dec'. Clang generates an
'add'. With my hack above both compilers generate an 'add'. This
actually matters in some code that seems potentially rather performance
sensitive:

add/remove: 0/0 grow/shrink: 219/9 up/down: 755/-141 (614)
Function                      old    new   delta
flush_end_io                  905   1070    +165
x86_pmu_cancel_txn            242    338     +96
lru_add                       554    594     +40
mlock_folio_batch            3264   3300     +36
compaction_alloc             3813   3838     +25
tcp_leave_memory_pressure      86    110     +24
account_guest_time            270    287     +17
...

So I think Peter's version was the best. It shuts up clang and also
preserves the existing (good) gcc 'sub' behavior. I'll send it out for
real in a bit, but I'm thinking of something like the attached patch.

[-- Attachment #2: 0001-x86-percpu-Avoid-comparing-unsigned-types-to-1.patch --]
[-- Type: text/x-patch, Size: 3180 bytes --]

From d63bcd350e1a3ba6196dadb26cb2f36f0ba1e182 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen@linux.intel.com>
Date: Fri, 18 Oct 2024 11:07:47 -0700
Subject: [PATCH] x86/percpu: Avoid comparing unsigned types to -1

clang warns when comparing an unsigned type to -1 since the comparison
is always false. This can be quickly reproduced by setting
CONFIG_WERROR=y and running:

    make W=1 CC=clang-14 net/ipv4/tcp_output.o

net/ipv4/tcp_output.c:187:3: error: result of comparison of constant -1 with expression of type 'u8' (aka 'unsigned char') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
187 | NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
188 | tp->compressed_ack);
| ~~~~~~~~~~~~~~~~~~~
...
arch/x86/include/asm/percpu.h:238:31: note: expanded from macro 'percpu_add_op'
238 | ((val) == 1 || (val) == -1)) ? \
| ~~~~~ ^ ~~

Fix this by avoiding a comparison of an uncast -1 to 'val'. Doing this
in addition to the existing 'pao_ID__' calculation would make it even
more unreadable. Remove 'pao_ID__' and replace it with the three
components of its calculation.

This preserves some unintuitive but useful behavior. For instance, gcc
sees:

    percpu_add_op(..., var, (u8)-1);

and can transform that into a "dec". Clang, on the other hand, sees the
'u8' type and assumes that "(val) == -1" is false, which was the root of
the warning. This is useful gcc behavior because:

    #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))

so any code that does:

    this_cpu_sub(A, 1)

where 'A' is an unsigned type generates a "dec". Clang, on the other
hand generates a less-efficient "add".

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
---
arch/x86/include/asm/percpu.h | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index c55a79d5feae..57d9759c692e 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -234,18 +234,19 @@ do { \
*/
#define percpu_add_op(size, qual, var, val) \
do { \
- const int pao_ID__ = (__builtin_constant_p(val) && \
- ((val) == 1 || (val) == -1)) ? \
- (int)(val) : 0; \
+ const int pao_const__ = __builtin_constant_p(val); \
+ const int pao_inc__ = (val) == 1; \
+ const int pao_dec__ = (typeof(var))(val) == \
+ (typeof(var))-1; \
\
if (0) { \
typeof(var) pao_tmp__; \
pao_tmp__ = (val); \
(void)pao_tmp__; \
} \
- if (pao_ID__ == 1) \
+ if (pao_const__ && pao_inc__) \
percpu_unary_op(size, qual, "inc", var); \
- else if (pao_ID__ == -1) \
+ else if (pao_const__ && pao_dec__) \
percpu_unary_op(size, qual, "dec", var); \
else \
percpu_binary_op(size, qual, "add", var, val); \
--
2.34.1
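The this_cpu_sub() case from the commit message can be modelled in userspace as below. This is only a sketch with typeof(pcp) and typeof(var) fixed to unsigned long; the helper names are hypothetical and __builtin_constant_p() is not modelled.

```c
#include <limits.h>

/* -(typeof(pcp))(val) from this_cpu_sub(), for an unsigned long 'pcp'. */
static unsigned long this_cpu_sub_operand(unsigned long val)
{
	return -val;	/* unsigned negation wraps, so 1 becomes ULONG_MAX */
}

/* The pao_dec__ test from the patch, for an unsigned long 'var'. */
static int pao_dec_ulong(unsigned long val)
{
	return val == (unsigned long)-1;
}
```

Here this_cpu_sub_operand(1) is ULONG_MAX, and pao_dec_ulong() accepts it, which is the path that lets a subtraction of 1 from an unsigned per-CPU variable be recognized as a "dec" candidate.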
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Christoph Lameter (Ampere) @ 2024-10-22 23:24 UTC (permalink / raw)
To: Dave Hansen
Cc: Peter Zijlstra, Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm,
linux-kernel, llvm, Dennis Zhou, Tejun Heo, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On Tue, 22 Oct 2024, Dave Hansen wrote:

> So I think Peter's version was the best. It shuts up clang and also
> preserves the existing (good) gcc 'sub' behavior. I'll send it out for
> real in a bit, but I'm thinking of something like the attached patch.

The desired behavior is a "dec". "sub" has a longer op code AFAICT.
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
From: Dave Hansen @ 2024-10-23 17:15 UTC (permalink / raw)
To: Christoph Lameter (Ampere)
Cc: Peter Zijlstra, Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm,
linux-kernel, llvm, Dennis Zhou, Tejun Heo, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On 10/22/24 16:24, Christoph Lameter (Ampere) wrote:
> On Tue, 22 Oct 2024, Dave Hansen wrote:
>
>> So I think Peter's version was the best. It shuts up clang and also
>> preserves the existing (good) gcc 'sub' behavior. I'll send it out for
>> real in a bit, but I'm thinking of something like the attached patch.
> The desired behavior is a "dec". "sub" has a longer op code AFAICT.

Gah, yes, of course. I misspoke.

We want "inc" and "dec" for +1 and -1. "add" and "sub" are heftier and
get used for everything else.
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
2024-10-23 17:15 ` Dave Hansen
@ 2024-10-23 21:40 ` H. Peter Anvin
0 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2024-10-23 21:40 UTC (permalink / raw)
To: Dave Hansen, Christoph Lameter (Ampere)
Cc: Peter Zijlstra, Andy Shevchenko, Ingo Molnar, Uros Bizjak, linux-mm,
linux-kernel, llvm, Dennis Zhou, Tejun Heo, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, Nathan Chancellor,
Nick Desaulniers, Bill Wendling, Justin Stitt

On 10/23/24 10:15, Dave Hansen wrote:
> On 10/22/24 16:24, Christoph Lameter (Ampere) wrote:
>> On Tue, 22 Oct 2024, Dave Hansen wrote:
>>
>>> So I think Peter's version was the best. It shuts up clang and also
>>> preserves the existing (good) gcc 'sub' behavior. I'll send it out for
>>> real in a bit, but I'm thinking of something like the attached patch.
>> The desired behavior is a "dec". "sub" has a longer op code AFAICT.
>
> Gah, yes, of course. I misspoke.
>
> We want "inc" and "dec" for +1 and -1. "add" and "sub" are heftier and
> get used for everything else.

Do we really? I don't know if there are any microarchitectures where the
partial register update still matters.

It is only one byte difference.

-hpa
* Re: [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op()
2024-10-22 19:53 ` Dave Hansen
2024-10-22 23:24 ` Christoph Lameter (Ampere)
@ 2024-10-23 14:24 ` Andy Shevchenko
1 sibling, 0 replies; 16+ messages in thread
From: Andy Shevchenko @ 2024-10-23 14:24 UTC (permalink / raw)
To: Dave Hansen
Cc: Peter Zijlstra, Ingo Molnar, Uros Bizjak, linux-mm, linux-kernel,
llvm, Dennis Zhou, Tejun Heo, Christoph Lameter, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt

On Tue, Oct 22, 2024 at 12:53:01PM -0700, Dave Hansen wrote:
> On 10/17/24 11:18, Peter Zijlstra wrote:
> > On Wed, Oct 16, 2024 at 12:44:18PM -0700, Dave Hansen wrote:

...

> >> Would anybody hate if we broke this up a bit, like:
> >>
> >> const typeof(var) _val = val;
> >> const int paoconst = __builtin_constant_p(val);
> >> const int paoinc = paoconst && ((_val) == 1);
> >> const int paodec = paoconst && ((_val) == (typeof(var))-1);
> >>
> >> and then did
> >>
> >> if (paoinc)
> >> percpu_unary_op(size, qual, "inc", var);
> >> ...
> > I think that is an overall improvement. Proceed! 🙂
>
> I poked at this a bit:

Thanks for looking into this!

>
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/commit/?h=testme&id=30e0899c6ab7fe1134e4b96db963f0be89b1dd5a
>
> I believe it functions fine. But it surprised me with a few things.
> Here's one. I assumed that doing an add((unsigned)-1) would be rare.
> It's not. It's actually pretty common because this:
>
> #define this_cpu_sub(pcp, val) this_cpu_add(pcp, -(typeof(pcp))(val))
>
> ends up causing problems when 'pcp' is an unsigned type. For example,
> in this chain:
>
> mem_cgroup_exit ->
> obj_cgroup_put ->
> percpu_ref_put ->
> percpu_ref_put_many(ref, 1) ->
> this_cpu_sub
>
> the compiler can see the '1' constant. It effectively expands to:
>
> this_cpu_add(pcp, -(unsigned long)(1))
>
> With the old code, gcc manages to generate a 'dec'. Clang generates an
> 'add'. With my hack above both compilers generate an 'add'. This
> actually matters in some code that seems potentially rather performance
> sensitive:
>
> add/remove: 0/0 grow/shrink: 219/9 up/down: 755/-141 (614)
> Function                        old     new   delta
> flush_end_io                    905    1070    +165
> x86_pmu_cancel_txn              242     338     +96
> lru_add                         554     594     +40
> mlock_folio_batch              3264    3300     +36
> compaction_alloc               3813    3838     +25
> tcp_leave_memory_pressure        86     110     +24
> account_guest_time              270     287     +17
> ...
>
> So I think Peter's version was the best. It shuts up clang and also
> preserves the existing (good) gcc 'sub' behavior. I'll send it out for
> real in a bit, but I'm thinking of something like the attached patch.

I am fine as long as you keep the (added) test cases and maybe even
extend them. I dunno how you will go with the fact that Andrew applied
my version already.

...

> This can be quickly reproduced by setting CONFIG_WERROR=y and running:
>
> make W=1 CC=clang-14 net/ipv4/tcp_output.o

Hint: You can use LLVM=-14 instead of CC=clang-14.

--
With Best Regards,
Andy Shevchenko
Thread overview: 16+ messages
2024-09-05 17:03 [PATCH v1 1/1] x86/percpu: Cast -1 to argument type when comparing in percpu_add_op() Andy Shevchenko
2024-10-16 13:37 ` Andy Shevchenko
2024-10-16 15:44 ` Dave Hansen
2024-10-16 17:03 ` Nick Desaulniers
2024-10-16 18:06 ` Andy Shevchenko
2024-10-16 18:20 ` Andy Shevchenko
2024-10-16 19:43 ` Dave Hansen
2024-10-16 19:20 ` Peter Zijlstra
2024-10-16 19:44 ` Dave Hansen
2024-10-17 18:18 ` Peter Zijlstra
2024-10-18 12:21 ` Andy Shevchenko
2024-10-22 19:53 ` Dave Hansen
2024-10-22 23:24 ` Christoph Lameter (Ampere)
2024-10-23 17:15 ` Dave Hansen
2024-10-23 21:40 ` H. Peter Anvin
2024-10-23 14:24 ` Andy Shevchenko