* Re: Build performance regressions originating from min()/max() macros
2024-07-24 8:14 ` Jürgen Groß
@ 2024-07-24 8:31 ` Lorenzo Stoakes
2024-07-24 9:40 ` Jürgen Groß
2024-07-24 8:34 ` David Laight
2024-07-24 8:50 ` Arnd Bergmann
2 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-07-24 8:31 UTC (permalink / raw)
To: Jürgen Groß
Cc: Andrew Morton, david.laight, Arnd Bergmann, willy, torvalds,
Jason, hch, andriy.shevchenko, pedro.falcato, Mateusz Guzik,
linux-mm, linux-kernel
On Wed, Jul 24, 2024 at 10:14:12AM GMT, Jürgen Groß wrote:
> On 23.07.24 23:59, Lorenzo Stoakes wrote:
> > Arnd reported a significant build slowdown [0], which was bisected to the
> > series spanning commit 80fcac55385c ("minmax: relax check to allow
> > comparison between unsigned arguments and signed constants") to commit
> > 867046cc70277 ("minmax: relax check to allow comparison between unsigned
> > arguments and signed constants"), originating from the series "minmax:
> > Relax type checks in min() and max()." [1].
[snip]
> I can send a patch to simplify the problematic construct, but OTOH this
> will avoid only one particularly bad example.
Thanks, appreciated but I am a little concerned that we might get stuck in
whack-a-mole here a bit. I'm pretty sure we've had previous patches that
have addressed invocation points, but obviously the underlying issue are
these macros which will keep cropping up again and again.
>
>
> Juergen
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Build performance regressions originating from min()/max() macros
2024-07-24 8:31 ` Lorenzo Stoakes
@ 2024-07-24 9:40 ` Jürgen Groß
2024-07-24 9:43 ` Lorenzo Stoakes
2024-07-24 14:47 ` David Laight
0 siblings, 2 replies; 10+ messages in thread
From: Jürgen Groß @ 2024-07-24 9:40 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, david.laight, Arnd Bergmann, willy, torvalds,
Jason, hch, andriy.shevchenko, pedro.falcato, Mateusz Guzik,
linux-mm, linux-kernel
On 24.07.24 10:31, Lorenzo Stoakes wrote:
> On Wed, Jul 24, 2024 at 10:14:12AM GMT, Jürgen Groß wrote:
>> On 23.07.24 23:59, Lorenzo Stoakes wrote:
>>> Arnd reported a significant build slowdown [0], which was bisected to the
>>> series spanning commit 80fcac55385c ("minmax: relax check to allow
>>> comparison between unsigned arguments and signed constants") to commit
>>> 867046cc70277 ("minmax: relax check to allow comparison between unsigned
>>> arguments and signed constants"), originating from the series "minmax:
>>> Relax type checks in min() and max()." [1].
>
> [snip]
>
>> I can send a patch to simplify the problematic construct, but OTOH this
>> will avoid only one particularly bad example.
>
> Thanks, appreciated but I am a little concerned that we might get stuck in
> whack-a-mole here a bit. I'm pretty sure we've had previous patches that
> have addressed invocation points, but obviously the underlying issue are
> these macros which will keep cropping up again and again.
The xen example seems to be one of the worst due to nesting of min3() and
min(), so being de facto a min4().
I think drivers/firmware/sysfb_simplefb.c has a similar problem, as it is
nesting max() with max3(). Same applies to arch/x86/kernel/cpu/cacheinfo.c
and multiple times to fs/xfs/libxfs/xfs_trans_resv.c.
There are probably more such extreme cases.
Juergen
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Build performance regressions originating from min()/max() macros
2024-07-24 9:40 ` Jürgen Groß
@ 2024-07-24 9:43 ` Lorenzo Stoakes
2024-07-24 14:47 ` David Laight
1 sibling, 0 replies; 10+ messages in thread
From: Lorenzo Stoakes @ 2024-07-24 9:43 UTC (permalink / raw)
To: Jürgen Groß
Cc: Andrew Morton, david.laight, Arnd Bergmann, willy, torvalds,
Jason, hch, andriy.shevchenko, pedro.falcato, Mateusz Guzik,
linux-mm, linux-kernel
On Wed, Jul 24, 2024 at 11:40:07AM GMT, Jürgen Groß wrote:
> On 24.07.24 10:31, Lorenzo Stoakes wrote:
> > On Wed, Jul 24, 2024 at 10:14:12AM GMT, Jürgen Groß wrote:
> > > On 23.07.24 23:59, Lorenzo Stoakes wrote:
> > > > Arnd reported a significant build slowdown [0], which was bisected to the
> > > > series spanning commit 80fcac55385c ("minmax: relax check to allow
> > > > comparison between unsigned arguments and signed constants") to commit
> > > > 867046cc70277 ("minmax: relax check to allow comparison between unsigned
> > > > arguments and signed constants"), originating from the series "minmax:
> > > > Relax type checks in min() and max()." [1].
> >
> > [snip]
> >
> > > I can send a patch to simplify the problematic construct, but OTOH this
> > > will avoid only one particularly bad example.
> >
> > Thanks, appreciated but I am a little concerned that we might get stuck in
> > whack-a-mole here a bit. I'm pretty sure we've had previous patches that
> > have addressed invocation points, but obviously the underlying issue are
> > these macros which will keep cropping up again and again.
>
> The xen example seems to be one of the worst due to nesting of min3() and
> min(), so being de facto a min4().
>
> I think drivers/firmware/sysfb_simplefb.c has a similar problem, as it is
> nesting max() with max3(). Same applies to arch/x86/kernel/cpu/cacheinfo.c
> and multiple times to fs/xfs/libxfs/xfs_trans_resv.c.
>
> There are probably more such extreme cases.
>
Yeah to be clear, I am not opposed to these patches, I just don't want us to
lose sight of the need to fix the underlying problem if possible.
It feels like we are leaving the worst kind of landmine - a construct that you
simply wouldn't expect to cause massive build perf degradation - for others to
step on.
I suspect there are probably a few specific O(n^3) cases (as David pointed out)
that account for most of the problem and a bunch of other less problematic ones
that hit perhaps O(n^2) cases that add up.
>
> Juergen
^ permalink raw reply [flat|nested] 10+ messages in thread* RE: Build performance regressions originating from min()/max() macros
2024-07-24 9:40 ` Jürgen Groß
2024-07-24 9:43 ` Lorenzo Stoakes
@ 2024-07-24 14:47 ` David Laight
1 sibling, 0 replies; 10+ messages in thread
From: David Laight @ 2024-07-24 14:47 UTC (permalink / raw)
To: 'Jürgen Groß', Lorenzo Stoakes
Cc: Andrew Morton, Arnd Bergmann, willy, torvalds, Jason, hch,
andriy.shevchenko, pedro.falcato, Mateusz Guzik, linux-mm,
linux-kernel
From: Jürgen Groß
> Sent: 24 July 2024 10:40
>
> On 24.07.24 10:31, Lorenzo Stoakes wrote:
> > On Wed, Jul 24, 2024 at 10:14:12AM GMT, Jürgen Groß wrote:
> >> On 23.07.24 23:59, Lorenzo Stoakes wrote:
> >>> Arnd reported a significant build slowdown [0], which was bisected to the
> >>> series spanning commit 80fcac55385c ("minmax: relax check to allow
> >>> comparison between unsigned arguments and signed constants") to commit
> >>> 867046cc70277 ("minmax: relax check to allow comparison between unsigned
> >>> arguments and signed constants"), originating from the series "minmax:
> >>> Relax type checks in min() and max()." [1].
> >
> > [snip]
> >
> >> I can send a patch to simplify the problematic construct, but OTOH this
> >> will avoid only one particularly bad example.
> >
> > Thanks, appreciated but I am a little concerned that we might get stuck in
> > whack-a-mole here a bit. I'm pretty sure we've had previous patches that
> > have addressed invocation points, but obviously the underlying issue are
> > these macros which will keep cropping up again and again.
>
> The xen example seems to be one of the worst due to nesting of min3() and
> min(), so being de facto a min4().
>
> I think drivers/firmware/sysfb_simplefb.c has a similar problem, as it is
> nesting max() with max3(). Same applies to arch/x86/kernel/cpu/cacheinfo.c
> and multiple times to fs/xfs/libxfs/xfs_trans_resv.c.
>
> There are probably more such extreme cases.
I've just sent in a 7-part patch series that slightly reduces the complexity
and directly implements min3() and max3().
The latter should help.
This is based on part of a series I send months ago that will have 'got lost'.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: Build performance regressions originating from min()/max() macros
2024-07-24 8:14 ` Jürgen Groß
2024-07-24 8:31 ` Lorenzo Stoakes
@ 2024-07-24 8:34 ` David Laight
2024-07-24 8:50 ` Arnd Bergmann
2 siblings, 0 replies; 10+ messages in thread
From: David Laight @ 2024-07-24 8:34 UTC (permalink / raw)
To: 'Jürgen Groß', Lorenzo Stoakes, Andrew Morton
Cc: Arnd Bergmann, willy, torvalds, Jason, hch, andriy.shevchenko,
pedro.falcato, Mateusz Guzik, linux-mm, linux-kernel
From: Jürgen Groß
> Sent: 24 July 2024 09:14
>
> On 23.07.24 23:59, Lorenzo Stoakes wrote:
> > Arnd reported a significant build slowdown [0], which was bisected to the
> > series spanning commit 80fcac55385c ("minmax: relax check to allow
> > comparison between unsigned arguments and signed constants") to commit
> > 867046cc70277 ("minmax: relax check to allow comparison between unsigned
> > arguments and signed constants"), originating from the series "minmax:
> > Relax type checks in min() and max()." [1].
> >
> > I have reproduced this locally, reverting this series and manually fixing
> > up all call sites that invoke min()/max() for a simple x86-64 defconfig (+
> > some other debug flags I use for debug kernels, I can provide the .config
> > if needed).
> >
> > Arnd noted that the arch/x86/xen/setup.c file was particularly problematic,
> > taking 15 (!) seconds to pre-process on his machine, so I also enabled
> > CONFIG_XEN to test this and obtained performance numbers with this set/not
> > set.
> >
> > I was able to reproduce this very significant pre-processor time on this
> > file, noting that with the series reverted compile time for the file is
> > 0.79s, with it in place, it takes 6.90s for a 873.4% slowdown.
...
> > Which suggests a worryingly significant slowdown of ~45s with CONFIG_XEN
> > enabled and ~35s even without it.
> >
> > The underlying problems seems to be very large macro expansions, which Arnd
> > noted in the xen case originated from the line:
> >
> > extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
> > extra_pages, max_pages - max_pfn);
Jeepers, that is definitely going to be big - it never was small.
The expansion of min() itself isn't that bad.
But both arguments get expanded a few times (and a few more than the old code).
So a nested min/max causes an O(n^2) expansion - livable.
But min3(a,b,c) is just min(min(a,b),c) - so a nested call.
(It seems to have been added to avoid the cost of the nested call, but
just implemented a nested call anyway.)
Here one of the arguments to min3() is a min() - and I suspect it goes
into the inner min() so O(n^3) - which is bad news.
> >
> > And resulted in the generation of 47 MB (!) of pre-processor output.
> >
> > It seems a lot of code now relies on the relaxed conditions of the newly
> > changed min/max() macros, so the question is - what can we do to address
> > these regressions?
>
> I can send a patch to simplify the problematic construct, but OTOH this
> will avoid only one particularly bad example.
I suspect that reversing the order of the args to min3() will help this case.
There is an updated patch set I sent in January that reduces the
expansion a bit.
I've a reworked version that removes the last few patches (removing
support for constant output and using MIN() for that instead).
I'll repost it and maybe Arnd will pick it up?
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Build performance regressions originating from min()/max() macros
2024-07-24 8:14 ` Jürgen Groß
2024-07-24 8:31 ` Lorenzo Stoakes
2024-07-24 8:34 ` David Laight
@ 2024-07-24 8:50 ` Arnd Bergmann
2 siblings, 0 replies; 10+ messages in thread
From: Arnd Bergmann @ 2024-07-24 8:50 UTC (permalink / raw)
To: Juergen Gross, Lorenzo Stoakes, Andrew Morton, David Laight
Cc: Matthew Wilcox, Linus Torvalds, Jason A . Donenfeld,
Christoph Hellwig, Andy Shevchenko, pedro.falcato, Mateusz Guzik,
linux-mm, linux-kernel
On Wed, Jul 24, 2024, at 10:14, Jürgen Groß wrote:
> On 23.07.24 23:59, Lorenzo Stoakes wrote:
>>
>> And resulted in the generation of 47 MB (!) of pre-processor output.
>>
>> It seems a lot of code now relies on the relaxed conditions of the newly
>> changed min/max() macros, so the question is - what can we do to address
>> these regressions?
>
> I can send a patch to simplify the problematic construct, but OTOH this
> will avoid only one particularly bad example.
It's probably a good idea do change the xen/setup.c file anyway,
as I haven't found any other file that had a regression this bad,
and it only needs a single temporary variable for a 1000x speedup.
For the overall kernel, I see at best a 2.3% speedup (20 second
CPU time) by replacing the current min()/max() macros with a version
that drops both the constant expression output feature and the
assertion, measuring an x86 defconfig build, which has xen
disabled. On a defconfig+xen kernel, that difference increases
to 4.4% or 37 seconds.
Removing only the constexpr side requires a handful of fixups
for x86 allmodconfig to replace min()/max() with something else in
drivers/edac/sb_edac.c
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
drivers/gpu/drm/drm_color_mgmt.c
drivers/input/touchscreen/cyttsp4_core.c
drivers/md/dm-integrity.c
drivers/net/can/usb/etas_es58x/es58x_devlink.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
fs/btrfs/tree-checker.c
lib/vsprintf.c
net/ipv4/proc.c
net/ipv6/proc.c
This gives about half the speed difference, the other
half comes from removing the assertion, but that is not
a good idea unless we can replace it with an equivalent
assertion that works on the unique_x/unique_y variables
instead of expanding the arguments.
Arnd
^ permalink raw reply [flat|nested] 10+ messages in thread