* [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
@ 2016-07-22 10:41 David Howells
2016-07-22 15:52 ` Christian Borntraeger
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: David Howells @ 2016-07-22 10:41 UTC (permalink / raw)
To: ksummit-discuss
Are there additional things we can get the compiler to do for us? Some
things I've seen brought up:
(1) Additional __atomic_*() ops could be useful. Suggestions I've heard
include direct LL/SC support - though the compiler people don't seem so
keen on that.
(2) -mmodel=kernel flag so that the compiler can optimise better for the
kernel memory model.
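To make (1) concrete: an operation with no dedicated __atomic_*() builtin currently has to be written as a compare-and-swap retry loop, which is the kind of code direct LL/SC support could express in a single sequence. The following is only a sketch using the existing GCC/Clang builtins; atomic_fetch_max() is a made-up name for illustration, not an existing kernel or compiler interface.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: atomically raise *p to at least 'val' and return
 * the value that was there before.
 */
static inline int atomic_fetch_max(int *p, int val)
{
	int old = __atomic_load_n(p, __ATOMIC_RELAXED);

	/*
	 * On failure __atomic_compare_exchange_n() stores the value it
	 * found into 'old', so every retry compares against fresh data.
	 */
	while (old < val &&
	       !__atomic_compare_exchange_n(p, &old, val, true /* weak */,
					    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
		;

	return old;
}
```

On an LL/SC machine the weak compare-exchange is itself implemented with a load-linked/store-conditional pair, so the loop above effectively wraps one retry mechanism inside another.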
David
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 10:41 [Ksummit-discuss] [TECH TOPIC] Compiler shopping list David Howells
@ 2016-07-22 15:52 ` Christian Borntraeger
2016-07-22 15:59 ` David Woodhouse
2016-07-25 18:49 ` Luis R. Rodriguez
2016-07-29 1:32 ` Steven Rostedt
2 siblings, 1 reply; 11+ messages in thread
From: Christian Borntraeger @ 2016-07-22 15:52 UTC (permalink / raw)
To: David Howells, ksummit-discuss
On 07/22/2016 12:41 PM, David Howells wrote:
> Are there additional things we can get the compiler to do for us? Some
> things I've seen brought up:
>
> (1) Additional __atomic_*() ops could be useful. Suggestions I've heard
> include direct LL/SC support - though the compiler people don't seem so
> keen on that.
>
> (2) -mmodel=kernel flag so that the compiler can optimise better for the
> kernel memory model.
Some years ago (actually many) Linus proposed to have an endianness attribute to data
types, so that the compiler can do the bswap automatically. For some reason this
was never implemented, but this might be a good idea anyway.
e.g.
unsigned long x[10] __attribute__(("bigendian"));
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 15:52 ` Christian Borntraeger
@ 2016-07-22 15:59 ` David Woodhouse
2016-07-22 17:05 ` Christian Borntraeger
2016-07-23 20:35 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 11+ messages in thread
From: David Woodhouse @ 2016-07-22 15:59 UTC (permalink / raw)
To: Christian Borntraeger, David Howells, ksummit-discuss
On Fri, 2016-07-22 at 17:52 +0200, Christian Borntraeger wrote:
> On 07/22/2016 12:41 PM, David Howells wrote:
> > Are there additional things we can get the compiler to do for us? Some
> > things I've seen brought up:
> >
> > (1) Additional __atomic_*() ops could be useful. Suggestions I've heard
> > include direct LL/SC support - though the compiler people don't seem so
> > keen on that.
> >
> > (2) -mmodel=kernel flag so that the compiler can optimise better for the
> > kernel memory model.
>
> Some years ago (actually many) Linus proposed to have an endianness attribute to data
> types, so that the compiler can do the bswap automatically. For some reason this
> was never implemented, but this might be a good idea anyway.
>
> e.g.
>
> unsigned long x[10] __attribute__(("bigendian"));
I'm not sure Linus proposed that. I certainly did, many times.
With the work I put in to make use of __builtin_bswapXX() we do have a
*certain* amount of the functionality that full endianness attribution
would give us — the compiler can see and optimise certain
load/mask/save operations, and can use movbe and equivalent
instructions.
But a full implementation that let us just do assignment without
jumping through the hoops might still be nice.
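As an illustration of the "hoops" in question, explicit handling of a big-endian field looks roughly like the sketch below. The struct and helper names are invented for this example and only stand in for the kernel's own accessors; the builtin and the byte-order macros are the ones GCC and Clang provide.

```c
#include <stdint.h>

/* Hypothetical structure whose field is stored in big-endian order. */
struct hdr {
	uint32_t len_be;
};

/* Swap between CPU and big-endian order; a no-op on big-endian hosts. */
static inline uint32_t be32_swap_sketch(uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return __builtin_bswap32(v);
#else
	return v;
#endif
}

/* Load followed by swap: the compiler can see both and may fuse them. */
static inline uint32_t hdr_get_len(const struct hdr *h)
{
	return be32_swap_sketch(h->len_be);
}

/* Swap followed by store. */
static inline void hdr_set_len(struct hdr *h, uint32_t len)
{
	h->len_be = be32_swap_sketch(len);
}
```

With a full endianness attribute the two accessors would collapse into plain reads and writes of the field, with the compiler inserting the swap.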
--
dwmw2
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 15:59 ` David Woodhouse
@ 2016-07-22 17:05 ` Christian Borntraeger
2016-07-22 17:17 ` James Bottomley
2016-07-29 1:16 ` Steven Rostedt
2016-07-23 20:35 ` Benjamin Herrenschmidt
1 sibling, 2 replies; 11+ messages in thread
From: Christian Borntraeger @ 2016-07-22 17:05 UTC (permalink / raw)
To: David Woodhouse, David Howells, ksummit-discuss
On 07/22/2016 05:59 PM, David Woodhouse wrote:
> On Fri, 2016-07-22 at 17:52 +0200, Christian Borntraeger wrote:
>> On 07/22/2016 12:41 PM, David Howells wrote:
>>> Are there additional things we can get the compiler to do for us? Some
>>> things I've seen brought up:
>>>
>>> (1) Additional __atomic_*() ops could be useful. Suggestions I've heard
>>> include direct LL/SC support - though the compiler people don't seem so
>>> keen on that.
>>>
>>> (2) -mmodel=kernel flag so that the compiler can optimise better for the
>>> kernel memory model.
>>
>> Some years ago (actually many) Linus proposed to have an endianness attribute to data
>> types, so that the compiler can do the bswap automatically. For some reason this
>> was never implemented, but this might be a good idea anyway.
>>
>> e.g.
>>
>> unsigned long x[10] __attribute__(("bigendian"));
>
> I'm not sure Linus proposed that. I certainly did, many times.
Yes, I know at least 3 people suggesting that and thinking this is useful
( Can you beat Linus' 2001 https://gcc.gnu.org/ml/gcc/2001-12/msg00932.html? ;-) )
>
> With the work I put in to make use of __builtin_bswapXX() we do have a
> *certain* amount of the functionality that full endianness attribution
> would give us — the compiler can see and optimise certain
> load/mask/save operations, and can use movbe and equivalent
> instructions.
>
> But a full implementation that let us just do assignment without
> jumping through the hoops might still be nice.
Absolutely.
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 17:05 ` Christian Borntraeger
@ 2016-07-22 17:17 ` James Bottomley
2016-07-22 17:33 ` David Woodhouse
2016-07-29 1:16 ` Steven Rostedt
1 sibling, 1 reply; 11+ messages in thread
From: James Bottomley @ 2016-07-22 17:17 UTC (permalink / raw)
To: Christian Borntraeger, David Woodhouse, David Howells, ksummit-discuss
On Fri, 2016-07-22 at 19:05 +0200, Christian Borntraeger wrote:
> On 07/22/2016 05:59 PM, David Woodhouse wrote:
> > On Fri, 2016-07-22 at 17:52 +0200, Christian Borntraeger wrote:
> > > On 07/22/2016 12:41 PM, David Howells wrote:
> > > > Are there additional things we can get the compiler to do for
> > > > us? Some things I've seen brought up:
> > > >
> > > > (1) Additional __atomic_*() ops could be useful. Suggestions
> > > > I've heard include direct LL/SC support - though the
> > > > compiler people don't seem so keen on that.
> > > >
> > > > (2) -mmodel=kernel flag so that the compiler can optimise
> > > > better for the kernel memory model.
> > >
> > > Some years ago (actually many) Linus proposed to have an
> > > > endianness attribute to data types, so that the compiler can do
> > > the bswap automatically. For some reason this was never
> > > implemented, but this might be a good idea anyway.
> > >
> > > e.g.
> > >
> > > unsigned long x[10] __attribute__(("bigendian"));
> >
> > I'm not sure Linus proposed that. I certainly did, many times.
>
> Yes, I know at least 3 people suggesting that and thinking this is
> useful ( Can you beat Linus' 2001
> https://gcc.gnu.org/ml/gcc/2001-12/msg00932.html? ;-) )
The fact that it's been discussed on and off for 15 years tends to
suggest that where we've ended up is about good enough for everyday and
no-one can really be bothered to take the extra effort.
So what's the overriding reason we should spend the extra effort now?
James
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 17:17 ` James Bottomley
@ 2016-07-22 17:33 ` David Woodhouse
0 siblings, 0 replies; 11+ messages in thread
From: David Woodhouse @ 2016-07-22 17:33 UTC (permalink / raw)
To: James Bottomley, Christian Borntraeger, David Howells, ksummit-discuss
On Fri, 2016-07-22 at 10:17 -0700, James Bottomley wrote:
> On Fri, 2016-07-22 at 19:05 +0200, Christian Borntraeger wrote:
> > On 07/22/2016 05:59 PM, David Woodhouse wrote:
> > > On Fri, 2016-07-22 at 17:52 +0200, Christian Borntraeger wrote:
> > > > On 07/22/2016 12:41 PM, David Howells wrote:
> > > > > Are there additional things we can get the compiler to do for
> > > > > us? Some things I've seen brought up:
> > > > >
> > > > > (1) Additional __atomic_*() ops could be useful. Suggestions
> > > > > I've heard include direct LL/SC support - though the
> > > > > compiler people don't seem so keen on that.
> > > > >
> > > > > (2) -mmodel=kernel flag so that the compiler can optimise
> > > > > better for the kernel memory model.
> > > >
> > > > Some years ago (actually many) Linus proposed to have an
> > > > > endianness attribute to data types, so that the compiler can do
> > > > the bswap automatically. For some reason this was never
> > > > implemented, but this might be a good idea anyway.
> > > >
> > > > e.g.
> > > >
> > > > unsigned long x[10] __attribute__(("bigendian"));
> > >
> > > I'm not sure Linus proposed that. I certainly did, many times.
> >
> > Yes, I know at least 3 people suggesting that and thinking this is
> > useful ( Can you beat Linus' 2001
> > https://gcc.gnu.org/ml/gcc/2001-12/msg00932.html? ;-) )
>
> The fact that it's been discussed on and off for 15 years tends to
> suggest that where we've ended up is about good enough for everyday and
> no-one can really be bothered to take the extra effort.
>
> So what's the overriding reason we should spend the extra effort now?
I don't think there's much more to be gained than what I've already
done. We *have* the compiler visibility into endianness changes, so it
can optimise them correctly.
What we *don't* have is the ease of programming, where we can happily
*forget* that certain storage is of a specific endianness, do a direct
assignment and let the compiler sort it out for us. But that's not
actually much use until we can depend on *everyone* having a compiler
that supports that. And we have the sparse attributes to catch mistakes
where we *forget* the explicit handling anyway, so I don't think
there's really much more to be gained.
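For reference, the sparse-based checking mentioned above works along these lines. This is a reduced sketch, not the kernel's actual headers: the kernel spells the macros __bitwise and __force and the type __be32, while the SK_* and be32_t names here are made up to keep the example standalone.

```c
#include <stdint.h>

/* Only sparse defines __CHECKER__; a normal compiler sees plain integers. */
#ifdef __CHECKER__
#define SK_BITWISE	__attribute__((bitwise))
#define SK_FORCE	__attribute__((force))
#else
#define SK_BITWISE
#define SK_FORCE
#endif

/* Hypothetical stand-in for the kernel's __be32. */
typedef uint32_t SK_BITWISE be32_t;

/* CPU order to big-endian; the forced cast marks the conversion as intended. */
static inline be32_t to_be32(uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	v = __builtin_bswap32(v);
#endif
	return (SK_FORCE be32_t)v;
}

/* Big-endian to CPU order. */
static inline uint32_t from_be32(be32_t v)
{
	uint32_t raw = (SK_FORCE uint32_t)v;

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	raw = __builtin_bswap32(raw);
#endif
	return raw;
}
```

When the code is run through sparse, assigning a be32_t directly to a uint32_t (or the reverse) without going through one of the helpers is reported as a type mismatch, which is how forgotten conversions get caught.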
--
dwmw2
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 15:59 ` David Woodhouse
2016-07-22 17:05 ` Christian Borntraeger
@ 2016-07-23 20:35 ` Benjamin Herrenschmidt
2016-07-23 23:09 ` Alexei Starovoitov
1 sibling, 1 reply; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-23 20:35 UTC (permalink / raw)
To: David Woodhouse, Christian Borntraeger, David Howells, ksummit-discuss
On Fri, 2016-07-22 at 16:59 +0100, David Woodhouse wrote:
> I'm not sure Linus proposed that. I certainly did, many times.
>
> With the work I put in to make use of __builtin_bswapXX() we do have
> a
> *certain* amount of the functionality that full endianness
> attribution
> would give us — the compiler can see and optimise certain
> load/mask/save operations, and can use movbe and equivalent
> instructions.
>
> But a full implementation that let us just do assignment without
> jumping through the hoops might still be nice.
One advantage of that is it might allow us to work around a limitation
with the current __builtin_bswap* and READ_ONCE/ACCESS_ONCE (such
as used in gup).
The ACCESS_ONCE magic pretty much forces the compiler to separate
the load from the swap, it thus prevents us from using the byteswapped-
load instructions that we have on powerpc, thus degrading to a load
followed by the 5 or 6 instructions (with back-to-back dependencies)
needed to do the swap.
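A reduced illustration of the two cases follows. It is a sketch only: READ_ONCE_SKETCH() is a simplified stand-in for the kernel macro, the function names are invented, and both functions assume a little-endian CPU reading big-endian data.

```c
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): a single volatile access. */
#define READ_ONCE_SKETCH(x)	(*(volatile __typeof__(x) *)&(x))

/* Plain load + swap: the compiler may emit one byte-reversed load. */
static inline uint64_t load_be64_plain(const uint64_t *p)
{
	return __builtin_bswap64(*p);
}

/*
 * Volatile load, then swap: the case described above, where the load
 * and the swap end up as separate steps.
 */
static inline uint64_t load_be64_once(const uint64_t *p)
{
	return __builtin_bswap64(READ_ONCE_SKETCH(*p));
}
```

Both functions return the same value; the difference is only in the instructions the compiler is able to pick, which can be checked by comparing the generated assembly for a given compiler and target.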
This caused a measurable performance hit on microbenchmarks when
we forced our page tables big endian on a little endian kernel (in
order to accommodate the new POWER9 radix MMU).
Cheers,
Ben.
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-23 20:35 ` Benjamin Herrenschmidt
@ 2016-07-23 23:09 ` Alexei Starovoitov
0 siblings, 0 replies; 11+ messages in thread
From: Alexei Starovoitov @ 2016-07-23 23:09 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: ksummit-discuss
On Sun, Jul 24, 2016 at 06:35:49AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2016-07-22 at 16:59 +0100, David Woodhouse wrote:
> > I'm not sure Linus proposed that. I certainly did, many times.
> >
> > With the work I put in to make use of __builtin_bswapXX() we do have
> > a
> > *certain* amount of the functionality that full endianness
> > attribution
> > would give us — the compiler can see and optimise certain
> > load/mask/save operations, and can use movbe and equivalent
> > instructions.
Both LLVM and GCC already optimize load + __builtin_bswap into movbe on x86-64.
> > But a full implementation that let us just do assignment without
> > jumping through the hoops might still be nice.
>
> One advantage of that is it might allow us to work around a limitation
> with the current __builtin_bswap* and READ_ONCE/ACCESS_ONCE (such
> as used in gup).
>
> The ACCESS_ONCE magic pretty much forces the compiler to separate
> the load from the swap, it thus prevents us from using the byteswapped-
> load instructions that we have on powerpc, thus degrading to a load
> followed by the 5 or 6 instructions (with back-to-back dependencies)
> needed to do the swap.
Yeah, it looks like volatile is somehow preventing gcc from optimizing it,
but that is a missed optimization in the compiler. A new 'bigendian' attribute
for a variable is not going to help in this situation.
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 10:41 [Ksummit-discuss] [TECH TOPIC] Compiler shopping list David Howells
2016-07-22 15:52 ` Christian Borntraeger
@ 2016-07-25 18:49 ` Luis R. Rodriguez
2016-07-29 1:32 ` Steven Rostedt
2 siblings, 0 replies; 11+ messages in thread
From: Luis R. Rodriguez @ 2016-07-25 18:49 UTC (permalink / raw)
To: David Howells; +Cc: Michael Matz, agraf, ksummit-discuss, valentinrothberg
On Fri, Jul 22, 2016 at 11:41:00AM +0100, David Howells wrote:
> Are there additional things we can get the compiler to do for us? Some
> things I've seen brought up:
>
> (1) Additional __atomic_*() ops could be useful. Suggestions I've heard
> include direct LL/SC support - though the compiler people don't seem so
> keen on that.
>
> (2) -mmodel=kernel flag so that the compiler can optimise better for the
> kernel memory model.
Do we have enough representation from compiler folk attending? I know a few
kernel developers take on compiler features on their own these days, but I
think this is rather rare.
One idea that came up while evaluating further optimizations possible with
paravirtualized kernels was a thing called "compiler multiverse" support [0],
which would try to generalize the binary patching technique used in the Linux
kernel for use in any application. While this topic and precise domain
interest only a very few, a generic solution for this sort of problem has uses
outside of PV support, and even outside of Linux.
One of the side benefits of such a thing could be, for instance, a mechanism
to avoid or vet for dead code, and to verify that such code never runs. I've
had my eyes on a kernel-based solution for this; compiler multiverse support is
a counter idea by Alexander that came up in evaluating similar issues with
other code bases (qemu in particular) and trying to brainstorm a more general
solution.
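To give a rough idea of the effect being described, here is a hand-written approximation in plain C. It is only a sketch under stated assumptions: all names are made up, a function pointer stands in for binary patching of call sites, and the specialised variants that a multiverse-aware compiler would generate are written out by hand.

```c
#include <stdbool.h>

/* Hypothetical boot-time switch that rarely or never changes afterwards. */
static bool config_pv;

/* Generic version: tests the switch on every call. */
static int op_generic(int x)
{
	return config_pv ? x + 1 : x - 1;
}

/* Specialised variants, one per value of the switch. */
static int op_pv(int x)
{
	return x + 1;
}

static int op_native(int x)
{
	return x - 1;
}

/* Callers go through this pointer; it starts out at the generic version. */
static int (*op)(int) = op_generic;

/* "Commit" the current switch value so later calls skip the test. */
static void op_commit(void)
{
	op = config_pv ? op_pv : op_native;
}
```

After op_commit() the variant for the other switch value is never reached, which is the dead-code angle mentioned above; with real binary patching the indirect call would go away as well.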
I'll note that, as it stands, the potential size constraints (even though only
bool has been considered), and the fact that we already have a framework for
dealing with some of these sorts of things (although not exactly this very
feature -- *yet*), have put this feature lower on the priority list of things
to write. It is still worth mentioning in case others working on the kernel are
looking for something similar to help address dead code, and which would be
generic as well.
A kernel-only solution built on our existing alternatives model and on
further features still being developed may enable such a feature without this
compiler feature, but this is still in the works.
If this is a topic of interest, the folks required would be:
o Michael Matz <matz@suse.de>
o Valentin Rothberg <valentinrothberg@gmail.com>
o Alexander Graf <agraf@suse.de>
This sort of feature is useful when distributions support large variability
and such variability incurs significant run time deltas, and dead code becomes
more of a concern.
[0] https://kernelnewbies.org/KernelProjects/compiler-multiverse
Luis
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 17:05 ` Christian Borntraeger
2016-07-22 17:17 ` James Bottomley
@ 2016-07-29 1:16 ` Steven Rostedt
1 sibling, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2016-07-29 1:16 UTC (permalink / raw)
To: Christian Borntraeger; +Cc: ksummit-discuss
On Fri, 22 Jul 2016 19:05:06 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> Yes, I know at least 3 people suggesting that and thinking this is useful
> ( Can you beat Linus' 2001 https://gcc.gnu.org/ml/gcc/2001-12/msg00932.html? ;-) )
"[ Asbestos suit: ON ] And hey, it's not inconceivable that big-endian
will be only a historical remnant in another ten years. [ Evil grin ]"
Hmm, his prediction is off. Maybe it will require another 10 years ;)
-- Steve
* Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list
2016-07-22 10:41 [Ksummit-discuss] [TECH TOPIC] Compiler shopping list David Howells
2016-07-22 15:52 ` Christian Borntraeger
2016-07-25 18:49 ` Luis R. Rodriguez
@ 2016-07-29 1:32 ` Steven Rostedt
2 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2016-07-29 1:32 UTC (permalink / raw)
To: David Howells; +Cc: ksummit-discuss
On Fri, 22 Jul 2016 11:41:00 +0100
David Howells <dhowells@redhat.com> wrote:
> Are there additional things we can get the compiler to do for us? Some
> things I've seen brought up:
>
> (1) Additional __atomic_*() ops could be useful. Suggestions I've heard
> include direct LL/SC support - though the compiler people don't seem so
> keen on that.
>
> (2) -mmodel=kernel flag so that the compiler can optimise better for the
> kernel memory model.
>
I'll bring this up, well, just because. It was something I wanted
before, but I'm not sure how much of a benefit it will be without it
being actually implemented where we can test it.
What about assigning blocks of code to a section?
	if (y) {
		some_code;
	} __attribute__((section("x")))
where the code "some_code" will be moved to that section. The use case
I'm thinking about is for trace points. The tracepoint code does inject
a bit of its own code at each tracepoint location (see
include/linux/tracepoint.h trace_##name()). It's under a
static_key_false() which is an unlikely, thus gcc does sometimes do
well to move it to the bottom of a function. But I'm wondering if it
would be better to get it into its own section altogether? I'm just
worried about the icache hit where there are tracepoints in critical
sections.
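The block-level attribute above is not existing syntax. The closest thing available today, as far as I know, is to move the block into its own out-of-line function and place that function in a named section, which GCC and Clang accept on ELF targets. A sketch, with the section name, the counter and the boolean all invented for illustration (the boolean stands in for the static key):

```c
#include <stdbool.h>

static int trace_hits;		/* hypothetical stand-in for real tracing work */
static bool trace_enabled;	/* hypothetical stand-in for the static key */

/* Out-of-line slow path, emitted into its own section away from hot code. */
static void __attribute__((noinline, section(".text.trace_sketch")))
do_trace(int arg)
{
	trace_hits += arg;
}

static inline int hot_path(int y)
{
	/* Only the test and a call remain at the tracepoint site. */
	if (__builtin_expect(trace_enabled, 0))
		do_trace(y);

	return y * 2;
}
```

This still leaves the argument setup and the call instruction in the hot function, which is the part a block-level section attribute would be expected to move out as well.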
-- Steve
end of thread, other threads:[~2016-07-29 1:32 UTC | newest]
Thread overview: 11+ messages
2016-07-22 10:41 [Ksummit-discuss] [TECH TOPIC] Compiler shopping list David Howells
2016-07-22 15:52 ` Christian Borntraeger
2016-07-22 15:59 ` David Woodhouse
2016-07-22 17:05 ` Christian Borntraeger
2016-07-22 17:17 ` James Bottomley
2016-07-22 17:33 ` David Woodhouse
2016-07-29 1:16 ` Steven Rostedt
2016-07-23 20:35 ` Benjamin Herrenschmidt
2016-07-23 23:09 ` Alexei Starovoitov
2016-07-25 18:49 ` Luis R. Rodriguez
2016-07-29 1:32 ` Steven Rostedt