From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <1469306149.8568.209.camel@kernel.crashing.org>
From: Benjamin Herrenschmidt
To: David Woodhouse, Christian Borntraeger, David Howells,
	ksummit-discuss@lists.linuxfoundation.org
Date: Sun, 24 Jul 2016 06:35:49 +1000
In-Reply-To: <1469203184.120686.212.camel@infradead.org>
References: <15569.1469184060@warthog.procyon.org.uk>
	<5792414F.5040902@de.ibm.com>
	<1469203184.120686.212.camel@infradead.org>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: Re: [Ksummit-discuss] [TECH TOPIC] Compiler shopping list

On Fri, 2016-07-22 at 16:59 +0100, David Woodhouse wrote:
> I'm not sure Linus proposed that. I certainly did, many times.
>
> With the work I put in to make use of __builtin_bswapXX() we do have
> a *certain* amount of the functionality that full endianness
> attribution would give us — the compiler can see and optimise
> certain load/mask/save operations, and can use movbe and equivalent
> instructions.
>
> But a full implementation that let us just do assignment without
> jumping through the hoops might still be nice.

One advantage of that is it might allow us to work around a limitation
with the current __builtin_bswap* and READ_ONCE/ACCESS_ONCE (such as
used in gup).

The ACCESS_ONCE magic pretty much forces the compiler to separate the
load from the swap, which prevents us from using the byte-swapped load
instructions we have on powerpc. We degrade to a plain load followed by
the 5 or 6 instructions (with back-to-back dependencies) needed to do
the swap.

This caused a measurable performance hit on microbenchmarks when we
forced our page tables big-endian on a little-endian kernel (in order
to accommodate POWER9's new radix MMU).

Cheers,
Ben.
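
To illustrate the pattern (a minimal sketch, not the actual gup code;
it assumes the old ACCESS_ONCE definition and the function names are
made up):

#include <stdint.h>

/* Old-style ACCESS_ONCE: the volatile cast keeps the load from being
 * merged with anything else by the compiler. */
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

/* The volatile load stays separate from the swap, so on powerpc this
 * compiles to a plain ld followed by the multi-instruction byte swap
 * sequence rather than a single byte-reversed load. */
static inline uint64_t pte_load_once_swapped(uint64_t *ptep)
{
	return __builtin_bswap64(ACCESS_ONCE(*ptep));
}

/* Without the volatile access the compiler is free to fuse the load
 * and the swap into a byte-reversed load (ldbrx on powerpc, movbe on
 * x86 where available). */
static inline uint64_t pte_load_swapped(uint64_t *ptep)
{
	return __builtin_bswap64(*ptep);
}

The second version can become a single instruction on targets that
have a byte-reversed load; the first cannot, which is the hit
described above.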