Date: Sat, 5 Jul 2025 15:15:57 -0500
From: Segher Boessenkool
To: David Laight
Cc: Christophe Leroy, Michael Ellerman, Nicholas Piggin, Naveen N Rao,
	Madhavan Srinivasan, Alexander Viro, Christian Brauner, Jan Kara,
	Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, Andre Almeida, Andrew Morton, Dave Hansen,
	Linus Torvalds, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 0/5] powerpc: Implement masked user access
In-Reply-To: <20250705193332.251e0b1f@pumpkin>

Hi!

On Sat, Jul 05, 2025 at 07:33:32PM +0100, David Laight wrote:
> On Thu, 26 Jun 2025 17:01:48 -0500
> Segher Boessenkool wrote:
> > On Thu, Jun 26, 2025 at 07:56:10AM +0200, Christophe Leroy wrote:
> ...
> > I have no idea why you think power9 has it while older CPUS do not. In
> > the GCC source code we have this comment:
> >   /* For ISA 2.06, don't add ISEL, since in general it isn't a win, but
> >      altivec is a win so enable it.
> >      */
> > and in fact we do not enable it for ISA 2.06 (p8) either, probably for

2.07 I meant of course. Sigh.

> > a similar reason.
>
> Odd, I'd have thought that replacing a conditional branch with a
> conditional move would pretty much always be a win.
> Unless, of course, you only consider benchmark loops where the
> branch predictor in 100% accurate.

The isel machine instruction is super expensive on p8: it is marked as
first in an instruction group, and has latency 5 for the GPR sources,
and 8 for the CR field source.

On p7 it wasn't great either, it was actually converted to a branch
sequence internally!

On p8 there are bc+8 optimisations done by the core as well, conditional
branches that skip one insn are faster than equivalent isel insns!

Since p9 it is a lot better :-)

> OTOH isn't altivec 'simd' instructions?

AltiVec is the old Motorola marketing name for what is called the
"Vector Facility" in the architecture, and which at IBM is still called
VMX, the name it was developed under ("Vector Multimedia Extension").

Since p7 (ISA 2.06, 2010) there also is the Vector-Scalar Extension
Facility, VSX, which adds another 32 vector registers, and the
traditional floating point registers are physically the same (but those
use only the first half of each vector reg). Many new VSX instructions
can do simple floating point stuff on all 64 VSX registers, either just
on the first lane ("scalar") or on all lanes ("vector").

This does largely mean that all floating point is stored in IEEE DP
format internally (on older cores usually some close to 70-bit format
was used internally), which in olden times actually allowed the cores to
be made faster. Only when storing a value to memory was it actually
converted to IEEE format (but of course it was always rounded correctly,
etc.)

> They pretty much only help for loops with lots of iterations.
> I don't know about ppc, but I've seen gcc make a real 'pigs breakfast'
> of loop vectorisation on x86.

For PowerPC (or Power, the more modern name) of course we also have our
fair share of problems with vectorisation. It does help that we were
the first architecture used by GCC that had a serious Vector thing, the
C syntax extension for Vector literals is taken from the old extensions
in the AltiVec PIM but using curly brackets {} instead of round brackets
(), for example.

> For the linux kernel (which as Linus keeps reminding people) tends
> to run 'cold cache', you probably want conditional moves in order
> to avoid mis-predicted branches and non-linear execution, but
> don't want loop vectorisation because the setup and end cases
> cost too much compared to the gain for each iteration.

You are best off using what GCC gives you, usually. It is very well
tuned, both the generic and the machine-specific code :-)

The kernel of course disables all Vector and FP stuff, essentially it
disables use of any of the associated registers, and that's pretty much
the end of it ;-)

(The reason for that is that it would make task switches more expensive,
long ago all task switches, but nowadays still user<->kernel switches).


Segher
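
P.S. To illustrate the curly-bracket Vector literal syntax mentioned
above, here is a minimal sketch. It assumes GCC's generic vector
extension (the vector_size attribute) instead of the AltiVec "vector"
keyword, so it should build on any target with GCC or Clang. The names
v4si, add4, sum4 and vector_literal_demo are made up for this example.

```c
#include <assert.h>

/* Hypothetical type name: four 32-bit ints in one 16-byte vector,
   declared with GCC's generic vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: element-wise add, written with the plain C
   operator; the compiler picks the vector instructions, if any. */
v4si add4(v4si a, v4si b)
{
    return a + b;
}

/* Hypothetical helper: add up the four lanes using vector subscripts. */
int sum4(v4si v)
{
    return v[0] + v[1] + v[2] + v[3];
}

int vector_literal_demo(void)
{
    /* Curly-bracket vector literals.  The AltiVec PIM spelled this with
       round brackets, roughly: (vector signed int)(1, 2, 3, 4). */
    v4si a = {1, 2, 3, 4};
    v4si b = (v4si){10, 20, 30, 40};
    v4si c = add4(a, b);

    assert(c[0] == 11 && c[3] == 44);
    return sum4(c);
}
```

Calling vector_literal_demo() from a main() of your own is enough to
try it out. This is userspace code only: as said above, the kernel is
built with Vector and FP register use disabled.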