Date: Fri, 20 Sep 2024 10:47:45 +0100
From: "Russell King (Oracle)"
To: Ryan Roberts
Cc: Anshuman Khandual, kernel test robot, linux-mm@kvack.org,
 llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev, Andrew Morton,
 David Hildenbrand, "Mike Rapoport (IBM)", Arnd Bergmann, x86@kernel.org,
 linux-m68k@lists.linux-m68k.org, linux-fsdevel@vger.kernel.org,
 kasan-dev@googlegroups.com, linux-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, Dimitri Sivanich, Alexander Viro,
 Muchun Song, Andrey Ryabinin, Miaohe Lin, Dennis Zhou, Tejun Heo,
 Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
Subject: Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries

On Fri, Sep 20, 2024 at 08:57:23AM +0200, Ryan Roberts wrote:
> On 19/09/2024 21:25, Russell King (Oracle) wrote:
> > On Thu, Sep 19, 2024 at
> > 07:49:09PM +0200, Ryan Roberts wrote:
> >> On 19/09/2024 18:06, Russell King (Oracle) wrote:
> >>> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
> >>>>> 32-bit arm uses, in some circumstances, an array because each level 1
> >>>>> page table entry is actually two descriptors. It needs to be this way
> >>>>> because each level 2 table pointed to by each level 1 entry has 256
> >>>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
> >>>>>
> >>>>> In order to cut down on the wastage, treat the level 1 page table as
> >>>>> groups of two entries, which point to two consecutive 1024 byte tables
> >>>>> in the level 2 page.
> >>>>>
> >>>>> The level 2 entry isn't suitable for the kernel's use cases (there are
> >>>>> no bits to represent accessed/dirty and other important stuff that the
> >>>>> Linux MM wants) so we maintain the hardware page tables and a separate
> >>>>> set that Linux uses in the same page. Again, the software tables are
> >>>>> consecutive, so from Linux's perspective, the level 2 page tables
> >>>>> have 512 entries in them and occupy one full page.
> >>>>>
> >>>>> This is documented in arch/arm/include/asm/pgtable-2level.h
> >>>>>
> >>>>> However, what this means is that from the software perspective, the
> >>>>> level 1 page table descriptors are an array of two entries, both of
> >>>>> which need to be set up when creating a level 2 page table, but only
> >>>>> the first one should ever be dereferenced when walking the tables,
> >>>>> otherwise the code that walks the second level of page table entries
> >>>>> will walk off the end of the software table into the actual hardware
> >>>>> descriptors.
> >>>>>
> >>>>> I've no idea what the idea is behind introducing pgd_get() and what
> >>>>> its semantics are, so I can't comment further.
> >>>>
> >>>> The helper is intended to read the value of the entry pointed to by the passed
> >>>> in pointer. And it should be read in a "single copy atomic" manner, meaning no
> >>>> tearing. Further, the PTL is expected to be held when calling the getter. If the
> >>>> HW can write to the entry such that it's racing with the lock holder (i.e. HW
> >>>> update of access/dirty) then READ_ONCE() should be suitable for most
> >>>> architectures. If there is no possibility of racing (because HW doesn't write to
> >>>> the entry), then a simple dereference would be sufficient, I think (which is
> >>>> what the core code was already doing in most cases).
> >>>
> >>> The core code should be making no access to the PGD entries on 32-bit
> >>> ARM since the PGD level does not exist. Writes are done at PMD level
> >>> in arch code. Reads are done by core code at PMD level.
> >>>
> >>> It feels to me like pgd_get() just doesn't fit the model to which 32-bit
> >>> ARM was designed to use decades ago, so I want full details about what
> >>> pgd_get() is going to be used for and how it is going to be used,
> >>> because I feel completely in the dark over this new development. I fear
> >>> that someone hasn't understood the Linux page table model if they're
> >>> wanting to access stuff at levels that effectively "aren't implemented"
> >>> in the architecture specific kernel model of the page tables.
> >>
> >> This change isn't as big and scary as I think you fear.
> >
> > The situation is as I state above. Core code must _not_ dereference pgd
> > pointers on 32-bit ARM.
>
> Let's just rewind a bit. This thread exists because the kernel test robot failed
> to compile pgd_none_or_clear_bad() (a core-mm function) for the arm architecture
> after Anshuman changed the direct pgd dereference to pgdp_get(). The reason
> compilation failed is because arm defines its own pgdp_get() override, but it is
> broken (there is a typo).

Let's not rewind, because had you fully read and digested my reply, you
would have seen why this isn't a problem... but let me spell it out.
>
> Code before Anshuman's change:
>
> static inline int pgd_none_or_clear_bad(pgd_t *pgd)
> {
> 	if (pgd_none(*pgd))
> 		return 1;
> 	if (unlikely(pgd_bad(*pgd))) {
> 		pgd_clear_bad(pgd);
> 		return 1;
> 	}
> 	return 0;
> }

This isn't a problem as the code stands. While there is a dereference
in C, that dereference is a simple struct copy, something that we use
everywhere in the kernel. However, that is as far as it goes, because
neither pgd_none() nor pgd_bad() makes use of its argument, and thus
the compiler will optimise it away, resulting in no actual access to
the page tables - _as_ _intended_.

If these are going to be converted to pgd_get(), then we need pgd_get()
to _also_ be optimised away, and if e.g. this is the only place that
pgd_get() is going to be used, the suggestion I made in my previous
email is entirely reasonable, since we know that the result of pgd_get()
will not actually be used.

> As an aside, the kernel also dereferences p4d, pud, pmd and pte pointers in
> various circumstances.

I already covered these in my previous reply.

> And other changes in this series are also replacing those
> direct dereferences with calls to similar helpers. The fact that these are all
> folded (by a custom arm implementation if I've understood the below correctly)
> just means that each dereference is returning what you would call the pmd from
> the HW perspective, I think?

It'll "return" the first of each pair of level-1 page table entries,
which is pgd[0] or *p4d, *pud, *pmd - but all of these except *pmd
need to be optimised away, so throwing lots of READ_ONCE() around
this code without considering this is certainly the wrong approach.

> >> The core-mm today
> >> dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code. See
> >> follow_pfnmap_start(),
> >
> > Doesn't seem to exist at least not in 6.11.
>
> Apologies, I'm on mm-unstable and that isn't upstream yet. See follow_pte() in
> v6.11 or __apply_to_page_range(), or pgd_none_or_clear_bad() as per above.

Looking at follow_pte(), it's not a problem.

I think we wouldn't be having this conversation before:

commit a32618d28dbe6e9bf8ec508ccbc3561a7d7d32f0
Author: Russell King
Date:   Tue Nov 22 17:30:28 2011 +0000

    ARM: pgtable: switch to use pgtable-nopud.h

where:

-#define pgd_none(pgd)		(0)
-#define pgd_bad(pgd)		(0)

existed before this commit - and thus the dereference in things like:

	pgd_none(*pgd)

wouldn't even be visible beyond the preprocessor step.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
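
The point about the folded PGD level can be sketched in plain C. This is
a hypothetical user-space model, not the kernel's code: the pgd_t layout,
the stub pgd_none()/pgd_bad(), the check_folded_pgd() self-check and the
omission of unlikely()/pgd_clear_bad() are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one software PGD entry covers two level-1 descriptors. */
typedef uint32_t pmdval_t;
typedef struct { pmdval_t pgd[2]; } pgd_t;

/* Folded level: both helpers take the entry by value and ignore it. */
static inline int pgd_none(pgd_t pgd) { (void)pgd; return 0; }
static inline int pgd_bad(pgd_t pgd)  { (void)pgd; return 0; }

/*
 * Same shape as the core helper quoted above, minus unlikely() and
 * pgd_clear_bad(). "*pgd" is a struct copy in C, but since neither
 * helper reads its argument, an optimising compiler is free to drop
 * the load altogether.
 */
static inline int pgd_none_or_clear_bad(pgd_t *pgd)
{
	if (pgd_none(*pgd))
		return 1;
	if (pgd_bad(*pgd))
		return 1;
	return 0;
}

/* Self-check (hypothetical): the result does not depend on the entry's contents. */
static inline void check_folded_pgd(void)
{
	pgd_t entry = { { 0xdeadbeefu, 0x12345678u } };

	assert(pgd_none_or_clear_bad(&entry) == 0);
}
```

Whether the load is really elided depends on the compiler and the
optimisation level; building the model with something like -O2 -S and
reading the generated assembly is one way to check. A READ_ONCE()-style
volatile access in place of the plain struct copy could not be dropped
in the same way, which is the concern raised above.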