Message-ID: <5bd51798-cb47-4a7b-be40-554b5a821fe7@arm.com>
Date: Thu, 19 Sep 2024 19:49:09 +0200
Subject: Re: [PATCH V2 7/7] mm: Use pgdp_get() for accessing PGD entries
To: "Russell King (Oracle)"
Cc: Anshuman Khandual, kernel test robot, linux-mm@kvack.org,
 llvm@lists.linux.dev, oe-kbuild-all@lists.linux.dev, Andrew Morton,
 David Hildenbrand, "Mike Rapoport (IBM)", Arnd Bergmann, x86@kernel.org,
 linux-m68k@lists.linux-m68k.org, linux-fsdevel@vger.kernel.org,
 kasan-dev@googlegroups.com, linux-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, Dimitri Sivanich, Alexander Viro,
 Muchun Song, Andrey Ryabinin, Miaohe Lin, Dennis Zhou, Tejun Heo,
 Christoph Lameter, Uladzislau Rezki, Christoph Hellwig
References: <20240917073117.1531207-8-anshuman.khandual@arm.com>
 <202409190310.ViHBRe12-lkp@intel.com>
 <8f43251a-5418-4c54-a9b0-29a6e9edd879@arm.com>
 <82fa108e-5b15-435a-8b61-6253766c7d88@arm.com>
From: Ryan Roberts

On 19/09/2024 18:06, Russell King (Oracle) wrote:
> On Thu, Sep 19, 2024 at 05:48:58PM +0200, Ryan Roberts wrote:
>>> 32-bit arm uses, in some circumstances, an array because each level 1
>>> page table entry is actually two
>>> descriptors. It needs to be this way
>>> because each level 2 table pointed to by each level 1 entry has 256
>>> entries, meaning it only occupies 1024 bytes in a 4096 byte page.
>>>
>>> In order to cut down on the wastage, treat the level 1 page table as
>>> groups of two entries, which point to two consecutive 1024 byte tables
>>> in the level 2 page.
>>>
>>> The level 2 entry isn't suitable for the kernel's use cases (there are
>>> no bits to represent accessed/dirty and other important stuff that the
>>> Linux MM wants) so we maintain the hardware page tables and a separate
>>> set that Linux uses in the same page. Again, the software tables are
>>> consecutive, so from Linux's perspective, the level 2 page tables
>>> have 512 entries in them and occupy one full page.
>>>
>>> This is documented in arch/arm/include/asm/pgtable-2level.h
>>>
>>> However, what this means is that from the software perspective, the
>>> level 1 page table descriptors are an array of two entries, both of
>>> which need to be set up when creating a level 2 page table, but only
>>> the first one should ever be dereferenced when walking the tables,
>>> otherwise the code that walks the second level of page table entries
>>> will walk off the end of the software table into the actual hardware
>>> descriptors.
>>>
>>> I've no idea what the idea is behind introducing pgd_get() and what
>>> its semantics are, so I can't comment further.
>>
>> The helper is intended to read the value of the entry pointed to by the
>> passed-in pointer. And it should be read in a "single copy atomic" manner,
>> meaning no tearing. Further, the PTL is expected to be held when calling the
>> getter. If the HW can write to the entry such that it's racing with the lock
>> holder (i.e. HW update of access/dirty) then READ_ONCE() should be suitable
>> for most architectures.
>> If there is no possibility of racing (because HW doesn't write to
>> the entry), then a simple dereference would be sufficient, I think (which is
>> what the core code was already doing in most cases).
>
> The core code should be making no access to the PGD entries on 32-bit
> ARM since the PGD level does not exist. Writes are done at PMD level
> in arch code. Reads are done by core code at PMD level.
>
> It feels to me like pgd_get() just doesn't fit the model that 32-bit
> ARM was designed to use decades ago, so I want full details about what
> pgd_get() is going to be used for and how it is going to be used,
> because I feel completely in the dark over this new development. I fear
> that someone hasn't understood the Linux page table model if they're
> wanting to access stuff at levels that effectively "aren't implemented"
> in the architecture specific kernel model of the page tables.

This change isn't as big and scary as I think you fear. The core-mm today
dereferences pgd pointers (and p4d, pud, pmd pointers) directly in its code.
See follow_pfnmap_start(), gup_fast_pgd_leaf(), and many other sites. These
changes aim to abstract those dereferences into an inline function that the
architecture can override and implement if it so wishes. The core-mm
implements default versions of these helpers, which use READ_ONCE(), but does
not currently use them consistently.

From Anshuman's comments earlier in this thread, it looked to me like the arm
pgd_t type is too big to read with READ_ONCE(); it can't be atomically read on
that arch. So my proposal was to implement the override for arm to do exactly
what the core-mm used to do, which is a pointer dereference. That would result
in exactly the same behaviour for the arm arch.

>
> Essentially, on 32-bit 2-level ARM, the PGD is merely indexed by the
> virtual address. As far as the kernel is concerned, each entry is
> 64-bit, and the generic kernel code has no business accessing that
> through the pgd pointer.
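The override pattern described above (a generic default built on READ_ONCE(), which an architecture can replace) can be sketched as a small userspace C model. This is a hypothetical illustration, not kernel code: the pgd_t layout (two 32-bit level 1 descriptors, following Russell's description of 32-bit ARM) and the MODEL_ARCH_OVERRIDES_PGDP_GET switch are assumptions, and a volatile load merely stands in for READ_ONCE().

```c
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. Following the
 * description in this thread, a 32-bit ARM PGD entry is modelled as two
 * 32-bit level 1 descriptors (64 bits in total). */
typedef uint32_t pmdval_t;
typedef struct { pmdval_t pgd[2]; } pgd_t;

/* Hypothetical switch: set to 0 to build the generic fallback instead. */
#define MODEL_ARCH_OVERRIDES_PGDP_GET 1

#if MODEL_ARCH_OVERRIDES_PGDP_GET
/* "Arch" override: a plain dereference, i.e. what core-mm code did
 * before the helpers were used. No single-copy atomicity is implied. */
static inline pgd_t pgdp_get(pgd_t *pgdp)
{
	return *pgdp;
}
/* Self-referential define so the generic code can detect the override. */
#define pgdp_get pgdp_get
#endif

/* "Generic" default, compiled only if no override was defined above.
 * The volatile load stands in for READ_ONCE(); for a 64-bit entry on a
 * 32-bit CPU it is not guaranteed to be a single untorn access. */
#ifndef pgdp_get
static inline pgd_t pgdp_get(pgd_t *pgdp)
{
	return *(const volatile pgd_t *)pgdp;
}
#endif
```

With the override in place, a caller gets both descriptors back exactly as a direct `*pgdp` would return them, so behaviour matches the old open-coded dereference; setting the switch to 0 exercises the `#ifndef` fallback instead.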
>
> The pgd pointer is passed through the PUD and PMD levels, where it is
> typecast down through the kernel layers to a pmd_t pointer, where it
> becomes a 32-bit quantity. This results in only the _first_ level 1
> pointer being dereferenced by kernel code to a 32-bit pmd_t quantity.
> pmd_page_vaddr() converts this pmd_t quantity to a pte pointer (which
> points at the software level 2 page tables, not the hardware page
> tables).

As an aside, my understanding of Linux's pgtable model differs from what you
describe. As I understand it, Linux's logical page table model has 5 levels
(pgd, p4d, pud, pmd, pte). If an arch doesn't support all 5 levels, then the
middle levels can be folded away (p4d first, then pud, then pmd). But the
core-mm still logically walks all 5 levels. So if the HW supports 2 levels,
those levels are (pgd, pte). But you are suggesting that arm exposes pmd and
pte, which is not what Linux expects? (Perhaps you call it the pmd in the arch,
but that is being folded and accessed through the pgd helpers in core code, I
believe?)

>
> So, as I'm now being told that the kernel wants to dereference the
> pgd level despite the model I describe above, alarm bells are ringing.
> I want full information please.
>

This is not new; the kernel already dereferences the pgd pointers.

Thanks,
Ryan
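Russell's description of the 32-bit ARM layout and the folding model can be put side by side in a small userspace C model. It is a hypothetical sketch: every MODEL_* constant and model_* helper is invented for illustration, the values (2 MiB per pgd pair, 512 software PTEs, two 256-entry hardware tables starting at offset 2048 in the same page) are assumed to follow arch/arm/include/asm/pgtable-2level.h and should be checked against it, and addresses are plain integers rather than real memory.

```c
#include <stdint.h>

/* Hypothetical model of the 32-bit ARM 2-level layout (4 KiB pages).
 * Values are assumed to follow arch/arm/include/asm/pgtable-2level.h. */
#define MODEL_PAGE_SIZE     4096u
#define MODEL_PAGE_SHIFT    12u
#define MODEL_PGDIR_SHIFT   21u   /* one pgd entry (descriptor pair) maps 2 MiB */
#define MODEL_PTRS_PER_PTE  512u  /* software level 2 table: Linux's view */
#define MODEL_HW_PTRS       256u  /* entries in each hardware level 2 table */

typedef uint32_t pmdval_t;
typedef uint32_t pteval_t;
typedef struct { pmdval_t pgd[2]; } pgd_t;  /* pair of level 1 descriptors */
typedef pmdval_t pmd_t;                     /* a single level 1 descriptor */

/* Software table first, then the two hardware tables, all in one page. */
#define MODEL_HWTABLE_OFF   (MODEL_PTRS_PER_PTE * sizeof(pteval_t))  /* 2048 */
#define MODEL_HWTABLE_SIZE  (MODEL_HW_PTRS * sizeof(pteval_t))       /* 1024 */

_Static_assert(MODEL_HWTABLE_OFF + 2 * MODEL_HWTABLE_SIZE == MODEL_PAGE_SIZE,
	       "software and hardware tables share one page");

static inline uint32_t model_pgd_index(uint32_t va)
{
	return va >> MODEL_PGDIR_SHIFT;
}

static inline uint32_t model_pte_index(uint32_t va)
{
	return (va >> MODEL_PAGE_SHIFT) & (MODEL_PTRS_PER_PTE - 1u);
}

/* Installing a level 2 page writes BOTH descriptors of the pair; each
 * points at one 1024-byte hardware table (protection bits omitted). */
static inline void model_pmd_populate(pgd_t *pgd, uint32_t pte_page_phys)
{
	uint32_t hw = pte_page_phys + (uint32_t)MODEL_HWTABLE_OFF;

	pgd->pgd[0] = hw;
	pgd->pgd[1] = hw + (uint32_t)MODEL_HWTABLE_SIZE;
}

/* Folded p4d/pud/pmd levels: the kernel casts the pgd pointer down; this
 * model takes the address of the first descriptor instead (same address,
 * but valid under C aliasing rules). Walkers only see descriptor 0. */
static inline pmd_t *model_pmd_offset(pgd_t *pgd)
{
	return &pgd->pgd[0];
}

/* Like pmd_page_vaddr() minus the phys-to-virt step: masking to the page
 * start yields the software table, not the hardware table it points at. */
static inline uint32_t model_pmd_page_addr(pmd_t pmd)
{
	return pmd & ~(MODEL_PAGE_SIZE - 1u);
}
```

A walk then uses model_pgd_index() to pick the pair, model_pmd_offset() to reach the first descriptor, and model_pte_index() to index the 512-entry software table; the second descriptor is written but never walked, which is the property Russell wants preserved.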