From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4D19C7EE29 for ; Tue, 13 Jun 2023 08:43:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 402C28E000B; Tue, 13 Jun 2023 04:43:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3B1728E0002; Tue, 13 Jun 2023 04:43:34 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2C7288E000B; Tue, 13 Jun 2023 04:43:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 1E28D8E0002 for ; Tue, 13 Jun 2023 04:43:34 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id E1BAAC044B for ; Tue, 13 Jun 2023 08:43:33 +0000 (UTC) X-FDA: 80897085906.21.0713295 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf30.hostedemail.com (Postfix) with ESMTP id D0B2A80002 for ; Tue, 13 Jun 2023 08:43:26 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf30.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686645812; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cHl8P6WSaBRbGFcU7rs3D3rVoGH1IrrMPTHFEXUEirI=; b=14qcnWvdp0X971bmAvpQ8H3TSXc6hBmI7kxJSF51kBiG/vuWVteIhXq2X94hWenjVPqib0 FRzti+u2n3iowF3OFp2C9kcjhM6ONWEDk8Z2hdgIbbZbSWrR8yuDppanf0DZ7+pFwIc6KX GXm9YLLMufewfJpPbwqQdV6bKQlXlI0= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf30.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686645812; a=rsa-sha256; cv=none; b=gRpdKHpT++SHrmXaekexw6SRt9nE2G8tRPxWrbbk68m2Y1DhZyfwakTvOqfkQgTLYYZBdO w9HO4ClypO5BV1pmNWjTAA38ZBaJqJXCY/K2T02Yyo29RBxPGv4DAo5aZJFbLJHVj9K2/B ingDtTt1o7OEWPweSD0+fFeXQS4KAEI= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8C8C41FB; Tue, 13 Jun 2023 01:44:10 -0700 (PDT) Received: from [10.57.74.148] (unknown [10.57.74.148]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5F5C33F71E; Tue, 13 Jun 2023 01:43:20 -0700 (PDT) Message-ID: <6b328080-ca6e-a2b1-4644-a29b62d6ac6e@arm.com> Date: Tue, 13 Jun 2023 09:43:18 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.11.2 Subject: Re: [PATCH v3 0/3] Encapsulate PTE contents from non-arch code To: Andrew Morton Cc: SeongJae Park , "Matthew Wilcox (Oracle)" , "Kirill A. Shutemov" , Mike Rapoport , Yu Zhao , Jason Gunthorpe , David Airlie , Daniel Vetter , Dimitri Sivanich , Alex Williamson , Oleksandr Tyshchenko , Alexander Viro , Christian Brauner , Mike Kravetz , Muchun Song , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , Ian Rogers , Adrian Hunter , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt , Naoya Horiguchi , Miaohe Lin , Pasha Tatashin , Uladzislau Rezki , Christoph Hellwig , Lorenzo Stoakes , linux-kernel@vger.kernel.org, linux-mm@kvack.org, damon@lists.linux.dev References: <20230612151545.3317766-1-ryan.roberts@arm.com> <20230612131656.2ba4f95865f27e6b3b984936@linux-foundation.org> From: Ryan Roberts In-Reply-To: <20230612131656.2ba4f95865f27e6b3b984936@linux-foundation.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: D0B2A80002 X-Rspam-User: X-Rspamd-Server: rspam04 X-Stat-Signature: igc8nd8ahsckhe3zsneu6cqtq7e4tbd4 X-HE-Tag: 1686645806-127164 X-HE-Meta: U2FsdGVkX19VXuGmLyIt4QsJPy/VES0SiJ1CiShMFPQ82xJ2oXrQ6o3AAcr8p3uPeYGuQmiuEYmlo67aTHU9SwZXhP2vsFadEc54kdHW1xfAyuEi1SlRolW0L9KhhHGX9s/VVyndvb7b7ZQtVadXL8q7+uqz02V4uggSRzlfw71l4jdt1dn5G7Z7qSjYKMyTzQuyuJtmS1ei43ez3jDWOW/GkFy8AwTWdgwCxCQ1TmFd5fpv5/5MoqTV1L12yTB+PDpuUZZ5EpDrB9beL6ziijkdyUM+SUIw4V/7/aukWjsymcBB+xPw4I/qSrdgiyHRAXRG+NrFfk18qlcMHKYSWvUWSTFIfQXCehg4nP5Qxfi++2ZqhFlVd0rOfw7yJs6xRCVuyCVnrcwG+8V4eTbl18D5sxpjiLC/9yPxQaaxu3ZV7bCtEwMZyb2R2lu1ckZDi9/k+Dg+TW7RmnW1k4vDUgSizBftpeGlOncgtxSzCwSAhJ01lREalaUii0/i+AZCPTftjCAbNr9ZHeGeogQPKy+pXV3pi6Q9wygEJbdBgnANFWU+8ySMqTfeyS8OMItiduh6t8WoFZbtEzbyjh1yopNF3Gy80LIB7nvwXNppIoOifXNyKU3o2QWEAKdcir6Lkoke1gcdlxplDvaSWcPAzSs+k0pBNOOf7krjIJFsjWX7TYDpmyuLPiZ0L+4S6RX+U4Vk4tjvkgHAfSqzhl0oFhH0Q7X5Diya42yXHNNST+2KAdXTyG1dF6dt/uPlmRF646ePqn/T826/SWnLs9hVNWjBCjI0zBX0SyOWMtfiPkO9iMZeVav9eRUqEGTMFbal/3jhtNMMbD0MbIsWW8Bf1wspusSgxxpfjOd3RnnGkGAO+7Kg5vComdIMA7VWGLuz9MMjsLDL4gNJC1jn//f++XP2+x77hN2004aWGFfUYHQheQ+v1v9juSqIU+hGi0zzIvvlaVFVpC847MAHdr7 8N3J2V5G xCC8tuYoN4FysfDzXvQmohiwk9wusQl5rIvKMdgHaomNsnyBoyoe3wikTEi0uDqZC7krL7tWypBgOObRTx0HKDlQyytNfo6FbiCM4JC5pXE89Peczb9t+21tOsUHLy+WlOe2Y08oqKu3BzZxlsNtUGiL27sZKpROuBinfHnwtYhRO/KI1IE47Qwe1AudmmmCha+ZUVCP5qghElYNq3MZWP8zUbp6qwE98u48mcjyv67gA4oI= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Andrew, Thanks for taking the patches! On 12/06/2023 21:16, Andrew Morton wrote: > On Mon, 12 Jun 2023 16:15:42 +0100 Ryan Roberts wrote: > >> Hi All, >> >> (Including wider audience this time since changes touch a fair few subsystems) >> >> This is the second half of v3 of a series to improve the encapsulation of pte >> entries by disallowing non-arch code from directly dereferencing pte_t pointers. > > That's basically all we have here for [0/N] cover letter content. I > stole some words from the [3/3] changelog, so we now have: > > : A series to improve the encapsulation of pte entries by disallowing > : non-arch code from directly dereferencing pte_t pointers. > : > : This means that by default, the accesses change from a C dereference to a > : READ_ONCE(). This is technically the correct thing to do since where > : pgtables are modified by HW (for access/dirty) they are volatile and > : therefore we should always ensure READ_ONCE() semantics. > : > : But more importantly, by always using the helper, it can be overridden by > : the architecture to fully encapsulate the contents of the pte. Arch code > : is deliberately not converted, as the arch code knows best. It is > : intended that arch code (arm64) will override the default with its own > : implementation that can (e.g.) hide certain bits from the core code, or > : determine young/dirty status by mixing in state from another source. I'm not sure I've understood what you are saying here? Is there some expectation for the content of the cover letter for mm patches, which I'm not aware of? If you can point me to some docs I'll make sure I'm conformant in future. > > >> Based on earlier feedback, I split the series in 2; the first part, fixes for >> existing bugs, was already posted at [3] and merged into mm-stable. This second >> part contains the conversion from direct dereferences to instead use >> ptep_get()/ptep_get_lockless(). >> >> See the v1 cover letter at [1] for rationale for this work. >> >> Based on feedback at v2, I've removed the new ptep_deref() helper I originally >> added, and am now using the existing ptep_get() and ptep_get_lockless() helpers. >> Testing on Ampere Altra (arm64) showed no difference in performance when using >> ptep_deref() (*pte) vs ptep_get() (READ_ONCE(*pte)). >> >> Patches are based on mm-unstable (49e038b1919e) and a branch is available at [4] >> (Let me know if this is the wrong branch to target - I'm still not familiar with >> the details of the mm- dev process!). Note that Hugh Dickins's "mm: allow >> pte_offset_map[_lock]() to fail" (now in mm-unstable) patch set caused a number >> of conflicts which I've resolved. But due to that, you won't be able to apply >> these patches on top of Linus's tree. I have an alternate branch on top of >> v6.4-rc6 at [5]. > > Yep, that's all great, thanks. > > > Is there some clever trick we can do to prevent new open-coded derefs > of pte_t* from being introduced? I've been thinking about this myself but don't have a good answer at the moment. I used 2 techniques to find them all, but neither are fullproof so required hand editing: 1) global find/replace of "pte_t *" to "pte_handle_t ", and define pte_handle_t as void *. Then the compiler will throw an error for an attempted dereference. This obbiously only checks code that is actually being compiled, and it doesn't prevent anyone from declaring a pte_t* and directly dereferencing in future. 2) Using the coccinelle script. This worked pretty well, but finds a few places where the pte_t pointer is not pointing to a page table, but a variable on the stack, and there it is legitimate to dereference (we don't want to use ptep_get()/ptep_set_at() there). And also flags up lots of arch code that needs to be ignored. There are coccinelle scripts kept in scripts/coccinelle that apprarently get run semi-regularly [1]. So could submit something there, but would need to add support for excluding the arch directory and for marking known false-positives, neither of which I think are currently available. Does anyone know better? > > I suppose we could convert pte_t to a single-member struct to force a > compile error. That struct will get passed by value to ptep_get() so > that's OK. But this isn't viable unless/until all architectures are > converted :( It already is defined that way for arm64 at least: typedef struct { pteval_t pte; } pte_t; But I'm not sure how that helps? You can still legally deref it in C. > > Or we rely upon Ryan to grep the tree occasionally ;) This is my least favourite option ;-) But it wouldn't be the end of the world in the short term. Once I achieve the full objective of using arm64's contpte capability, open coded derefs will manifest as bugs (at least for cases where the accuracy of the access/dirty bits matter), which will certainly make sure they get fixed. Thanks, Ryan