Message-ID: <59242559-5e90-4422-82f7-179a44eb968a@arm.com>
Date: Wed, 21 May 2025 12:16:02 +0100
Subject: Re: [PATCH v3 2/5] mm: Add batched versions of ptep_modify_prot_start/commit
From: Ryan Roberts <ryan.roberts@arm.com>
To: Dev Jain, akpm@linux-foundation.org
Cc: david@redhat.com, willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com, anshuman.khandual@arm.com, peterx@redhat.com, joey.gouly@arm.com, ioworker0@gmail.com, baohua@kernel.org, kevin.brodsky@arm.com, quic_zhenhuah@quicinc.com, christophe.leroy@csgroup.eu, yangyicong@hisilicon.com, linux-arm-kernel@lists.infradead.org, hughd@google.com, yang@os.amperecomputing.com, ziy@nvidia.com
References: <20250519074824.42909-1-dev.jain@arm.com> <20250519074824.42909-3-dev.jain@arm.com>
In-Reply-To: <20250519074824.42909-3-dev.jain@arm.com>
On 19/05/2025 08:48, Dev Jain wrote:
> Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect.
> Architectures can override these helpers; if they do not, the helpers are
> implemented as a simple loop over the corresponding single-pte helpers.
> 
> Signed-off-by: Dev Jain
> ---
>  include/linux/pgtable.h | 75 +++++++++++++++++++++++++++++++++++++++++
>  mm/mprotect.c           |  4 +--
>  2 files changed, 77 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index b50447ef1c92..e40ed57e034d 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1333,6 +1333,81 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
>  	__ptep_modify_prot_commit(vma, addr, ptep, pte);
>  }
>  #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
> +
> +/**
> + * modify_prot_start_ptes - Start a pte protection read-modify-write transaction
> + * over a batch of ptes, which protects against asynchronous hardware modifications

nit: This overflows the 80 char soft limit.

> + * to the ptes. The intention is not to prevent the hardware from making pte
> + * updates, but to prevent any updates it may make from being lost.
> + * Please see the comment above ptep_modify_prot_start() for full description.
> + *
> + * @vma: The virtual memory area the pages are mapped into.
> + * @addr: Address the first page is mapped at.
> + * @ptep: Page table pointer for the first entry.
> + * @nr: Number of entries.
> + *
> + * May be overridden by the architecture; otherwise, implemented as a simple
> + * loop over ptep_modify_prot_start(), collecting the a/d bits of the mapped
> + * folio.

nit: "mapped folio" is a bit confusing given we are operating on ptes. Perhaps
"collecting the a/d bits from each pte in the batch" is clearer.

> + *
> + * Note that PTE bits in the PTE range besides the PFN can differ.

nit: Perhaps "batch" would be more consistent than "range"?

> + *
> + * Context: The caller holds the page table lock. The PTEs map consecutive
> + * pages that belong to the same folio. The PTEs are all in the same PMD.
> + */
> +#ifndef modify_prot_start_ptes
> +static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
> +		unsigned long addr, pte_t *ptep, unsigned int nr)

I thought David H suggested modify_prot_ptes_start() and
modify_prot_ptes_commit(), which we settled on? I'm personally fine with
either though.

> +{
> +	pte_t pte, tmp_pte;
> +
> +	pte = ptep_modify_prot_start(vma, addr, ptep);
> +	while (--nr) {

I thought we agreed to make the loop logic a bit more standard. I don't recall
exactly what was finally agreed, but I would think something like this would
be better:

	for (i = 1; i < nr; i++) {

> +		ptep++;
> +		addr += PAGE_SIZE;
> +		tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
> +		if (pte_dirty(tmp_pte))
> +			pte = pte_mkdirty(pte);
> +		if (pte_young(tmp_pte))
> +			pte = pte_mkyoung(pte);
> +	}
> +	return pte;
> +}
> +#endif
> +
> +/**
> + * modify_prot_commit_ptes - Commit an update to a batch of ptes, leaving any
> + * hardware-controlled bits in the PTE unmodified.
> + *
> + * @vma: The virtual memory area the pages are mapped into.
> + * @addr: Address the first page is mapped at.
> + * @ptep: Page table pointer for the first entry.

You've missed the pte and old_pte params here.

> + * @nr: Number of entries.
> + *
> + * May be overridden by the architecture; otherwise, implemented as a simple
> + * loop over ptep_modify_prot_commit().
> + *
> + * Note that PTE bits in the PTE range besides the PFN can differ.

How can they? All the applied bits other than the PFN will be exactly the same
for the range because they all come from pte. I think this line can be
dropped.

> + *
> + * Context: The caller holds the page table lock. The PTEs map consecutive
> + * pages that belong to the same folio. The PTEs are all in the same PMD.

The middle sentence doesn't apply; the PTEs will all initially be none if
using the default version of modify_prot_start_ptes(). I think that can be
dropped. But I think you need to explain that this will be the case on exit.
> + */
> +#ifndef modify_prot_commit_ptes
> +static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
> +{
> +	int i;
> +
> +	for (i = 0; i < nr; ++i) {
> +		ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
> +		ptep++;
> +		addr += PAGE_SIZE;
> +		old_pte = pte_next_pfn(old_pte);
> +		pte = pte_next_pfn(pte);
> +	}
> +}
> +#endif

I have some general concerns about the correctness of batching these
functions. The support was originally added by commit 1ea0704e0da6 ("mm: add a
ptep_modify_prot transaction abstraction"), and the intent was to make it
easier to defer the pte updates for XEN on x86.

Your default implementations of the batched versions will match the number of
ptep_modify_prot_start() calls with the same number of
ptep_modify_prot_commit() calls, even if modify_prot_commit_ptes() is called
incrementally for sub-batches of the batch used for modify_prot_start_ptes().
That's a requirement and you've met it. But in the batched case, there are 2
differences:

 - You can now have multiple PTEs within a start-commit block at one time. I
   hope none of the specialized implementations (i.e. XEN) care about that.

 - When calling ptep_modify_prot_commit(), old_pte may not be exactly what
   ptep_modify_prot_start() returned for that pte. You have collected the A/D
   bits, and according to your docs "PTE bits in the PTE range besides the
   PFN can differ" when calling modify_prot_start_ptes(), so R/W and other
   things could differ here.

I'm not sure if these are problems in practice; they probably are not. But
have you checked that the XEN implementation (and any other specialized
implementations) are definitely compatible with your batched semantics?
Thanks,
Ryan

> +
>  #endif /* CONFIG_MMU */
> 
>  /*
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 1ee160ed0b14..124612ce3d24 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -188,7 +188,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>  				jiffies_to_msecs(jiffies));
>  		}
> 
> -		oldpte = ptep_modify_prot_start(vma, addr, pte);
> +		oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes);
>  		ptent = pte_modify(oldpte, newprot);
> 
>  		if (uffd_wp)
> @@ -214,7 +214,7 @@ static long change_pte_range(struct mmu_gather *tlb,
>  		    can_change_pte_writable(vma, addr, ptent))
>  			ptent = pte_mkwrite(ptent, vma);
> 
> -		ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
> +		modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes);
>  		if (pte_needs_flush(oldpte, ptent))
>  			tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>  		pages++;