From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03ED7C8302F for ; Tue, 1 Jul 2025 07:33:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 68D0D6B009E; Tue, 1 Jul 2025 03:33:43 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 664BB6B00A0; Tue, 1 Jul 2025 03:33:43 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5A1936B00A1; Tue, 1 Jul 2025 03:33:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 4975F6B009E for ; Tue, 1 Jul 2025 03:33:43 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 71632123884 for ; Tue, 1 Jul 2025 07:33:42 +0000 (UTC) X-FDA: 83614881084.30.10CE1A6 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf02.hostedemail.com (Postfix) with ESMTP id 1A71780008 for ; Tue, 1 Jul 2025 07:33:39 +0000 (UTC) Authentication-Results: imf02.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf02.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1751355220; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2uF502t7bopi+ifJXTVFE3KSkeICu9xtOi/Hy3LSWC0=; b=cV1+u+iPFMrbHBuc/JhnKHxDoFda7a/zgJA7k9O5MW68V81w8Zns9AiBmIOl3oe8og1cLP yj4dbuIcHbtj2ZlHcZQkKy1HUsAGbTZCdsg5Nofh60OxiqSZipu6V57yajT5yO0/x1Fr2v /jImBuCu23PQawQZRrZrFV9qzueNx5Q= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1751355220; a=rsa-sha256; cv=none; b=mwdYanIHoULfeJSoFa3NFK+KpQk6CcG7Ghq9RIFFe7QISmGnVvEpRGyE9CmrM8sBwzIA0f mC9nkAEG6fME6qMJ6sxRMY5NJZMY3VceK2UBQGhqrNqr2KX0v9GXhqHGkrpYpm+4C1llj6 rkYRLWWp6CWLtZEvNuYe/zc2NfQX7Wo= ARC-Authentication-Results: i=1; imf02.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf02.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 359941595; Tue, 1 Jul 2025 00:33:23 -0700 (PDT) Received: from [10.57.84.129] (unknown [10.57.84.129]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 606453F6A8; Tue, 1 Jul 2025 00:33:34 -0700 (PDT) Message-ID: <23a213b8-280a-4544-a210-7c18a0caf8a9@arm.com> Date: Tue, 1 Jul 2025 08:33:32 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v4 2/4] mm: Add batched versions of ptep_modify_prot_start/commit Content-Language: en-GB To: Dev Jain , Lorenzo Stoakes Cc: akpm@linux-foundation.org, david@redhat.com, willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, Liam.Howlett@oracle.com, vbabka@suse.cz, jannh@google.com, anshuman.khandual@arm.com, peterx@redhat.com, joey.gouly@arm.com, ioworker0@gmail.com, baohua@kernel.org, kevin.brodsky@arm.com, quic_zhenhuah@quicinc.com, christophe.leroy@csgroup.eu, yangyicong@hisilicon.com, linux-arm-kernel@lists.infradead.org, hughd@google.com, yang@os.amperecomputing.com, ziy@nvidia.com References: <20250628113435.46678-1-dev.jain@arm.com> <20250628113435.46678-3-dev.jain@arm.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: 1A71780008 X-Stat-Signature: wjgjmrce3yzenbhhribhxdmi54bq568d X-HE-Tag: 1751355219-480950 X-HE-Meta: U2FsdGVkX1/lx21kJpPif0SlHbVRi0xMxV0t4tKEqsS/IGvqwCyxWujLBRBKHdp1UJ62wlWDG2N8Omc5jH8SQil4Qe2rfjHctWJsSZRJjmr6EABNWL0Upu6WYQwJIepb4UIU4ozahWG/4JCGfN1nZX8kxKCwijIb1acxWshQy1nCu93fAaDltbiCAi7a+mCu8240xDzEN35K+78QB5xyvrwaFJ7LNsQeaDRH4qXMvRwo1dWV5JtV7JbwJqE54Mf6NDZtUS+QunNH7L+TxlE0ZKeSpcis43bVrU2dhRnrFuAmRyAVHMyTZUmtnxMEE9+9TyvrWaVt6twYmBWM1etZ1JqwfWqwvaQtCvLLBUX4AZ27SoWzkOBIh9Jb/w3q3vhueVFQ2rQGCxyxpe0ss6/mY265ysi3u+wpsvQhJpSLLXuOoFRbfG3Y/VAhWhjZWjokwFntPeSlLU20rr8sJU7r+idernJt9RNiccJZ62zpWIvzaeTXkbHlGfVNyzR6XGtukiIhYIenwfa/f9SnVGjdodqPTRhEGbhUGBSg5ayfHAM7+qVXV0CATwccBF01QaoiEjWX0p3BzpFK6rWN8EduiRxVIBnWGiujZEQXLhFYUl/XK6mWyoO3Y0RuV1eFEz8Ww0zrPSGfXU7xk8XDaRav7SJWYT7Q8VAiTUQPJMU4YD6eTb2aGCbqX1Xu1kHC55dMEi4yVxeWe0uhuKxOWs/e/Qh+sQZzXl7uy6tyeD4P351bPZ2HygoUYsoNYvCIiMh1GZGBTvqk5rqB2g8swVylgy38ZlSpc9tAVUaLhmnAxvUd1M4QHWLOW+dFD0FbUYVAHYhlXkYJU6NtgvsKv//0qQzuuwFJWn+Xq8r1nHbHZlXKLxsm7OLnNjFtZlnZ8C0rpwKHQRzEk8b3TOYjK2l7gSbSvE3bAa99NZaB9pVRDvAUi5i/7JMEv2a9/kEknRBkTAhTzxqIJtzMAW+VaBQ psIvqpat /pu+42A2eI0RZLu6ijWytiDU2yjKZ9rdXdtoNR24jU7CV96xHUe7BHGecvrN03mUkUxKR5XThpwzzZTrBAseGsCLumYhNE1vfAM38qXXbUJrtN61xQiBpkSwVgL+cNfWCV2WQ1Enbb3HYFboJA9Isw3j0zQxStg7OYIHjaK7CK7Epw9V6NIL9dNC8cMDksKNSQruP/V3IJKm9piW878x9A0bvGeZej1wwWXCJDd17WON8yISChGuZlIdYMlnv6BDa27Zmh3gGEsCKBzgwUGJm/hPRCO0IvB9vIrle/ueWQ13SaDG+xkd15854kA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 01/07/2025 05:44, Dev Jain wrote: > > On 30/06/25 6:27 pm, Lorenzo Stoakes wrote: >> On Sat, Jun 28, 2025 at 05:04:33PM +0530, Dev Jain wrote: >>> Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect. >>> Architecture can override these helpers; in case not, they are implemented >>> as a simple loop over the corresponding single pte helpers. >>> >>> Signed-off-by: Dev Jain >> Looks generally sensible! Some comments below. >> >>> --- >>>   include/linux/pgtable.h | 83 ++++++++++++++++++++++++++++++++++++++++- >>>   mm/mprotect.c           |  4 +- >>>   2 files changed, 84 insertions(+), 3 deletions(-) >>> >>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h >>> index cf1515c163e2..662f39e7475a 100644 >>> --- a/include/linux/pgtable.h >>> +++ b/include/linux/pgtable.h >>> @@ -1331,7 +1331,8 @@ static inline pte_t ptep_modify_prot_start(struct >>> vm_area_struct *vma, >>> >>>   /* >>>    * Commit an update to a pte, leaving any hardware-controlled bits in >>> - * the PTE unmodified. >>> + * the PTE unmodified. The pte may have been "upgraded" w.r.t a/d bits compared >>> + * to the old_pte, as in, it may have a/d bits on which were off in old_pte. >>>    */ >>>   static inline void ptep_modify_prot_commit(struct vm_area_struct *vma, >>>                          unsigned long addr, >>> @@ -1340,6 +1341,86 @@ static inline void ptep_modify_prot_commit(struct >>> vm_area_struct *vma, >>>       __ptep_modify_prot_commit(vma, addr, ptep, pte); >>>   } >>>   #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */ >>> + >>> +/** >>> + * modify_prot_start_ptes - Start a pte protection read-modify-write >>> transaction >>> + * over a batch of ptes, which protects against asynchronous hardware >>> + * modifications to the ptes. The intention is not to prevent the hardware from >>> + * making pte updates, but to prevent any updates it may make from being lost. >>> + * Please see the comment above ptep_modify_prot_start() for full description. >>> + * >>> + * @vma: The virtual memory area the pages are mapped into. >>> + * @addr: Address the first page is mapped at. >>> + * @ptep: Page table pointer for the first entry. >>> + * @nr: Number of entries. >>> + * >>> + * May be overridden by the architecture; otherwise, implemented as a simple >>> + * loop over ptep_modify_prot_start(), collecting the a/d bits from each pte >>> + * in the batch. >>> + * >>> + * Note that PTE bits in the PTE batch besides the PFN can differ. >>> + * >>> + * Context: The caller holds the page table lock.  The PTEs map consecutive >>> + * pages that belong to the same folio.  The PTEs are all in the same PMD. >>> + * Since the batch is determined from folio_pte_batch, the PTEs must differ >>> + * only in a/d bits (and the soft dirty bit; see fpb_t flags in >>> + * mprotect_folio_pte_batch()). >>> + */ >>> +#ifndef modify_prot_start_ptes >>> +static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma, >>> +        unsigned long addr, pte_t *ptep, unsigned int nr) >>> +{ >>> +    pte_t pte, tmp_pte; >>> + >>> +    pte = ptep_modify_prot_start(vma, addr, ptep); >>> +    while (--nr) { >>> +        ptep++; >>> +        addr += PAGE_SIZE; >>> +        tmp_pte = ptep_modify_prot_start(vma, addr, ptep); >>> +        if (pte_dirty(tmp_pte)) >>> +            pte = pte_mkdirty(pte); >>> +        if (pte_young(tmp_pte)) >>> +            pte = pte_mkyoung(pte); >>> +    } >>> +    return pte; >>> +} >>> +#endif >>> + >>> +/** >>> + * modify_prot_commit_ptes - Commit an update to a batch of ptes, leaving any >>> + * hardware-controlled bits in the PTE unmodified. >>> + * >>> + * @vma: The virtual memory area the pages are mapped into. >>> + * @addr: Address the first page is mapped at. >>> + * @ptep: Page table pointer for the first entry. >>> + * @old_pte: Old page table entry (for the first entry) which is now cleared. >>> + * @pte: New page table entry to be set. >>> + * @nr: Number of entries. >>> + * >>> + * May be overridden by the architecture; otherwise, implemented as a simple >>> + * loop over ptep_modify_prot_commit(). >>> + * >>> + * Context: The caller holds the page table lock. The PTEs are all in the same >>> + * PMD. On exit, the set ptes in the batch map the same folio. The pte may have >>> + * been "upgraded" w.r.t a/d bits compared to the old_pte, as in, it may have >>> + * a/d bits on which were off in old_pte. >>> + */ >>> +#ifndef modify_prot_commit_ptes >>> +static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, >>> unsigned long addr, >>> +        pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr) >>> +{ >>> +    int i; >>> + >>> +    for (i = 0; i < nr; ++i) { >>> +        ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte); >>> +        ptep++; >> Weird place to put this increment, maybe just stick it in the for loop. >> >>> +        addr += PAGE_SIZE; >> Same comment here. > > Sure. > >> >>> +        old_pte = pte_next_pfn(old_pte); >> Could be: >> >>         old_pte = pte; >> >> No? > > We will need to update old_pte also since that > is used by powerpc in radix__ptep_modify_prot_commit(). I think perhaps Lorenzo has the model in his head where old_pte is the previous pte in the batch. That's not the case. old_pte is the value of the pte in the current position of the batch before any changes were made. pte is the new value for the pte. So we need to expliticly advance the PFN in both old_pte and pte each iteration round the loop. > >> >>> +        pte = pte_next_pfn(pte); >>> +    } >>> +} >>> +#endif >>> + >>>   #endif /* CONFIG_MMU */ >>> >>>   /* >>> diff --git a/mm/mprotect.c b/mm/mprotect.c >>> index af10a7fbe6b8..627b0d67cc4a 100644 >>> --- a/mm/mprotect.c >>> +++ b/mm/mprotect.c >>> @@ -206,7 +206,7 @@ static long change_pte_range(struct mmu_gather *tlb, >>>                       continue; >>>               } >>> >>> -            oldpte = ptep_modify_prot_start(vma, addr, pte); >>> +            oldpte = modify_prot_start_ptes(vma, addr, pte, nr_ptes); >>>               ptent = pte_modify(oldpte, newprot); >>> >>>               if (uffd_wp) >>> @@ -232,7 +232,7 @@ static long change_pte_range(struct mmu_gather *tlb, >>>                   can_change_pte_writable(vma, addr, ptent)) >>>                   ptent = pte_mkwrite(ptent, vma); >>> >>> -            ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); >>> +            modify_prot_commit_ptes(vma, addr, pte, oldpte, ptent, nr_ptes); >>>               if (pte_needs_flush(oldpte, ptent)) >>>                   tlb_flush_pte_range(tlb, addr, PAGE_SIZE); >>>               pages++; >>> -- >>> 2.30.2 >>>