Date: Wed, 30 Apr 2025 11:55:12 +0530
Subject: Re: [PATCH v2 3/7] mm: Add batched versions of ptep_modify_prot_start/commit
From: Dev Jain
To: Lorenzo Stoakes
Cc: akpm@linux-foundation.org, ryan.roberts@arm.com, david@redhat.com,
 willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 catalin.marinas@arm.com, will@kernel.org, Liam.Howlett@oracle.com,
 vbabka@suse.cz, jannh@google.com, anshuman.khandual@arm.com,
 peterx@redhat.com, joey.gouly@arm.com, ioworker0@gmail.com,
 baohua@kernel.org, kevin.brodsky@arm.com, quic_zhenhuah@quicinc.com,
 christophe.leroy@csgroup.eu, yangyicong@hisilicon.com,
 linux-arm-kernel@lists.infradead.org, namit@vmware.com, hughd@google.com,
 yang@os.amperecomputing.com, ziy@nvidia.com
References: <20250429052336.18912-1-dev.jain@arm.com>
 <20250429052336.18912-4-dev.jain@arm.com>
 <8780e63d-22c1-4133-a800-dec50fd1b5fa@lucifer.local>
In-Reply-To:
 <8780e63d-22c1-4133-a800-dec50fd1b5fa@lucifer.local>

On 29/04/25 7:22 pm, Lorenzo Stoakes wrote:
> On Tue, Apr 29, 2025 at 10:53:32AM +0530, Dev Jain wrote:
>> Batch ptep_modify_prot_start/commit in preparation for optimizing mprotect.
>> Architecture can override these helpers.
>>
>> Signed-off-by: Dev Jain
>> ---
>>  include/linux/pgtable.h | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index b50447ef1c92..ed287289335f 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -891,6 +891,44 @@ static inline void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
>>  }
>>  #endif
>>
>> +/* See the comment for ptep_modify_prot_start */
>
> I feel like you really should add a little more here, perhaps point out
> that it's batched etc.

Sure. I couldn't easily figure out a nice way to write the documentation, but I'll do it this time.

>
>> +#ifndef modify_prot_start_ptes
>> +static inline pte_t modify_prot_start_ptes(struct vm_area_struct *vma,
>> +		unsigned long addr, pte_t *ptep, unsigned int nr)
>
> This name is a bit confusing, it's not any ptes, it's those pte entries
> belonging to a large folio capped to the PTE table right that you are
> batching right?

Yes, but I am just following the existing convention; see wrprotect_ptes(), etc. I don't have a strong preference anyway.

>
> Perhaps modify_prot_start_large_folio() ? Or something with 'batched' in
> the name?

How about modify_prot_start_batched_ptes()?

>
> We definitely need to mention in comment or name or _somewhere_ the intent
> and motivation for this.
>
>> +{
>> +	pte_t pte, tmp_pte;
>> +
>
> are we not validating what 'nr' is? Even with debug asserts? I'm not sure I
> love this interface, where you require the user to know the number of
> remaining PTE entries in a PTE table.

Shall I state in the comment that the range must lie within a single PTE table?

>
>> +	pte = ptep_modify_prot_start(vma, addr, ptep);
>> +	while (--nr) {
>
> This loop is a bit horrible. It seems needlessly confusing and you're in
> _dire_ need of comments to explain what's going on.
Again, I'm following the pattern of get_and_clear_full_ptes() :)

>
> So my understanding is, you have the user figure out:
>
> nr = min(nr_pte_entries_in_pte, nr_pgs_in_folio)
>
> Then, you want to return the pte entry belonging to the start of the large
> folio batch, but you want to adjust that pte value to propagate dirty and
> young page table flags if any page table entries within the range contain
> those page table flags, having called ptep_modify_prot_start() on all of
> them?
>
> This is quite a bit to a. put in a header like this and b. not
> comment/explain.
>
> So maybe something like:
>
> 	pte = ptep_modify_prot_start(vma, addr, ptep);
>
> 	/* Iterate through large folio tail PTEs. */
> 	for (pg = 1; pg < nr; pg++) {
> 		pte_t inner_pte;
>
> 		ptep++;
> 		addr += PAGE_SIZE;
>
> 		inner_pte = ptep_modify_prot_start(vma, addr, ptep);
>
> 		/* We must propagate A/D state from tail PTEs. */
> 		if (pte_dirty(inner_pte))
> 			pte = pte_mkdirty(pte);
> 		if (pte_young(inner_pte))
> 			pte = pte_mkyoung(pte);
> 	}
>
> Would work better?

No preference; I'll do this then.

>
>
>
>> +		ptep++;
>> +		addr += PAGE_SIZE;
>> +		tmp_pte = ptep_modify_prot_start(vma, addr, ptep);
>
>
>
>> +		if (pte_dirty(tmp_pte))
>> +			pte = pte_mkdirty(pte);
>> +		if (pte_young(tmp_pte))
>> +			pte = pte_mkyoung(pte);
>
> Why are you propagating these?

Because the A/D bits are per-folio, and propagating them will help us batch around can_change_pte_writable() (which can return pte_dirty(pte)) and around pte_needs_flush() for parisc.

>
>> +	}
>> +	return pte;
>> +}
>> +#endif
>> +
>> +/* See the comment for ptep_modify_prot_commit */
>
> Same comments as above, needs more meat on the bones!
>
>> +#ifndef modify_prot_commit_ptes
>> +static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned long addr,
>
> Again need to reference large folio, batched or something relevant here,
> 'ptes' is super vague.
>
>> +		pte_t *ptep, pte_t old_pte, pte_t pte, unsigned int nr)
>
> Nit, but you put 'p' suffix on ptep but not on 'old_pte'?

Because ptep is a pointer and old_pte isn't.

>
> I'm even more concerned about the 'nr' API here now.
>
> So this is now a user-calculated:
>
> min3(large_folio_pages, number of pte entries left in ptep,
>      number of pte entries left in old_pte)
>
> It really feels like something that should be calculated here, or at least
> be broken out more clearly.
>
> You definitely _at the very least_ need to document it in a comment.
>
>> +{
>> +	for (;;) {
>> +		ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
>> +		if (--nr == 0)
>> +			break;
>
> Why are you doing an infinite loop here with a break like this? Again feels
> needlessly confusing.

I was following wrprotect_ptes(). I agree that this is confusing, and I did wonder why it was written that way in the first place :) but I just followed the existing code. I'll change this to a simple for loop if that is your preference.

>
> I think it's ok to duplicate this single line for the sake of clarity,
> also.
>
> Which gives us:
>
> 	unsigned long pg;
>
> 	ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
> 	for (pg = 1; pg < nr; pg++) {
> 		ptep++;
> 		addr += PAGE_SIZE;
> 		old_pte = pte_next_pfn(old_pte);
> 		pte = pte_next_pfn(pte);
>
> 		ptep_modify_prot_commit(vma, addr, ptep, old_pte, pte);
> 	}
>
> There are alternative approaches, but I think doing an infinite loop that
> breaks and especially the confusing 'if (--foo) break;' stuff is much
> harder to parse than a super simple ranged loop.
>
>> +		ptep++;
>> +		addr += PAGE_SIZE;
>> +		old_pte = pte_next_pfn(old_pte);
>> +		pte = pte_next_pfn(pte);
>> +	}
>> +}
>> +#endif
>> +
>>  /*
>>   * On some architectures hardware does not set page access bit when accessing
>>   * memory page, it is responsibility of software setting this bit. It brings
>> --
>> 2.30.2
>>
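For illustration only, here is a minimal userspace model of what the two batched helpers are meant to do, written with the simple ranged for loops suggested in the review. Everything in it is an assumption for the sketch: model_pte_t, model_prot_start(), model_prot_commit(), model_next_pfn(), model_prot_start_batch(), model_prot_commit_batch() and MODEL_PTRS_PER_PTE are hypothetical stand-ins, not kernel APIs, and the vma, addr and old_pte arguments of the real helpers are left out. The assert() on nr stands in for the kind of debug check on 'nr' that was asked about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy PTE: a page frame number plus a few flag bits. */
typedef struct {
	unsigned long pfn;
	bool present;
	bool write;
	bool dirty;	/* D bit */
	bool young;	/* A bit */
} model_pte_t;

/* Hypothetical number of entries in one modelled PTE table. */
#define MODEL_PTRS_PER_PTE 512

/* Stand-in for ptep_modify_prot_start(): return the old entry and clear it. */
static model_pte_t model_prot_start(model_pte_t *ptep)
{
	model_pte_t old = *ptep;

	*ptep = (model_pte_t){ 0 };	/* cleared, i.e. not present */
	return old;
}

/* Stand-in for ptep_modify_prot_commit(): install the new entry. */
static void model_prot_commit(model_pte_t *ptep, model_pte_t pte)
{
	*ptep = pte;
}

/* Stand-in for pte_next_pfn(): same flags, next page frame. */
static model_pte_t model_next_pfn(model_pte_t pte)
{
	pte.pfn++;
	return pte;
}

/*
 * Batched start over nr consecutive entries mapping one large folio.
 * idx is the index of ptep within its table and is only used to check
 * that the batch stays inside a single table. Returns the first entry
 * with the A/D bits of all nr entries folded in.
 */
static model_pte_t model_prot_start_batch(model_pte_t *ptep, size_t idx,
					  unsigned int nr)
{
	model_pte_t pte;
	unsigned int pg;

	assert(nr >= 1 && idx + nr <= MODEL_PTRS_PER_PTE);

	pte = model_prot_start(ptep);
	for (pg = 1; pg < nr; pg++) {
		model_pte_t inner_pte = model_prot_start(ptep + pg);

		/* Propagate A/D state from the tail entries. */
		if (inner_pte.dirty)
			pte.dirty = true;
		if (inner_pte.young)
			pte.young = true;
	}
	return pte;
}

/* Batched commit: write nr entries, advancing the PFN for each one. */
static void model_prot_commit_batch(model_pte_t *ptep, model_pte_t pte,
				    unsigned int nr)
{
	unsigned int pg;

	assert(nr >= 1);

	model_prot_commit(ptep, pte);
	for (pg = 1; pg < nr; pg++) {
		pte = model_next_pfn(pte);
		model_prot_commit(ptep + pg, pte);
	}
}
```

Note that committing the folded value copies the dirty/young state of any single entry onto every entry in the batch; in this model that is only acceptable under the premise stated above, that A/D state is tracked per folio.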