From: Ryan Roberts <ryan.roberts@arm.com>
Date: Fri, 30 May 2025 13:53:30 +0100
Subject: Re: [PATCH 2/3] arm64: pageattr: Use walk_page_range_novma() to change memory permissions
To: Dev Jain, akpm@linux-foundation.org, david@redhat.com, catalin.marinas@arm.com, will@kernel.org
Cc: lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, suzuki.poulose@arm.com, steven.price@arm.com, gshan@redhat.com,
 linux-arm-kernel@lists.infradead.org
References: <20250530090407.19237-1-dev.jain@arm.com> <20250530090407.19237-3-dev.jain@arm.com>
In-Reply-To: <20250530090407.19237-3-dev.jain@arm.com>

On 30/05/2025 10:04, Dev Jain wrote:
> Move away from apply_to_page_range(), which does not honour leaf mappings,
> to walk_page_range_novma(). The callbacks emit a warning and return EINVAL
> if a partial range is detected.

Perhaps:

"""
apply_to_page_range(), which was previously used to change permissions for
kernel mapped memory, can only operate on page mappings. In future we want to
support block mappings for more efficient TLB usage. Reimplement pageattr.c to
instead use walk_page_range_novma() to visit and modify leaf mappings of all
sizes. We only require that the start and end of a given range lie on leaf
mapping boundaries. If this is not the case, emit a warning and return -EINVAL.
"""

>
> Signed-off-by: Dev Jain
> ---
>  arch/arm64/mm/pageattr.c | 69 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 64 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 39fd1f7ff02a..a5c829c64969 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -8,6 +8,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -20,6 +21,67 @@ struct page_change_data {
>  	pgprot_t clear_mask;
>  };
>
> +static pteval_t set_pageattr_masks(unsigned long val, struct mm_walk *walk)

Please don't use unsigned long for raw ptes; this will break with D128
pgtables. Anshuman had a patch in flight to introduce ptdesc_t as a
generic/any-level raw value. It would be preferable to incorporate that patch
and use it here. pteval_t isn't really correct either, because this function
is for any level while pteval_t implies pte level only.
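As a rough sketch only (assuming Anshuman's ptdesc_t patch gets incorporated;
the exact type name may end up different), I'd expect something like:

static ptdesc_t set_pageattr_masks(ptdesc_t val, struct mm_walk *walk)
{
	struct page_change_data *masks = walk->private;

	/* Clear then set the requested attribute bits on the raw value. */
	val &= ~pgprot_val(masks->clear_mask);
	val |= pgprot_val(masks->set_mask);

	return val;
}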
> +{
> +	struct page_change_data *masks = walk->private;
> +	unsigned long new_val = val;

Why do you need new_val? Why not just update and return val?

> +
> +	new_val &= ~(pgprot_val(masks->clear_mask));
> +	new_val |= (pgprot_val(masks->set_mask));
> +
> +	return new_val;
> +}

One potential pitfall of having a generic function that can change the
permissions for ptes at all levels is that bit 1 is defined differently for
level 3 vs the other levels. Having had a quick look at all the masks that
users currently pass in, though, I don't think there should be any issues here
in practice.

> +
> +static int pageattr_pud_entry(pud_t *pud, unsigned long addr,
> +			      unsigned long next, struct mm_walk *walk)
> +{
> +	pud_t val = pudp_get(pud);
> +
> +	if (pud_leaf(val)) {
> +		if (WARN_ON_ONCE((next - addr) != PUD_SIZE))
> +			return -EINVAL;
> +		val = __pud(set_pageattr_masks(pud_val(val), walk));
> +		set_pud(pud, val);
> +		walk->action = ACTION_CONTINUE;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pageattr_pmd_entry(pmd_t *pmd, unsigned long addr,
> +			      unsigned long next, struct mm_walk *walk)
> +{
> +	pmd_t val = pmdp_get(pmd);
> +
> +	if (pmd_leaf(val)) {
> +		if (WARN_ON_ONCE((next - addr) != PMD_SIZE))
> +			return -EINVAL;
> +		val = __pmd(set_pageattr_masks(pmd_val(val), walk));
> +		set_pmd(pmd, val);
> +		walk->action = ACTION_CONTINUE;
> +	}
> +
> +	return 0;
> +}
> +
> +static int pageattr_pte_entry(pte_t *pte, unsigned long addr,
> +			      unsigned long next, struct mm_walk *walk)
> +{
> +	pte_t val = ptep_get(pte);

BUG: Use __ptep_get(), which is "below" the contpte management layer.
ptep_get() will look at the contiguous bit and potentially decide to accumulate
all the a/d bits in the block, which is not relevant for kernel mappings.

> +
> +	val = __pte(set_pageattr_masks(pte_val(val), walk));
> +	set_pte(pte, val);

BUG: Use __set_pte(). Same reasoning as above, but this one is more harmful
because set_pte() will try to detect contpte blocks and may zero/flush the
entries, which would be very bad for kernel mappings.

> +
> +	return 0;
> +}
> +
> +static const struct mm_walk_ops pageattr_ops = {
> +	.pud_entry	= pageattr_pud_entry,
> +	.pmd_entry	= pageattr_pmd_entry,
> +	.pte_entry	= pageattr_pte_entry,

Is there a reason why you don't have pgd and p4d entries? I think there are
configs where the pgd may contain leaf mappings; possibly 64K/42-bit, which
has 2 levels, and I think those levels are pgd and pte. So I think you'd
better implement all levels to be correct, as sketched below.
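Something like the below is what I have in mind (untested, just to show the
shape; it mirrors your pud/pmd handlers, and a matching pageattr_p4d_entry()
would look the same but with the p4d accessors and P4D_SIZE):

static int pageattr_pgd_entry(pgd_t *pgd, unsigned long addr,
			      unsigned long next, struct mm_walk *walk)
{
	pgd_t val = pgdp_get(pgd);

	/* Only act on leaf entries; table entries get descended into. */
	if (pgd_leaf(val)) {
		if (WARN_ON_ONCE((next - addr) != PGDIR_SIZE))
			return -EINVAL;
		val = __pgd(set_pageattr_masks(pgd_val(val), walk));
		set_pgd(pgd, val);
		walk->action = ACTION_CONTINUE;
	}

	return 0;
}

Then hook up .pgd_entry and .p4d_entry in pageattr_ops alongside the others.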
> +	.walk_lock	= PGWALK_NOLOCK,
> +};
> +
>  bool rodata_full __ro_after_init = IS_ENABLED(CONFIG_RODATA_FULL_DEFAULT_ENABLED);
>
>  bool can_set_direct_map(void)
> @@ -49,9 +111,6 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
>  	return 0;
>  }
>
> -/*
> - * This function assumes that the range is mapped with PAGE_SIZE pages.
> - */
>  static int __change_memory_common(unsigned long start, unsigned long size,
>  				   pgprot_t set_mask, pgprot_t clear_mask)
>  {
> @@ -61,8 +120,8 @@ static int __change_memory_common(unsigned long start, unsigned long size,
>  	data.set_mask = set_mask;
>  	data.clear_mask = clear_mask;
>
> -	ret = apply_to_page_range(&init_mm, start, size, change_page_range,
> -				  &data);
> +	ret = walk_page_range_novma(&init_mm, start, start + size,
> +				    &pageattr_ops, NULL, &data);
>
>  	/*
>  	 * If the memory is being made valid without changing any other bits

I notice that set_direct_map_invalid_noflush() and
set_direct_map_default_noflush() don't use __change_memory_common() but instead
call apply_to_page_range() directly (presumably because they don't want the
associated tlb flush). Is there a reason not to update these callers too?
Perhaps it would be cleaner to wrap the walk in ___change_memory_common()
(3 leading underscores), which does everything except the flush; see the
sketch below.
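Roughly this (sketch only; naming and the exact flush condition are whatever
you already have):

static int ___change_memory_common(unsigned long start, unsigned long size,
				   pgprot_t set_mask, pgprot_t clear_mask)
{
	struct page_change_data data;

	data.set_mask = set_mask;
	data.clear_mask = clear_mask;

	/* Walk and update the leaf entries; no TLB maintenance here. */
	return walk_page_range_novma(&init_mm, start, start + size,
				     &pageattr_ops, NULL, &data);
}

static int __change_memory_common(unsigned long start, unsigned long size,
				  pgprot_t set_mask, pgprot_t clear_mask)
{
	int ret;

	ret = ___change_memory_common(start, size, set_mask, clear_mask);

	/* ... existing "flush unless only making valid" logic stays here ... */

	return ret;
}

set_direct_map_invalid_noflush() and set_direct_map_default_noflush() could
then call ___change_memory_common() directly and skip the flush.

Thanks,
Ryan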