Date: Mon, 10 Feb 2025 11:04:55 +0000
Subject: Re: [PATCH v1 15/16] mm: Generalize arch_sync_kernel_mappings()
From: Ryan Roberts
To: Anshuman Khandual, Catalin Marinas, Will Deacon, Muchun Song, Pasha Tatashin, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Mark Rutland, Ard Biesheuvel, Dev Jain, Alexandre Ghiti, Steve Capper, Kevin Brodsky
Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20250205151003.88959-1-ryan.roberts@arm.com> <20250205151003.88959-16-ryan.roberts@arm.com> <43c256b6-7e44-41bb-b0c5-866fd4faab5c@arm.com>
In-Reply-To: <43c256b6-7e44-41bb-b0c5-866fd4faab5c@arm.com>
On 10/02/2025 07:45, Anshuman Khandual wrote:
>
>
> On 2/5/25 20:39, Ryan Roberts wrote:
>> arch_sync_kernel_mappings() is an optional hook for arches to allow them
>> to synchronize certain levels of the kernel pgtables after modification.
>> But arm64 could benefit from a hook similar to this, paired with a call
>> prior to starting the batch of modifications.
>>
>> So let's introduce arch_update_kernel_mappings_begin() and
>> arch_update_kernel_mappings_end().
>> Both have a default implementation
>> which can be overridden by the arch code. The default for the former is
>> a nop, and the default for the latter is to call
>> arch_sync_kernel_mappings(), so the latter replaces previous
>> arch_sync_kernel_mappings() callsites. So by default, the resulting
>> behaviour is unchanged.
>>
>> To avoid include hell, the pgtbl_mod_mask type and its associated
>> macros are moved to their own header.
>>
>> In a future patch, arm64 will opt-in to overriding both functions.
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>  include/linux/pgtable.h         | 24 +----------------
>>  include/linux/pgtable_modmask.h | 32 ++++++++++++++++++++++
>>  include/linux/vmalloc.h         | 47 +++++++++++++++++++++++++++++++++
>>  mm/memory.c                     |  5 ++--
>>  mm/vmalloc.c                    | 15 ++++++-----
>>  5 files changed, 92 insertions(+), 31 deletions(-)
>>  create mode 100644 include/linux/pgtable_modmask.h
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 94d267d02372..7f70786a73b3 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -4,6 +4,7 @@
>>
>>  #include
>>  #include
>> +#include
>>
>>  #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT)
>>  #define PUD_ORDER (PUD_SHIFT - PAGE_SHIFT)
>> @@ -1786,29 +1787,6 @@ static inline bool arch_has_pfn_modify_check(void)
>>  # define PAGE_KERNEL_EXEC PAGE_KERNEL
>>  #endif
>>
>> -/*
>> - * Page Table Modification bits for pgtbl_mod_mask.
>> - *
>> - * These are used by the p?d_alloc_track*() set of functions an in the generic
>> - * vmalloc/ioremap code to track at which page-table levels entries have been
>> - * modified. Based on that the code can better decide when vmalloc and ioremap
>> - * mapping changes need to be synchronized to other page-tables in the system.
>> - */
>> -#define __PGTBL_PGD_MODIFIED 0
>> -#define __PGTBL_P4D_MODIFIED 1
>> -#define __PGTBL_PUD_MODIFIED 2
>> -#define __PGTBL_PMD_MODIFIED 3
>> -#define __PGTBL_PTE_MODIFIED 4
>> -
>> -#define PGTBL_PGD_MODIFIED BIT(__PGTBL_PGD_MODIFIED)
>> -#define PGTBL_P4D_MODIFIED BIT(__PGTBL_P4D_MODIFIED)
>> -#define PGTBL_PUD_MODIFIED BIT(__PGTBL_PUD_MODIFIED)
>> -#define PGTBL_PMD_MODIFIED BIT(__PGTBL_PMD_MODIFIED)
>> -#define PGTBL_PTE_MODIFIED BIT(__PGTBL_PTE_MODIFIED)
>> -
>> -/* Page-Table Modification Mask */
>> -typedef unsigned int pgtbl_mod_mask;
>> -
>>  #endif /* !__ASSEMBLY__ */
>>
>>  #if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
>> diff --git a/include/linux/pgtable_modmask.h b/include/linux/pgtable_modmask.h
>> new file mode 100644
>> index 000000000000..5a21b1bb8df3
>> --- /dev/null
>> +++ b/include/linux/pgtable_modmask.h
>> @@ -0,0 +1,32 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_PGTABLE_MODMASK_H
>> +#define _LINUX_PGTABLE_MODMASK_H
>> +
>> +#ifndef __ASSEMBLY__
>> +
>> +/*
>> + * Page Table Modification bits for pgtbl_mod_mask.
>> + *
>> + * These are used by the p?d_alloc_track*() set of functions and in the generic
>> + * vmalloc/ioremap code to track at which page-table levels entries have been
>> + * modified. Based on that the code can better decide when vmalloc and ioremap
>> + * mapping changes need to be synchronized to other page-tables in the system.
>> + */
>> +#define __PGTBL_PGD_MODIFIED 0
>> +#define __PGTBL_P4D_MODIFIED 1
>> +#define __PGTBL_PUD_MODIFIED 2
>> +#define __PGTBL_PMD_MODIFIED 3
>> +#define __PGTBL_PTE_MODIFIED 4
>> +
>> +#define PGTBL_PGD_MODIFIED BIT(__PGTBL_PGD_MODIFIED)
>> +#define PGTBL_P4D_MODIFIED BIT(__PGTBL_P4D_MODIFIED)
>> +#define PGTBL_PUD_MODIFIED BIT(__PGTBL_PUD_MODIFIED)
>> +#define PGTBL_PMD_MODIFIED BIT(__PGTBL_PMD_MODIFIED)
>> +#define PGTBL_PTE_MODIFIED BIT(__PGTBL_PTE_MODIFIED)
>> +
>> +/* Page-Table Modification Mask */
>> +typedef unsigned int pgtbl_mod_mask;
>> +
>> +#endif /* !__ASSEMBLY__ */
>> +
>> +#endif /* _LINUX_PGTABLE_MODMASK_H */
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 16dd4cba64f2..cb5d8f1965a1 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -11,6 +11,7 @@
>>  #include /* pgprot_t */
>>  #include
>>  #include
>> +#include
>>
>>  #include
>>
>> @@ -213,6 +214,26 @@ extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
>>  int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
>>  		struct page **pages, unsigned int page_shift);
>>
>> +#ifndef arch_update_kernel_mappings_begin
>> +/**
>> + * arch_update_kernel_mappings_begin - A batch of kernel pgtable mappings are
>> + *                                     about to be updated.
>> + * @start: Virtual address of start of range to be updated.
>> + * @end: Virtual address of end of range to be updated.
>> + *
>> + * An optional hook to allow architecture code to prepare for a batch of kernel
>> + * pgtable mapping updates. An architecture may use this to enter a lazy mode
>> + * where some operations can be deferred until the end of the batch.
>> + *
>> + * Context: Called in task context and may be preemptible.
>> + */
>> +static inline void arch_update_kernel_mappings_begin(unsigned long start,
>> +						     unsigned long end)
>> +{
>> +}
>> +#endif
>> +
>> +#ifndef arch_update_kernel_mappings_end
>>  /*
>>   * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
>>   * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
>> @@ -229,6 +250,32 @@ int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
>>   */
>>  void arch_sync_kernel_mappings(unsigned long start, unsigned long end);
>>
>> +/**
>> + * arch_update_kernel_mappings_end - A batch of kernel pgtable mappings have
>> + *                                   been updated.
>> + * @start: Virtual address of start of range that was updated.
>> + * @end: Virtual address of end of range that was updated.
>> + *
>> + * An optional hook to inform architecture code that a batch update is complete.
>> + * This balances a previous call to arch_update_kernel_mappings_begin().
>> + *
>> + * An architecture may override this for any purpose, such as exiting a lazy
>> + * mode previously entered with arch_update_kernel_mappings_begin() or syncing
>> + * kernel mappings to a secondary pgtable. The default implementation calls an
>> + * arch-provided arch_sync_kernel_mappings() if any arch-defined pgtable level
>> + * was updated.
>> + *
>> + * Context: Called in task context and may be preemptible.
>> + */
>> +static inline void arch_update_kernel_mappings_end(unsigned long start,
>> +						   unsigned long end,
>> +						   pgtbl_mod_mask mask)
>> +{
>> +	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
>> +		arch_sync_kernel_mappings(start, end);
>> +}
>
> One arch call back calling yet another arch call back sounds bit odd.

It's no different from the default implementation of arch_make_huge_pte()
calling pte_mkhuge(), is it?

> Also
> should not ARCH_PAGE_TABLE_SYNC_MASK be checked both for __begin and __end
> callbacks in case a platform subscribes into this framework.

I'm not sure how that would work?
The mask is accumulated during the pgtable walk, so we don't have a mask
until we get to the end.

> Instead the
> following changes sound more reasonable, but will also require some more
> updates for the current platforms using arch_sync_kernel_mappings().
>
> if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
> 	arch_update_kernel_mappings_begin()

This makes no sense; mask is always 0 before doing the walk.

>
> if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
> 	arch_update_kernel_mappings_end()
>
> Basically when any platform defines ARCH_PAGE_TABLE_SYNC_MASK and subscribes
> this framework, it will also provide arch_update_kernel_mappings_begin/end()
> callbacks as required.

Personally I think it's cleaner to just pass mask to
arch_update_kernel_mappings_end() and let the function decide what it wants
to do. But it's a good question whether we should refactor x86 and arm to
directly implement arch_update_kernel_mappings_end() instead of
arch_sync_kernel_mappings(). Personally I thought it was better to avoid the
churn, but I'm interested in others' opinions.

Thanks,
Ryan

>
>> +#endif
>> +
>>  /*
>>   * Lowlevel-APIs (not for driver use!)
>>   */
>> diff --git a/mm/memory.c b/mm/memory.c
>> index a15f7dd500ea..f80930bc19f6 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3035,6 +3035,8 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>>  	if (WARN_ON(addr >= end))
>>  		return -EINVAL;
>>
>> +	arch_update_kernel_mappings_begin(start, end);
>> +
>>  	pgd = pgd_offset(mm, addr);
>>  	do {
>>  		next = pgd_addr_end(addr, end);
>> @@ -3055,8 +3057,7 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
>>  			break;
>>  	} while (pgd++, addr = next, addr != end);
>>
>> -	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
>> -		arch_sync_kernel_mappings(start, start + size);
>> +	arch_update_kernel_mappings_end(start, end, mask);
>>
>>  	return err;
>>  }
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 50fd44439875..c5c51d86ef78 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -312,10 +312,10 @@ int vmap_page_range(unsigned long addr, unsigned long end,
>>  	pgtbl_mod_mask mask = 0;
>>  	int err;
>>
>> +	arch_update_kernel_mappings_begin(addr, end);
>>  	err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
>>  				 ioremap_max_page_shift, &mask);
>> -	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
>> -		arch_sync_kernel_mappings(addr, end);
>> +	arch_update_kernel_mappings_end(addr, end, mask);
>>
>>  	flush_cache_vmap(addr, end);
>>  	if (!err)
>> @@ -463,6 +463,9 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end)
>>  	pgtbl_mod_mask mask = 0;
>>
>>  	BUG_ON(addr >= end);
>> +
>> +	arch_update_kernel_mappings_begin(start, end);
>> +
>>  	pgd = pgd_offset_k(addr);
>>  	do {
>>  		next = pgd_addr_end(addr, end);
>> @@ -473,8 +476,7 @@ void __vunmap_range_noflush(unsigned long start, unsigned long end)
>>  		vunmap_p4d_range(pgd, addr, next, &mask);
>>  	} while (pgd++, addr = next, addr != end);
>>
>> -	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
>> -		arch_sync_kernel_mappings(start, end);
>> +	arch_update_kernel_mappings_end(start, end, mask);
>>  }
>>
>>  void vunmap_range_noflush(unsigned long start, unsigned long end)
>> @@ -625,6 +627,8 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>>
>>  	WARN_ON(page_shift < PAGE_SHIFT);
>>
>> +	arch_update_kernel_mappings_begin(start, end);
>> +
>>  	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
>>  			page_shift == PAGE_SHIFT) {
>>  		err = vmap_small_pages_range_noflush(addr, end, prot, pages,
>> @@ -642,8 +646,7 @@ int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>>  		}
>>  	}
>>
>> -	if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
>> -		arch_sync_kernel_mappings(start, end);
>> +	arch_update_kernel_mappings_end(start, end, mask);
>>
>>  	return err;
>>  }