* [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support
@ 2025-09-11 9:55 Chunyan Zhang
2025-09-11 9:55 ` [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported() Chunyan Zhang
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-11 9:55 UTC (permalink / raw)
To: linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie, Chunyan Zhang
This patchset adds support for the Svrsw60t59b [1] extension, which is now ratified,
and adds soft-dirty and userfaultfd write-protect tracking for RISC-V.
Patches 1 and 2 add macros that allow architectures to define their own checks
for whether the soft-dirty / uffd-wp PTE bits are available; for RISC-V, this means
checking whether the Svrsw60t59b extension is supported on the device the kernel
is running on.
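Roughly, the resulting pattern looks like this (a sketch only; the exact
definitions are in patches 1, 2, 4 and 5):
  /* generic fallback (include/linux/pgtable.h, patch 1) */
  #ifndef pgtable_soft_dirty_supported
  #define pgtable_soft_dirty_supported() IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
  #endif
  /* RISC-V override (arch/riscv/include/asm/pgtable.h, patch 4) */
  #define pgtable_soft_dirty_supported() \
          (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && \
           riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B))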
This patchset has been tested with the kselftest mm suite, in which soft-dirty,
madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests run and pass,
and no regressions are observed in any of the other tests.
This patchset applies on top of v6.17-rc4.
[1] https://github.com/riscv-non-isa/riscv-iommu/pull/543
V11:
- Rename the macro API to pgtable_*_supported() since we also have PMD support;
- Change the default implementations of the two macros, making CONFIG_MEM_SOFT_DIRTY or
CONFIG_HAVE_ARCH_USERFAULTFD_WP part of the macros;
- Correct the order of insertion of RISCV_ISA_EXT_SVRSW60T59B;
- Rephrase some comments.
V10: https://lore.kernel.org/all/20250909095611.803898-1-zhangchunyan@iscas.ac.cn/
- Fixed the issue reported by the kernel test robot <lkp@intel.com>.
V9: https://lore.kernel.org/all/20250905103651.489197-1-zhangchunyan@iscas.ac.cn/
- Add pte_soft_dirty/uffd_wp_available() API to allow dynamically checking
if the PTE bit is available for the platform on which the kernel is running.
V8: https://lore.kernel.org/all/20250619065232.1786470-1-zhangchunyan@iscas.ac.cn/
- Rebase on v6.16-rc1;
- Add dependencies to MMU && 64BIT for RISCV_ISA_SVRSW60T59B;
- Use 'Svrsw60t59b' instead of 'SVRSW60T59B' in Kconfig help paragraph;
- Add Alex's Reviewed-by tag in patch 1.
V7: https://lore.kernel.org/all/20250409095320.224100-1-zhangchunyan@iscas.ac.cn/
- Add Svrsw60t59b [1] extension support;
- Have soft-dirty and uffd-wp depend on the Svrsw60t59b extension to
avoid crashes on hardware which doesn't have this extension.
V6: https://lore.kernel.org/all/20250408084301.68186-1-zhangchunyan@iscas.ac.cn/
- Changed to use bits 59-60, which the Svrsw60t59b extension makes available,
for soft-dirty and userfaultfd write-protect tracking.
V5: https://lore.kernel.org/all/20241113095833.1805746-1-zhangchunyan@iscas.ac.cn/
- Fixed typos and corrected some words in Kconfig and commit message;
- Removed pte_wrprotect() from pte_swp_mkuffd_wp(), this is a copy-paste
error;
- Added Alex's Reviewed-by tag in patch 2.
V4: https://lore.kernel.org/all/20240830011101.3189522-1-zhangchunyan@iscas.ac.cn/
- Added bit(4) descriptions into "Format of swap PTE".
V3: https://lore.kernel.org/all/20240805095243.44809-1-zhangchunyan@iscas.ac.cn/
- Fixed the issue reported by the kernel test robot <lkp@intel.com>.
V2: https://lore.kernel.org/all/20240731040444.3384790-1-zhangchunyan@iscas.ac.cn/
- Add uffd-wp support;
- Make soft-dirty, uffd-wp and devmap mutually exclusive since they all use
the same PTE bit;
- Add test results of CRIU in the cover-letter.
Chunyan Zhang (5):
mm: softdirty: Add pgtable_soft_dirty_supported()
mm: userfaultfd: Add pgtable_uffd_wp_supported()
riscv: Add RISC-V Svrsw60t59b extension support
riscv: mm: Add soft-dirty page tracking support
riscv: mm: Add userfaultfd write-protect support
arch/riscv/Kconfig | 16 +++
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/pgtable-bits.h | 37 +++++++
arch/riscv/include/asm/pgtable.h | 143 +++++++++++++++++++++++++-
arch/riscv/kernel/cpufeature.c | 1 +
fs/proc/task_mmu.c | 17 ++-
fs/userfaultfd.c | 23 +++--
include/asm-generic/pgtable_uffd.h | 11 ++
include/linux/mm_inline.h | 7 ++
include/linux/pgtable.h | 12 +++
include/linux/userfaultfd_k.h | 44 +++++---
mm/debug_vm_pgtable.c | 10 +-
mm/huge_memory.c | 13 +--
mm/internal.h | 2 +-
mm/memory.c | 6 +-
mm/mremap.c | 13 +--
mm/userfaultfd.c | 10 +-
17 files changed, 310 insertions(+), 56 deletions(-)
--
2.34.1
* [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported()
2025-09-11 9:55 [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support Chunyan Zhang
@ 2025-09-11 9:55 ` Chunyan Zhang
2025-09-11 13:09 ` David Hildenbrand
2025-09-11 9:55 ` [PATCH v11 2/5] mm: userfaultfd: Add pgtable_uffd_wp_supported() Chunyan Zhang
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-11 9:55 UTC (permalink / raw)
To: linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie, Chunyan Zhang
Some platforms can customize the PTE/PMD entry soft-dirty bit, making it
unavailable even if the architecture provides the resource.
Add an API that architectures can override with their own implementation to
detect whether the soft-dirty bit is available on the device the kernel is
running on.
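With this in place, generic code can replace compile-time
IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) / #ifdef checks with a runtime check,
e.g. the mm/mremap.c hunk below ends up looking roughly like this:
  if (pgtable_soft_dirty_supported()) {
          if (pte_present(pte))
                  pte = pte_mksoft_dirty(pte);
          else if (is_swap_pte(pte))
                  pte = pte_swp_mksoft_dirty(pte);
  }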
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
---
fs/proc/task_mmu.c | 17 ++++++++++++++++-
include/linux/pgtable.h | 12 ++++++++++++
mm/debug_vm_pgtable.c | 10 +++++-----
mm/huge_memory.c | 13 +++++++------
mm/internal.h | 2 +-
mm/mremap.c | 13 +++++++------
mm/userfaultfd.c | 10 ++++------
7 files changed, 52 insertions(+), 25 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 29cca0e6d0ff..9e8083b6d4cd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1058,7 +1058,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
* -Werror=unterminated-string-initialization warning
* with GCC 15
*/
- static const char mnemonics[BITS_PER_LONG][3] = {
+ static char mnemonics[BITS_PER_LONG][3] = {
/*
* In case if we meet a flag we don't know about.
*/
@@ -1129,6 +1129,16 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_SEALED)] = "sl",
#endif
};
+/*
+ * We should remove the VM_SOFTDIRTY flag if the soft-dirty bit is
+ * unavailable on the device the kernel is running on, even if the
+ * architecture provides the resource and soft-dirty is compiled in.
+ */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ if (!pgtable_soft_dirty_supported())
+ mnemonics[ilog2(VM_SOFTDIRTY)][0] = 0;
+#endif
+
size_t i;
seq_puts(m, "VmFlags: ");
@@ -1531,6 +1541,8 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
static inline void clear_soft_dirty(struct vm_area_struct *vma,
unsigned long addr, pte_t *pte)
{
+ if (!pgtable_soft_dirty_supported())
+ return;
/*
* The soft-dirty tracker uses #PF-s to catch writes
* to pages, so write-protect the pte as well. See the
@@ -1566,6 +1578,9 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
{
pmd_t old, pmd = *pmdp;
+ if (!pgtable_soft_dirty_supported())
+ return;
+
if (pmd_present(pmd)) {
/* See comment in change_huge_pmd() */
old = pmdp_invalidate(vma, addr, pmdp);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 4c035637eeb7..2a3578a4ae4c 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1537,6 +1537,18 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
#define arch_start_context_switch(prev) do {} while (0)
#endif
+/*
+ * Some platforms can customize the PTE soft-dirty bit making it unavailable
+ * even if the architecture provides the resource.
+ * Adding this API allows architectures to add their own checks for the
+ * devices on which the kernel is running.
+ * Note: When overriding it, please make sure the CONFIG_MEM_SOFT_DIRTY
+ * is part of this macro.
+ */
+#ifndef pgtable_soft_dirty_supported
+#define pgtable_soft_dirty_supported() IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
+#endif
+
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
#ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 830107b6dd08..b32ce2b0b998 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -690,7 +690,7 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
{
pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
- if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+ if (!pgtable_soft_dirty_supported())
return;
pr_debug("Validating PTE soft dirty\n");
@@ -702,7 +702,7 @@ static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args)
{
pte_t pte;
- if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+ if (!pgtable_soft_dirty_supported())
return;
pr_debug("Validating PTE swap soft dirty\n");
@@ -718,7 +718,7 @@ static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args)
{
pmd_t pmd;
- if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+ if (!pgtable_soft_dirty_supported())
return;
if (!has_transparent_hugepage())
@@ -734,8 +734,8 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
{
pmd_t pmd;
- if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
- !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
+ if (!pgtable_soft_dirty_supported() ||
+ !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
return;
if (!has_transparent_hugepage())
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c38a95e9f09..218d430a2ec6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2271,12 +2271,13 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
static pmd_t move_soft_dirty_pmd(pmd_t pmd)
{
-#ifdef CONFIG_MEM_SOFT_DIRTY
- if (unlikely(is_pmd_migration_entry(pmd)))
- pmd = pmd_swp_mksoft_dirty(pmd);
- else if (pmd_present(pmd))
- pmd = pmd_mksoft_dirty(pmd);
-#endif
+ if (pgtable_soft_dirty_supported()) {
+ if (unlikely(is_pmd_migration_entry(pmd)))
+ pmd = pmd_swp_mksoft_dirty(pmd);
+ else if (pmd_present(pmd))
+ pmd = pmd_mksoft_dirty(pmd);
+ }
+
return pmd;
}
diff --git a/mm/internal.h b/mm/internal.h
index 45b725c3dc03..c6ca62f8ecf3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1538,7 +1538,7 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
* VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
* will be constantly true.
*/
- if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+ if (!pgtable_soft_dirty_supported())
return false;
/*
diff --git a/mm/mremap.c b/mm/mremap.c
index e618a706aff5..7beb3114dbf5 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -162,12 +162,13 @@ static pte_t move_soft_dirty_pte(pte_t pte)
* Set soft dirty bit so we can notice
* in userspace the ptes were moved.
*/
-#ifdef CONFIG_MEM_SOFT_DIRTY
- if (pte_present(pte))
- pte = pte_mksoft_dirty(pte);
- else if (is_swap_pte(pte))
- pte = pte_swp_mksoft_dirty(pte);
-#endif
+ if (pgtable_soft_dirty_supported()) {
+ if (pte_present(pte))
+ pte = pte_mksoft_dirty(pte);
+ else if (is_swap_pte(pte))
+ pte = pte_swp_mksoft_dirty(pte);
+ }
+
return pte;
}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 45e6290e2e8b..85f43479b67a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1065,9 +1065,8 @@ static int move_present_pte(struct mm_struct *mm,
orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot);
/* Set soft dirty bit so userspace can notice the pte was moved */
-#ifdef CONFIG_MEM_SOFT_DIRTY
- orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
-#endif
+ if (pgtable_soft_dirty_supported())
+ orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
if (pte_dirty(orig_src_pte))
orig_dst_pte = pte_mkdirty(orig_dst_pte);
orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
@@ -1134,9 +1133,8 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
-#ifdef CONFIG_MEM_SOFT_DIRTY
- orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
-#endif
+ if (pgtable_soft_dirty_supported())
+ orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
--
2.34.1
* [PATCH v11 2/5] mm: userfaultfd: Add pgtable_uffd_wp_supported()
2025-09-11 9:55 [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support Chunyan Zhang
2025-09-11 9:55 ` [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported() Chunyan Zhang
@ 2025-09-11 9:55 ` Chunyan Zhang
2025-09-12 8:54 ` David Hildenbrand
2025-09-11 9:56 ` [PATCH v11 3/5] riscv: Add RISC-V Svrsw60t59b extension support Chunyan Zhang
` (2 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-11 9:55 UTC (permalink / raw)
To: linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie, Chunyan Zhang
Some platforms can customize the PTE/PMD entry uffd-wp bit, making it
unavailable even if the architecture provides the resource.
This patch adds a macro API that allows architectures to define their own
implementation to check whether the uffd-wp bit is available on the device
the kernel is running on.
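An architecture can then provide its own check; for RISC-V this ends up
being roughly (a sketch, the actual definition is added in patch 5 of this
series):
  #define pgtable_uffd_wp_supported() \
          riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)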
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
---
fs/userfaultfd.c | 23 ++++++++--------
include/asm-generic/pgtable_uffd.h | 11 ++++++++
include/linux/mm_inline.h | 7 +++++
include/linux/userfaultfd_k.h | 44 +++++++++++++++++++-----------
mm/memory.c | 6 ++--
5 files changed, 62 insertions(+), 29 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 54c6cc7fe9c6..b549c327d7ad 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1270,9 +1270,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
vm_flags |= VM_UFFD_MISSING;
if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) {
-#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
- goto out;
-#endif
+ if (!pgtable_uffd_wp_supported())
+ goto out;
+
vm_flags |= VM_UFFD_WP;
}
if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) {
@@ -1980,14 +1980,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
uffdio_api.features &=
~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
#endif
-#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
- uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
-#endif
-#ifndef CONFIG_PTE_MARKER_UFFD_WP
- uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
- uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
- uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
-#endif
+ if (!pgtable_uffd_wp_supported())
+ uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+
+ if (!IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP) ||
+ !pgtable_uffd_wp_supported()) {
+ uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
+ uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
+ uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
+ }
ret = -EINVAL;
if (features & ~uffdio_api.features)
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 828966d4c281..895d68ece0e7 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -1,6 +1,17 @@
#ifndef _ASM_GENERIC_PGTABLE_UFFD_H
#define _ASM_GENERIC_PGTABLE_UFFD_H
+/*
+ * Some platforms can customize the uffd-wp bit, making it unavailable
+ * even if the architecture provides the resource.
+ * Adding this API allows architectures to add their own checks for the
+ * devices on which the kernel is running.
+ * Note: When overriding it, please make sure the
+ * CONFIG_HAVE_ARCH_USERFAULTFD_WP is part of this macro.
+ */
+#ifndef pgtable_uffd_wp_supported
+#define pgtable_uffd_wp_supported() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP)
+#endif
#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
static __always_inline int pte_uffd_wp(pte_t pte)
{
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 89b518ff097e..38845b8b79ff 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -571,6 +571,13 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
pte_t *pte, pte_t pteval)
{
#ifdef CONFIG_PTE_MARKER_UFFD_WP
+ /*
+ * Some platforms can customize the PTE uffd-wp bit, making it unavailable
+ * even if the architecture provides the resource.
+ */
+ if (!pgtable_uffd_wp_supported())
+ return false;
+
bool arm_uffd_pte = false;
/* The current status of the pte should be "cleared" before calling */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c0e716aec26a..6264b56ae961 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -228,15 +228,15 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
if (wp_async && (vm_flags == VM_UFFD_WP))
return true;
-#ifndef CONFIG_PTE_MARKER_UFFD_WP
/*
* If user requested uffd-wp but not enabled pte markers for
* uffd-wp, then shmem & hugetlbfs are not supported but only
* anonymous.
*/
- if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
+ if ((!IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP) ||
+ !pgtable_uffd_wp_supported()) &&
+ (vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
return false;
-#endif
/* By default, allow any of anon|shmem|hugetlb */
return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
@@ -437,8 +437,11 @@ static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma)
static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
{
#ifdef CONFIG_PTE_MARKER_UFFD_WP
- return is_pte_marker_entry(entry) &&
- (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
+ if (pgtable_uffd_wp_supported())
+ return is_pte_marker_entry(entry) &&
+ (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
+ else
+ return false;
#else
return false;
#endif
@@ -447,14 +450,19 @@ static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
static inline bool pte_marker_uffd_wp(pte_t pte)
{
#ifdef CONFIG_PTE_MARKER_UFFD_WP
- swp_entry_t entry;
+ if (pgtable_uffd_wp_supported()) {
+ swp_entry_t entry;
- if (!is_swap_pte(pte))
- return false;
+ if (!is_swap_pte(pte))
+ return false;
- entry = pte_to_swp_entry(pte);
+ entry = pte_to_swp_entry(pte);
+
+ return pte_marker_entry_uffd_wp(entry);
+ } else {
+ return false;
+ }
- return pte_marker_entry_uffd_wp(entry);
#else
return false;
#endif
@@ -467,14 +475,18 @@ static inline bool pte_marker_uffd_wp(pte_t pte)
static inline bool pte_swp_uffd_wp_any(pte_t pte)
{
#ifdef CONFIG_PTE_MARKER_UFFD_WP
- if (!is_swap_pte(pte))
- return false;
+ if (pgtable_uffd_wp_supported()) {
+ if (!is_swap_pte(pte))
+ return false;
- if (pte_swp_uffd_wp(pte))
- return true;
+ if (pte_swp_uffd_wp(pte))
+ return true;
- if (pte_marker_uffd_wp(pte))
- return true;
+ if (pte_marker_uffd_wp(pte))
+ return true;
+ } else {
+ return false;
+ }
#endif
return false;
}
diff --git a/mm/memory.c b/mm/memory.c
index 0ba4f6b71847..4eb05c5f487b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1465,7 +1465,9 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
{
bool was_installed = false;
-#ifdef CONFIG_PTE_MARKER_UFFD_WP
+ if (!IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP) || !pgtable_uffd_wp_supported())
+ return false;
+
/* Zap on anonymous always means dropping everything */
if (vma_is_anonymous(vma))
return false;
@@ -1482,7 +1484,7 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
pte++;
addr += PAGE_SIZE;
}
-#endif
+
return was_installed;
}
--
2.34.1
* [PATCH v11 3/5] riscv: Add RISC-V Svrsw60t59b extension support
2025-09-11 9:55 [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support Chunyan Zhang
2025-09-11 9:55 ` [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported() Chunyan Zhang
2025-09-11 9:55 ` [PATCH v11 2/5] mm: userfaultfd: Add pgtable_uffd_wp_supported() Chunyan Zhang
@ 2025-09-11 9:56 ` Chunyan Zhang
2025-09-11 9:56 ` [PATCH v11 4/5] riscv: mm: Add soft-dirty page tracking support Chunyan Zhang
2025-09-11 9:56 ` [PATCH v11 5/5] riscv: mm: Add userfaultfd write-protect support Chunyan Zhang
4 siblings, 0 replies; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-11 9:56 UTC (permalink / raw)
To: linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie, Chunyan Zhang
The Svrsw60t59b extension frees the PTE reserved bits 60 and 59 for
software to use.
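Once the extension is detected, other code can gate on it at runtime, e.g.
(a sketch matching the helper used by the later patches in this series):
  if (riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) {
          /* PTE bits 59 and 60 are free for software use */
  }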
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
---
arch/riscv/Kconfig | 14 ++++++++++++++
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
3 files changed, 16 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index a4b233a0659e..d99df67cc7a4 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -862,6 +862,20 @@ config RISCV_ISA_ZICBOP
If you don't know what to do here, say Y.
+config RISCV_ISA_SVRSW60T59B
+ bool "Svrsw60t59b extension support for using PTE bits 60 and 59"
+ depends on MMU && 64BIT
+ depends on RISCV_ALTERNATIVE
+ default y
+ help
+ Adds support to dynamically detect the presence of the Svrsw60t59b
+ extension and enable its usage.
+
+ The Svrsw60t59b extension frees the PTE reserved bits 60 and 59
+ for software to use.
+
+ If you don't know what to do here, say Y.
+
config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
def_bool y
# https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index affd63e11b0a..f98fcb5c17d5 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -106,6 +106,7 @@
#define RISCV_ISA_EXT_ZAAMO 97
#define RISCV_ISA_EXT_ZALRSC 98
#define RISCV_ISA_EXT_ZICBOP 99
+#define RISCV_ISA_EXT_SVRSW60T59B 100
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 743d53415572..2ba71d2d3fa3 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -539,6 +539,7 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
__RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT),
__RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT),
+ __RISCV_ISA_EXT_DATA(svrsw60t59b, RISCV_ISA_EXT_SVRSW60T59B),
__RISCV_ISA_EXT_DATA(svvptc, RISCV_ISA_EXT_SVVPTC),
};
--
2.34.1
* [PATCH v11 4/5] riscv: mm: Add soft-dirty page tracking support
2025-09-11 9:55 [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support Chunyan Zhang
` (2 preceding siblings ...)
2025-09-11 9:56 ` [PATCH v11 3/5] riscv: Add RISC-V Svrsw60t59b extension support Chunyan Zhang
@ 2025-09-11 9:56 ` Chunyan Zhang
2025-09-11 9:56 ` [PATCH v11 5/5] riscv: mm: Add userfaultfd write-protect support Chunyan Zhang
4 siblings, 0 replies; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-11 9:56 UTC (permalink / raw)
To: linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie, Chunyan Zhang
The Svrsw60t59b extension frees the PTE reserved bits 60 and 59 for software
use; this patch uses bit 59 for soft-dirty tracking.
To add swap PTE soft-dirty tracking, we borrow bit 3, which is available for
swap PTEs on RISC-V systems.
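That is, roughly (a sketch of the resulting bit usage; see the hunks below
for the exact, extension-gated definitions):
  /* present PTE: bit 59, only usable when Svrsw60t59b is present */
  #define _PAGE_SOFT_DIRTY      (1UL << 59)
  /* swap PTE: borrow bit 3 (_PAGE_EXEC), always zero in swap entries */
  #define _PAGE_SWP_SOFT_DIRTY  _PAGE_EXEC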
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/pgtable-bits.h | 19 +++++++
arch/riscv/include/asm/pgtable.h | 75 ++++++++++++++++++++++++++-
3 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d99df67cc7a4..53b73e4bdf3f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -141,6 +141,7 @@ config RISCV
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
select HAVE_ARCH_SECCOMP_FILTER
+ select HAVE_ARCH_SOFT_DIRTY if 64BIT && MMU && RISCV_ISA_SVRSW60T59B
select HAVE_ARCH_THREAD_STRUCT_WHITELIST
select HAVE_ARCH_TRACEHOOK
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if 64BIT && MMU
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index 179bd4afece4..f3bac2bbc157 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -19,6 +19,25 @@
#define _PAGE_SOFT (3 << 8) /* Reserved for software */
#define _PAGE_SPECIAL (1 << 8) /* RSW: 0x1 */
+
+#ifdef CONFIG_MEM_SOFT_DIRTY
+
+/* ext_svrsw60t59b: bit 59 for soft-dirty tracking */
+#define _PAGE_SOFT_DIRTY \
+ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \
+ (1UL << 59) : 0)
+/*
+ * Bit 3 is always zero for swap entry computation, so we
+ * can borrow it for swap page soft-dirty tracking.
+ */
+#define _PAGE_SWP_SOFT_DIRTY \
+ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \
+ _PAGE_EXEC : 0)
+#else
+#define _PAGE_SOFT_DIRTY 0
+#define _PAGE_SWP_SOFT_DIRTY 0
+#endif /* CONFIG_MEM_SOFT_DIRTY */
+
#define _PAGE_TABLE _PAGE_PRESENT
/*
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 91697fbf1f90..77344ff0298b 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -427,7 +427,7 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
static inline pte_t pte_mkdirty(pte_t pte)
{
- return __pte(pte_val(pte) | _PAGE_DIRTY);
+ return __pte(pte_val(pte) | _PAGE_DIRTY | _PAGE_SOFT_DIRTY);
}
static inline pte_t pte_mkclean(pte_t pte)
@@ -455,6 +455,42 @@ static inline pte_t pte_mkhuge(pte_t pte)
return pte;
}
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+#define pgtable_soft_dirty_supported() \
+ (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && \
+ riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B))
+
+static inline bool pte_soft_dirty(pte_t pte)
+{
+ return !!(pte_val(pte) & _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_mksoft_dirty(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SOFT_DIRTY);
+}
+
+static inline pte_t pte_clear_soft_dirty(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~(_PAGE_SOFT_DIRTY));
+}
+
+static inline bool pte_swp_soft_dirty(pte_t pte)
+{
+ return !!(pte_val(pte) & _PAGE_SWP_SOFT_DIRTY);
+}
+
+static inline pte_t pte_swp_mksoft_dirty(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SWP_SOFT_DIRTY);
+}
+
+static inline pte_t pte_swp_clear_soft_dirty(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~(_PAGE_SWP_SOFT_DIRTY));
+}
+#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
+
#ifdef CONFIG_RISCV_ISA_SVNAPOT
#define pte_leaf_size(pte) (pte_napot(pte) ? \
napot_cont_size(napot_cont_order(pte)) :\
@@ -802,6 +838,40 @@ static inline pud_t pud_mkspecial(pud_t pud)
}
#endif
+#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
+static inline bool pmd_soft_dirty(pmd_t pmd)
+{
+ return pte_soft_dirty(pmd_pte(pmd));
+}
+
+static inline pmd_t pmd_mksoft_dirty(pmd_t pmd)
+{
+ return pte_pmd(pte_mksoft_dirty(pmd_pte(pmd)));
+}
+
+static inline pmd_t pmd_clear_soft_dirty(pmd_t pmd)
+{
+ return pte_pmd(pte_clear_soft_dirty(pmd_pte(pmd)));
+}
+
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline bool pmd_swp_soft_dirty(pmd_t pmd)
+{
+ return pte_swp_soft_dirty(pmd_pte(pmd));
+}
+
+static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
+{
+ return pte_pmd(pte_swp_mksoft_dirty(pmd_pte(pmd)));
+}
+
+static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
+{
+ return pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd)));
+}
+#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
+#endif /* CONFIG_HAVE_ARCH_SOFT_DIRTY */
+
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp, pmd_t pmd)
{
@@ -983,7 +1053,8 @@ static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
*
* Format of swap PTE:
* bit 0: _PAGE_PRESENT (zero)
- * bit 1 to 3: _PAGE_LEAF (zero)
+ * bit 1 to 2: (zero)
+ * bit 3: _PAGE_SWP_SOFT_DIRTY
* bit 5: _PAGE_PROT_NONE (zero)
* bit 6: exclusive marker
* bits 7 to 11: swap type
--
2.34.1
* [PATCH v11 5/5] riscv: mm: Add userfaultfd write-protect support
2025-09-11 9:55 [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support Chunyan Zhang
` (3 preceding siblings ...)
2025-09-11 9:56 ` [PATCH v11 4/5] riscv: mm: Add soft-dirty page tracking support Chunyan Zhang
@ 2025-09-11 9:56 ` Chunyan Zhang
4 siblings, 0 replies; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-11 9:56 UTC (permalink / raw)
To: linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
David Hildenbrand, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie, Chunyan Zhang
The Svrsw60t59b extension frees the PTE reserved bits 60 and 59 for software
use; this patch uses bit 60 for uffd-wp tracking.
Additionally, to track the uffd-wp state as a PTE swap bit, we borrow bit 4,
which is not involved in swap entry computation.
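That is, roughly (a sketch of the resulting bit usage; see the hunks below
for the exact, extension-gated definitions):
  /* present PTE: bit 60, only usable when Svrsw60t59b is present */
  #define _PAGE_UFFD_WP      (1UL << 60)
  /* swap PTE: borrow bit 4 (_PAGE_USER), not used in swap entry computation */
  #define _PAGE_SWP_UFFD_WP  _PAGE_USER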
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/pgtable-bits.h | 18 +++++++
arch/riscv/include/asm/pgtable.h | 68 +++++++++++++++++++++++++++
3 files changed, 87 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 53b73e4bdf3f..f928768bb14a 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -147,6 +147,7 @@ config RISCV
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if 64BIT && MMU
select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if 64BIT && MMU
select HAVE_ARCH_USERFAULTFD_MINOR if 64BIT && USERFAULTFD
+ select HAVE_ARCH_USERFAULTFD_WP if 64BIT && MMU && USERFAULTFD && RISCV_ISA_SVRSW60T59B
select HAVE_ARCH_VMAP_STACK if MMU && 64BIT
select HAVE_ASM_MODVERSIONS
select HAVE_CONTEXT_TRACKING_USER
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index f3bac2bbc157..b422d9691e60 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -38,6 +38,24 @@
#define _PAGE_SWP_SOFT_DIRTY 0
#endif /* CONFIG_MEM_SOFT_DIRTY */
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+/* ext_svrsw60t59b: bit 60 for uffd-wp tracking */
+#define _PAGE_UFFD_WP \
+ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \
+ (1UL << 60) : 0)
+/*
+ * Bit 4 is not involved in swap entry computation, so we
+ * can borrow it for swap page uffd-wp tracking.
+ */
+#define _PAGE_SWP_UFFD_WP \
+ ((riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)) ? \
+ _PAGE_USER : 0)
+#else
+#define _PAGE_UFFD_WP 0
+#define _PAGE_SWP_UFFD_WP 0
+#endif
+
#define _PAGE_TABLE _PAGE_PRESENT
/*
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 77344ff0298b..5d3f17e175e5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -416,6 +416,41 @@ static inline pte_t pte_wrprotect(pte_t pte)
return __pte(pte_val(pte) & ~(_PAGE_WRITE));
}
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+#define pgtable_uffd_wp_supported() \
+ riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B)
+
+static inline bool pte_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & _PAGE_UFFD_WP);
+}
+
+static inline pte_t pte_mkuffd_wp(pte_t pte)
+{
+ return pte_wrprotect(__pte(pte_val(pte) | _PAGE_UFFD_WP));
+}
+
+static inline pte_t pte_clear_uffd_wp(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~(_PAGE_UFFD_WP));
+}
+
+static inline bool pte_swp_uffd_wp(pte_t pte)
+{
+ return !!(pte_val(pte) & _PAGE_SWP_UFFD_WP);
+}
+
+static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SWP_UFFD_WP);
+}
+
+static inline pte_t pte_swp_clear_uffd_wp(pte_t pte)
+{
+ return __pte(pte_val(pte) & ~(_PAGE_SWP_UFFD_WP));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
/* static inline pte_t pte_mkread(pte_t pte) */
static inline pte_t pte_mkwrite_novma(pte_t pte)
@@ -838,6 +873,38 @@ static inline pud_t pud_mkspecial(pud_t pud)
}
#endif
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+static inline bool pmd_uffd_wp(pmd_t pmd)
+{
+ return pte_uffd_wp(pmd_pte(pmd));
+}
+
+static inline pmd_t pmd_mkuffd_wp(pmd_t pmd)
+{
+ return pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)));
+}
+
+static inline pmd_t pmd_clear_uffd_wp(pmd_t pmd)
+{
+ return pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)));
+}
+
+static inline bool pmd_swp_uffd_wp(pmd_t pmd)
+{
+ return pte_swp_uffd_wp(pmd_pte(pmd));
+}
+
+static inline pmd_t pmd_swp_mkuffd_wp(pmd_t pmd)
+{
+ return pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd)));
+}
+
+static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd)
+{
+ return pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)));
+}
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
+
#ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
static inline bool pmd_soft_dirty(pmd_t pmd)
{
@@ -1055,6 +1122,7 @@ static inline pud_t pud_modify(pud_t pud, pgprot_t newprot)
* bit 0: _PAGE_PRESENT (zero)
* bit 1 to 2: (zero)
* bit 3: _PAGE_SWP_SOFT_DIRTY
+ * bit 4: _PAGE_SWP_UFFD_WP
* bit 5: _PAGE_PROT_NONE (zero)
* bit 6: exclusive marker
* bits 7 to 11: swap type
--
2.34.1
* Re: [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported()
2025-09-11 9:55 ` [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported() Chunyan Zhang
@ 2025-09-11 13:09 ` David Hildenbrand
2025-09-12 8:22 ` Chunyan Zhang
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2025-09-11 13:09 UTC (permalink / raw)
To: Chunyan Zhang, linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Axel Rasmussen,
Yuanchu Xie, Chunyan Zhang
On 11.09.25 11:55, Chunyan Zhang wrote:
> Some platforms can customize the PTE/PMD entry soft-dirty bit, making it
> unavailable even if the architecture provides the resource.
>
> Add an API that architectures can override with their own implementation to
> detect whether the soft-dirty bit is available on the device the kernel is
> running on.
Thinking to myself: maybe pgtable_supports_soft_dirty() would read better
Whatever you prefer.
>
> Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
> ---
> fs/proc/task_mmu.c | 17 ++++++++++++++++-
> include/linux/pgtable.h | 12 ++++++++++++
> mm/debug_vm_pgtable.c | 10 +++++-----
> mm/huge_memory.c | 13 +++++++------
> mm/internal.h | 2 +-
> mm/mremap.c | 13 +++++++------
> mm/userfaultfd.c | 10 ++++------
> 7 files changed, 52 insertions(+), 25 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 29cca0e6d0ff..9e8083b6d4cd 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1058,7 +1058,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> * -Werror=unterminated-string-initialization warning
> * with GCC 15
> */
> - static const char mnemonics[BITS_PER_LONG][3] = {
> + static char mnemonics[BITS_PER_LONG][3] = {
> /*
> * In case if we meet a flag we don't know about.
> */
> @@ -1129,6 +1129,16 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> [ilog2(VM_SEALED)] = "sl",
> #endif
> };
> +/*
> + * We should remove the VM_SOFTDIRTY flag if the soft-dirty bit is
> + * unavailable on the device the kernel is running on, even if the
> + * architecture provides the resource and soft-dirty is compiled in.
> + */
> +#ifdef CONFIG_MEM_SOFT_DIRTY
> + if (!pgtable_soft_dirty_supported())
> + mnemonics[ilog2(VM_SOFTDIRTY)][0] = 0;
> +#endif
You can now drop the ifdef.
But I wonder if we could instead just stop setting the flag. Then we don't
have to worry about any VM_SOFTDIRTY checks.
Something like the following
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 892fe5dbf9de0..8b8bf63a32ef7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -783,6 +783,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
static inline void vm_flags_init(struct vm_area_struct *vma,
vm_flags_t flags)
{
+ VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
ACCESS_PRIVATE(vma, __vm_flags) = flags;
}
@@ -801,6 +802,7 @@ static inline void vm_flags_reset(struct vm_area_struct *vma,
static inline void vm_flags_reset_once(struct vm_area_struct *vma,
vm_flags_t flags)
{
+ VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
vma_assert_write_locked(vma);
WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
}
@@ -808,6 +810,7 @@ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
static inline void vm_flags_set(struct vm_area_struct *vma,
vm_flags_t flags)
{
+ VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
vma_start_write(vma);
ACCESS_PRIVATE(vma, __vm_flags) |= flags;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 5fd3b80fda1d5..40cb3fbf9a247 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1451,8 +1451,10 @@ static struct vm_area_struct *__install_special_mapping(
return ERR_PTR(-ENOMEM);
vma_set_range(vma, addr, addr + len, 0);
- vm_flags_init(vma, (vm_flags | mm->def_flags |
- VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
+ vm_flags |= mm->def_flags | VM_DONTEXPAND;
+ if (pgtable_soft_dirty_supported())
+ vm_flags |= VM_SOFTDIRTY;
+ vm_flags_init(vma, vm_flags & ~VM_LOCKED_MASK);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
vma->vm_ops = ops;
diff --git a/mm/vma.c b/mm/vma.c
index abe0da33c8446..16a1ed2a6199c 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2551,7 +2551,8 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
* then new mapped in-place (which must be aimed as
* a completely new data area).
*/
- vm_flags_set(vma, VM_SOFTDIRTY);
+ if (pgtable_soft_dirty_supported())
+ vm_flags_set(vma, VM_SOFTDIRTY);
vma_set_page_prot(vma);
}
@@ -2819,7 +2820,8 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
mm->data_vm += len >> PAGE_SHIFT;
if (vm_flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
- vm_flags_set(vma, VM_SOFTDIRTY);
+ if (pgtable_soft_dirty_supported())
+ vm_flags_set(vma, VM_SOFTDIRTY);
return 0;
mas_store_fail:
diff --git a/mm/vma_exec.c b/mm/vma_exec.c
index 922ee51747a68..c06732a5a620a 100644
--- a/mm/vma_exec.c
+++ b/mm/vma_exec.c
@@ -107,6 +107,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
unsigned long *top_mem_p)
{
+ unsigned long flags = VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
int err;
struct vm_area_struct *vma = vm_area_alloc(mm);
@@ -137,7 +138,9 @@ int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
vma->vm_end = STACK_TOP_MAX;
vma->vm_start = vma->vm_end - PAGE_SIZE;
- vm_flags_init(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP);
+ if (pgtable_soft_dirty_supported())
+ flags |= VM_SOFTDIRTY;
+ vm_flags_init(vma, flags);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
err = insert_vm_struct(mm, vma);
> +
> size_t i;
>
> seq_puts(m, "VmFlags: ");
> @@ -1531,6 +1541,8 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
> static inline void clear_soft_dirty(struct vm_area_struct *vma,
> unsigned long addr, pte_t *pte)
> {
> + if (!pgtable_soft_dirty_supported())
> + return;
> /*
> * The soft-dirty tracker uses #PF-s to catch writes
> * to pages, so write-protect the pte as well. See the
> @@ -1566,6 +1578,9 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
> {
> pmd_t old, pmd = *pmdp;
>
> + if (!pgtable_soft_dirty_supported())
> + return;
> +
> if (pmd_present(pmd)) {
> /* See comment in change_huge_pmd() */
> old = pmdp_invalidate(vma, addr, pmdp);
That would all be handled with the above never-set-VM_SOFTDIRTY.
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 4c035637eeb7..2a3578a4ae4c 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1537,6 +1537,18 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
> #define arch_start_context_switch(prev) do {} while (0)
> #endif
>
> +/*
> + * Some platforms can customize the PTE soft-dirty bit making it unavailable
> + * even if the architecture provides the resource.
> + * Adding this API allows architectures to add their own checks for the
> + * devices on which the kernel is running.
> + * Note: When overriding it, please make sure the CONFIG_MEM_SOFT_DIRTY
> + * is part of this macro.
> + */
> +#ifndef pgtable_soft_dirty_supported
> +#define pgtable_soft_dirty_supported() IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
> +#endif
> +
> #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
> #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
> static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 830107b6dd08..b32ce2b0b998 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -690,7 +690,7 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
> {
> pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
>
> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> + if (!pgtable_soft_dirty_supported())
> return;
>
> pr_debug("Validating PTE soft dirty\n");
> @@ -702,7 +702,7 @@ static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args)
> {
> pte_t pte;
>
> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> + if (!pgtable_soft_dirty_supported())
> return;
>
> pr_debug("Validating PTE swap soft dirty\n");
> @@ -718,7 +718,7 @@ static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args)
> {
> pmd_t pmd;
>
> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> + if (!pgtable_soft_dirty_supported())
> return;
>
> if (!has_transparent_hugepage())
> @@ -734,8 +734,8 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
> {
> pmd_t pmd;
>
> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
> - !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> + if (!pgtable_soft_dirty_supported() ||
> + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> return;
>
> if (!has_transparent_hugepage())
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9c38a95e9f09..218d430a2ec6 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2271,12 +2271,13 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
>
> static pmd_t move_soft_dirty_pmd(pmd_t pmd)
> {
> -#ifdef CONFIG_MEM_SOFT_DIRTY
> - if (unlikely(is_pmd_migration_entry(pmd)))
> - pmd = pmd_swp_mksoft_dirty(pmd);
> - else if (pmd_present(pmd))
> - pmd = pmd_mksoft_dirty(pmd);
> -#endif
> + if (pgtable_soft_dirty_supported()) {
> + if (unlikely(is_pmd_migration_entry(pmd)))
> + pmd = pmd_swp_mksoft_dirty(pmd);
> + else if (pmd_present(pmd))
> + pmd = pmd_mksoft_dirty(pmd);
> + }
> +
Wondering, should the arch simply take care of that, so we can just call
pmd_swp_mksoft_dirty / pmd_mksoft_dirty?
> return pmd;
> }
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 45b725c3dc03..c6ca62f8ecf3 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1538,7 +1538,7 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> * will be constantly true.
> */
> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> + if (!pgtable_soft_dirty_supported())
> return false;
>
That should be handled with the above never-set-VM_SOFTDIRTY.
> /*
> diff --git a/mm/mremap.c b/mm/mremap.c
> index e618a706aff5..7beb3114dbf5 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -162,12 +162,13 @@ static pte_t move_soft_dirty_pte(pte_t pte)
> * Set soft dirty bit so we can notice
> * in userspace the ptes were moved.
> */
> -#ifdef CONFIG_MEM_SOFT_DIRTY
> - if (pte_present(pte))
> - pte = pte_mksoft_dirty(pte);
> - else if (is_swap_pte(pte))
> - pte = pte_swp_mksoft_dirty(pte);
> -#endif
> + if (pgtable_soft_dirty_supported()) {
> + if (pte_present(pte))
> + pte = pte_mksoft_dirty(pte);
> + else if (is_swap_pte(pte))
> + pte = pte_swp_mksoft_dirty(pte);
> + }
> +
> return pte;
> }
>
--
Cheers
David / dhildenb
* Re: [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported()
2025-09-11 13:09 ` David Hildenbrand
@ 2025-09-12 8:22 ` Chunyan Zhang
2025-09-12 8:41 ` David Hildenbrand
0 siblings, 1 reply; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-12 8:22 UTC (permalink / raw)
To: David Hildenbrand
Cc: Chunyan Zhang, linux-riscv, linux-fsdevel, linux-mm,
linux-kernel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Deepak Gupta, Ved Shanbhogue, Alexander Viro,
Christian Brauner, Jan Kara, Andrew Morton, Peter Xu,
Arnd Bergmann, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie
Hi David,
On Thu, 11 Sept 2025 at 21:09, David Hildenbrand <david@redhat.com> wrote:
>
> On 11.09.25 11:55, Chunyan Zhang wrote:
> > Some platforms can customize the PTE/PMD entry soft-dirty bit, making it
> > unavailable even if the architecture provides the resource.
> >
> > Add an API that architectures can override with their own implementation to
> > detect whether the soft-dirty bit is available on the device the kernel is
> > running on.
>
> Thinking to myself: maybe pgtable_supports_soft_dirty() would read better
> Whatever you prefer.
I will use pgtable_supports_* in the next version.
> >
> > Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
> > ---
> > fs/proc/task_mmu.c | 17 ++++++++++++++++-
> > include/linux/pgtable.h | 12 ++++++++++++
> > mm/debug_vm_pgtable.c | 10 +++++-----
> > mm/huge_memory.c | 13 +++++++------
> > mm/internal.h | 2 +-
> > mm/mremap.c | 13 +++++++------
> > mm/userfaultfd.c | 10 ++++------
> > 7 files changed, 52 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 29cca0e6d0ff..9e8083b6d4cd 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -1058,7 +1058,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> > * -Werror=unterminated-string-initialization warning
> > * with GCC 15
> > */
> > - static const char mnemonics[BITS_PER_LONG][3] = {
> > + static char mnemonics[BITS_PER_LONG][3] = {
> > /*
> > * In case if we meet a flag we don't know about.
> > */
> > @@ -1129,6 +1129,16 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> > [ilog2(VM_SEALED)] = "sl",
> > #endif
> > };
> > +/*
> > + * We should remove the VM_SOFTDIRTY flag if the soft-dirty bit is
> > + * unavailable on the device the kernel is running on, even if the
> > + * architecture provides the resource and soft-dirty is compiled in.
> > + */
> > +#ifdef CONFIG_MEM_SOFT_DIRTY
> > + if (!pgtable_soft_dirty_supported())
> > + mnemonics[ilog2(VM_SOFTDIRTY)][0] = 0;
> > +#endif
>
> You can now drop the ifdef.
Ok, you mean define VM_SOFTDIRTY 0x08000000 no matter if
MEM_SOFT_DIRTY is compiled in, right?
Then I need memcpy() to set mnemonics[ilog2(VM_SOFTDIRTY)] here.
>
> But I wonder if we could instead just stop setting the flag. Then we don't
> have to worry about any VM_SOFTDIRTY checks.
>
> Something like the following
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 892fe5dbf9de0..8b8bf63a32ef7 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -783,6 +783,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
> static inline void vm_flags_init(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
> ACCESS_PRIVATE(vma, __vm_flags) = flags;
> }
>
> @@ -801,6 +802,7 @@ static inline void vm_flags_reset(struct vm_area_struct *vma,
> static inline void vm_flags_reset_once(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
> vma_assert_write_locked(vma);
> WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
> }
> @@ -808,6 +810,7 @@ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
> static inline void vm_flags_set(struct vm_area_struct *vma,
> vm_flags_t flags)
> {
> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
> vma_start_write(vma);
> ACCESS_PRIVATE(vma, __vm_flags) |= flags;
> }
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 5fd3b80fda1d5..40cb3fbf9a247 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1451,8 +1451,10 @@ static struct vm_area_struct *__install_special_mapping(
> return ERR_PTR(-ENOMEM);
>
> vma_set_range(vma, addr, addr + len, 0);
> - vm_flags_init(vma, (vm_flags | mm->def_flags |
> - VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
> + vm_flags |= mm->def_flags | VM_DONTEXPAND;
Why use '|=' rather than directly setting vm_flags, which is an
uninitialized variable?
> + if (pgtable_soft_dirty_supported())
> + vm_flags |= VM_SOFTDIRTY;
> + vm_flags_init(vma, vm_flags & ~VM_LOCKED_MASK);
> vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>
> vma->vm_ops = ops;
> diff --git a/mm/vma.c b/mm/vma.c
> index abe0da33c8446..16a1ed2a6199c 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -2551,7 +2551,8 @@ static void __mmap_complete(struct mmap_state *map, struct vm_area_struct *vma)
> * then new mapped in-place (which must be aimed as
> * a completely new data area).
> */
> - vm_flags_set(vma, VM_SOFTDIRTY);
> + if (pgtable_soft_dirty_supported())
> + vm_flags_set(vma, VM_SOFTDIRTY);
>
> vma_set_page_prot(vma);
> }
> @@ -2819,7 +2820,8 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
> mm->data_vm += len >> PAGE_SHIFT;
> if (vm_flags & VM_LOCKED)
> mm->locked_vm += (len >> PAGE_SHIFT);
> - vm_flags_set(vma, VM_SOFTDIRTY);
> + if (pgtable_soft_dirty_supported())
> + vm_flags_set(vma, VM_SOFTDIRTY);
> return 0;
>
> mas_store_fail:
> diff --git a/mm/vma_exec.c b/mm/vma_exec.c
> index 922ee51747a68..c06732a5a620a 100644
> --- a/mm/vma_exec.c
> +++ b/mm/vma_exec.c
> @@ -107,6 +107,7 @@ int relocate_vma_down(struct vm_area_struct *vma, unsigned long shift)
> int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
> unsigned long *top_mem_p)
> {
> + unsigned long flags = VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
> int err;
> struct vm_area_struct *vma = vm_area_alloc(mm);
>
> @@ -137,7 +138,9 @@ int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
> BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
> vma->vm_end = STACK_TOP_MAX;
> vma->vm_start = vma->vm_end - PAGE_SIZE;
> - vm_flags_init(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP);
> + if (pgtable_soft_dirty_supported())
> + flags |= VM_SOFTDIRTY;
> + vm_flags_init(vma, flags);
> vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>
> err = insert_vm_struct(mm, vma);
>
>
> > +
> > size_t i;
> >
> > seq_puts(m, "VmFlags: ");
> > @@ -1531,6 +1541,8 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
> > static inline void clear_soft_dirty(struct vm_area_struct *vma,
> > unsigned long addr, pte_t *pte)
> > {
> > + if (!pgtable_soft_dirty_supported())
> > + return;
> > /*
> > * The soft-dirty tracker uses #PF-s to catch writes
> > * to pages, so write-protect the pte as well. See the
> > @@ -1566,6 +1578,9 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
> > {
> > pmd_t old, pmd = *pmdp;
> >
> > + if (!pgtable_soft_dirty_supported())
> > + return;
> > +
> > if (pmd_present(pmd)) {
> > /* See comment in change_huge_pmd() */
> > old = pmdp_invalidate(vma, addr, pmdp);
>
> That would all be handled with the above never-set-VM_SOFTDIRTY.
Sorry, I'm not sure I understand here; you mean we no longer need #ifdef
CONFIG_MEM_SOFT_DIRTY for these function definitions, right?
>
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index 4c035637eeb7..2a3578a4ae4c 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -1537,6 +1537,18 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
> > #define arch_start_context_switch(prev) do {} while (0)
> > #endif
> >
> > +/*
> > + * Some platforms can customize the PTE soft-dirty bit making it unavailable
> > + * even if the architecture provides the resource.
> > + * Adding this API allows architectures to add their own checks for the
> > + * devices on which the kernel is running.
> > + * Note: When overriding it, please make sure the CONFIG_MEM_SOFT_DIRTY
> > + * is part of this macro.
> > + */
> > +#ifndef pgtable_soft_dirty_supported
> > +#define pgtable_soft_dirty_supported() IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
> > +#endif
> > +
> > #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
> > #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
> > static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
> > diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> > index 830107b6dd08..b32ce2b0b998 100644
> > --- a/mm/debug_vm_pgtable.c
> > +++ b/mm/debug_vm_pgtable.c
> > @@ -690,7 +690,7 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
> > {
> > pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
> >
> > - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> > + if (!pgtable_soft_dirty_supported())
> > return;
> >
> > pr_debug("Validating PTE soft dirty\n");
> > @@ -702,7 +702,7 @@ static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args)
> > {
> > pte_t pte;
> >
> > - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> > + if (!pgtable_soft_dirty_supported())
> > return;
> >
> > pr_debug("Validating PTE swap soft dirty\n");
> > @@ -718,7 +718,7 @@ static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args)
> > {
> > pmd_t pmd;
> >
> > - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> > + if (!pgtable_soft_dirty_supported())
> > return;
> >
> > if (!has_transparent_hugepage())
> > @@ -734,8 +734,8 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
> > {
> > pmd_t pmd;
> >
> > - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
> > - !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> > + if (!pgtable_soft_dirty_supported() ||
> > + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> > return;
> >
> > if (!has_transparent_hugepage())
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 9c38a95e9f09..218d430a2ec6 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2271,12 +2271,13 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> >
> > static pmd_t move_soft_dirty_pmd(pmd_t pmd)
> > {
> > -#ifdef CONFIG_MEM_SOFT_DIRTY
> > - if (unlikely(is_pmd_migration_entry(pmd)))
> > - pmd = pmd_swp_mksoft_dirty(pmd);
> > - else if (pmd_present(pmd))
> > - pmd = pmd_mksoft_dirty(pmd);
> > -#endif
> > + if (pgtable_soft_dirty_supported()) {
> > + if (unlikely(is_pmd_migration_entry(pmd)))
> > + pmd = pmd_swp_mksoft_dirty(pmd);
> > + else if (pmd_present(pmd))
> > + pmd = pmd_mksoft_dirty(pmd);
> > + }
> > +
>
> Wondering, should the arch simply take care of that, so we can just call
> pmd_swp_mksoft_dirty / pmd_mksoft_dirty?
Ok, I think I can do that in another patchset.
>
> > return pmd;
> > }
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 45b725c3dc03..c6ca62f8ecf3 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -1538,7 +1538,7 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> > * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> > * will be constantly true.
> > */
> > - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> > + if (!pgtable_soft_dirty_supported())
> > return false;
> >
>
> That should be handled with the above never-set-VM_SOFTDIRTY.
We don't need the if (!pgtable_soft_dirty_supported()) check here, if I
understand correctly.
Thanks for the review,
Chunyan
>
> > /*
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index e618a706aff5..7beb3114dbf5 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -162,12 +162,13 @@ static pte_t move_soft_dirty_pte(pte_t pte)
> > * Set soft dirty bit so we can notice
> > * in userspace the ptes were moved.
> > */
> > -#ifdef CONFIG_MEM_SOFT_DIRTY
> > - if (pte_present(pte))
> > - pte = pte_mksoft_dirty(pte);
> > - else if (is_swap_pte(pte))
> > - pte = pte_swp_mksoft_dirty(pte);
> > -#endif
> > + if (pgtable_soft_dirty_supported()) {
> > + if (pte_present(pte))
> > + pte = pte_mksoft_dirty(pte);
> > + else if (is_swap_pte(pte))
> > + pte = pte_swp_mksoft_dirty(pte);
> > + }
> > +
> > return pte;
> > }
> >
> --
> Cheers
>
> David / dhildenb
>
* Re: [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported()
2025-09-12 8:22 ` Chunyan Zhang
@ 2025-09-12 8:41 ` David Hildenbrand
2025-09-12 9:21 ` Chunyan Zhang
0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2025-09-12 8:41 UTC (permalink / raw)
To: Chunyan Zhang
Cc: Chunyan Zhang, linux-riscv, linux-fsdevel, linux-mm,
linux-kernel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Deepak Gupta, Ved Shanbhogue, Alexander Viro,
Christian Brauner, Jan Kara, Andrew Morton, Peter Xu,
Arnd Bergmann, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie
[...]
>>> +/*
>>> + * We should remove the VM_SOFTDIRTY flag if the soft-dirty bit is
>>> + * unavailable on which the kernel is running, even if the architecture
>>> + * provides the resource and soft-dirty is compiled in.
>>> + */
>>> +#ifdef CONFIG_MEM_SOFT_DIRTY
>>> + if (!pgtable_soft_dirty_supported())
>>> + mnemonics[ilog2(VM_SOFTDIRTY)][0] = 0;
>>> +#endif
>>
>> You can now drop the ifdef.
>
> Ok, you mean define VM_SOFTDIRTY 0x08000000 no matter if
> MEM_SOFT_DIRTY is compiled in, right?
>
> Then I need memcpy() to set mnemonics[ilog2(VM_SOFTDIRTY)] here.
The whole hunk will not be required when we make sure VM_SOFTDIRTY never
gets set, correct?
>
>>
>> But, I wonder if could we instead just stop setting the flag. Then we don't
>> have to worry about any VM_SOFTDIRTY checks.
>>
>> Something like the following
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 892fe5dbf9de0..8b8bf63a32ef7 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -783,6 +783,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
>> static inline void vm_flags_init(struct vm_area_struct *vma,
>> vm_flags_t flags)
>> {
>> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
>> ACCESS_PRIVATE(vma, __vm_flags) = flags;
>> }
>>
>> @@ -801,6 +802,7 @@ static inline void vm_flags_reset(struct vm_area_struct *vma,
>> static inline void vm_flags_reset_once(struct vm_area_struct *vma,
>> vm_flags_t flags)
>> {
>> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
>> vma_assert_write_locked(vma);
>> WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
>> }
>> @@ -808,6 +810,7 @@ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
>> static inline void vm_flags_set(struct vm_area_struct *vma,
>> vm_flags_t flags)
>> {
>> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
>> vma_start_write(vma);
>> ACCESS_PRIVATE(vma, __vm_flags) |= flags;
>> }
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 5fd3b80fda1d5..40cb3fbf9a247 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1451,8 +1451,10 @@ static struct vm_area_struct *__install_special_mapping(
>> return ERR_PTR(-ENOMEM);
>>
>> vma_set_range(vma, addr, addr + len, 0);
>> - vm_flags_init(vma, (vm_flags | mm->def_flags |
>> - VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
>> + vm_flags |= mm->def_flags | VM_DONTEXPAND;
>
> Why use '|=' rather than directly setting vm_flags, which is an
> uninitialized variable?
vm_flags is passed in by the caller?
But just to clarify: this code was just a quick hack, adjust it as you need.
[...]
>>>
>>> + if (!pgtable_soft_dirty_supported())
>>> + return;
>>> +
>>> if (pmd_present(pmd)) {
>>> /* See comment in change_huge_pmd() */
>>> old = pmdp_invalidate(vma, addr, pmdp);
>>
>> That would all be handled with the above never-set-VM_SOFTDIRTY.
I meant that there is no need to add the pgtable_soft_dirty_supported()
check.
>
> Sorry I'm not sure I understand here, you mean no longer need #ifdef
> CONFIG_MEM_SOFT_DIRTY for these function definitions, right?
Likely we could drop them. VM_SOFTDIRTY will never be set so the code
will not be invoked.
And for architectures where VM_SOFTDIRTY is never even possible
(!CONFIG_MEM_SOFT_DIRTY) we keep it as 0.
That way, the compiler can even optimize out all of that code because
"vma->vm_flags & VM_SOFTDIRTY" -> "vma->vm_flags & 0"
will never be true.
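For reference, the existing definition in include/linux/mm.h looks roughly
like this (quoting from memory, so double-check the exact comment):

#ifdef CONFIG_MEM_SOFT_DIRTY
# define VM_SOFTDIRTY	0x08000000	/* Not soft dirty clean area */
#else
# define VM_SOFTDIRTY	0
#endif

so for !CONFIG_MEM_SOFT_DIRTY builds every VM_SOFTDIRTY test already folds to
constant false.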
>
>>
>>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>>> index 4c035637eeb7..2a3578a4ae4c 100644
>>> --- a/include/linux/pgtable.h
>>> +++ b/include/linux/pgtable.h
>>> @@ -1537,6 +1537,18 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>> #define arch_start_context_switch(prev) do {} while (0)
>>> #endif
>>>
>>> +/*
>>> + * Some platforms can customize the PTE soft-dirty bit making it unavailable
>>> + * even if the architecture provides the resource.
>>> + * Adding this API allows architectures to add their own checks for the
>>> + * devices on which the kernel is running.
>>> + * Note: When overiding it, please make sure the CONFIG_MEM_SOFT_DIRTY
>>> + * is part of this macro.
>>> + */
>>> +#ifndef pgtable_soft_dirty_supported
>>> +#define pgtable_soft_dirty_supported() IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
>>> +#endif
>>> +
>>> #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
>>> #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
>>> static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
>>> index 830107b6dd08..b32ce2b0b998 100644
>>> --- a/mm/debug_vm_pgtable.c
>>> +++ b/mm/debug_vm_pgtable.c
>>> @@ -690,7 +690,7 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
>>> {
>>> pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
>>>
>>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>> + if (!pgtable_soft_dirty_supported())
>>> return;
>>>
>>> pr_debug("Validating PTE soft dirty\n");
>>> @@ -702,7 +702,7 @@ static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args)
>>> {
>>> pte_t pte;
>>>
>>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>> + if (!pgtable_soft_dirty_supported())
>>> return;
>>>
>>> pr_debug("Validating PTE swap soft dirty\n");
>>> @@ -718,7 +718,7 @@ static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args)
>>> {
>>> pmd_t pmd;
>>>
>>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>> + if (!pgtable_soft_dirty_supported())
>>> return;
>>>
>>> if (!has_transparent_hugepage())
>>> @@ -734,8 +734,8 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
>>> {
>>> pmd_t pmd;
>>>
>>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
>>> - !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
>>> + if (!pgtable_soft_dirty_supported() ||
>>> + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
>>> return;
>>>
>>> if (!has_transparent_hugepage())
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 9c38a95e9f09..218d430a2ec6 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2271,12 +2271,13 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
>>>
>>> static pmd_t move_soft_dirty_pmd(pmd_t pmd)
>>> {
>>> -#ifdef CONFIG_MEM_SOFT_DIRTY
>>> - if (unlikely(is_pmd_migration_entry(pmd)))
>>> - pmd = pmd_swp_mksoft_dirty(pmd);
>>> - else if (pmd_present(pmd))
>>> - pmd = pmd_mksoft_dirty(pmd);
>>> -#endif
>>> + if (pgtable_soft_dirty_supported()) {
>>> + if (unlikely(is_pmd_migration_entry(pmd)))
>>> + pmd = pmd_swp_mksoft_dirty(pmd);
>>> + else if (pmd_present(pmd))
>>> + pmd = pmd_mksoft_dirty(pmd);
>>> + }
>>> +
>>
>> Wondering, should the arch simply take care of that so we can just call
>> pmd_swp_mksoft_dirty / pmd_mksoft_dirty?
>
I think we have that already in include/linux/pgtable.h:
We have stubs that just don't do anything.
For riscv support you would handle runtime-enablement in these helpers.
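A rough sketch of what such a runtime-gated helper could look like on riscv
(the _PAGE_SOFT_DIRTY name is just a placeholder, not taken from the patch):

static inline pmd_t pmd_mksoft_dirty(pmd_t pmd)
{
	/* No-op on hardware without the Svrsw60t59b extension. */
	if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SVRSW60T59B))
		return pmd;
	return __pmd(pmd_val(pmd) | _PAGE_SOFT_DIRTY);
}

Then generic code can call it unconditionally and it degrades to a no-op.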
>
>>
>>> return pmd;
>>> }
>>>
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 45b725c3dc03..c6ca62f8ecf3 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -1538,7 +1538,7 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
>>> * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
>>> * will be constantly true.
>>> */
>>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
>>> + if (!pgtable_soft_dirty_supported())
>>> return false;
>>>
>>
>> That should be handled with the above never-set-VM_SOFTDIRTY.
>
> We don't need the if (!pgtable_soft_dirty_supported()) check here, if I
> understand correctly.
Hm, let me think about that. No, I think this has to stay as the comment
says, so this case here is special.
--
Cheers
David / dhildenb
* Re: [PATCH v11 2/5] mm: userfaultfd: Add pgtable_uffd_wp_supported()
2025-09-11 9:55 ` [PATCH v11 2/5] mm: userfaultfd: Add pgtable_uffd_wp_supported() Chunyan Zhang
@ 2025-09-12 8:54 ` David Hildenbrand
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2025-09-12 8:54 UTC (permalink / raw)
To: Chunyan Zhang, linux-riscv, linux-fsdevel, linux-mm, linux-kernel
Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Deepak Gupta, Ved Shanbhogue, Alexander Viro, Christian Brauner,
Jan Kara, Andrew Morton, Peter Xu, Arnd Bergmann,
Lorenzo Stoakes, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Axel Rasmussen,
Yuanchu Xie, Chunyan Zhang
On 11.09.25 11:55, Chunyan Zhang wrote:
> Some platforms can customize the PTE/PMD entry uffd-wp bit making
> it unavailable even if the architecture provides the resource.
> This patch adds a macro API that allows architectures to define their
> specific implementations to check whether the uffd-wp bit is available
> on the device the kernel is running on.
If you change the name of the soft-dirty thingy, adjust that one here as well.
>
> Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
> ---
> fs/userfaultfd.c | 23 ++++++++--------
> include/asm-generic/pgtable_uffd.h | 11 ++++++++
> include/linux/mm_inline.h | 7 +++++
> include/linux/userfaultfd_k.h | 44 +++++++++++++++++++-----------
> mm/memory.c | 6 ++--
> 5 files changed, 62 insertions(+), 29 deletions(-)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 54c6cc7fe9c6..b549c327d7ad 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1270,9 +1270,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
> if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING)
> vm_flags |= VM_UFFD_MISSING;
> if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) {
> -#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
> - goto out;
> -#endif
> + if (!pgtable_uffd_wp_supported())
> + goto out;
> +
> vm_flags |= VM_UFFD_WP;
I like that; similar to the soft-dirty thing, we will simply not set the flag.
> }
> if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) {
> @@ -1980,14 +1980,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
> uffdio_api.features &=
> ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
> #endif
> -#ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
> - uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
> -#endif
> -#ifndef CONFIG_PTE_MARKER_UFFD_WP
> - uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
> - uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
> - uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
> -#endif
> + if (!pgtable_uffd_wp_supported())
> + uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
> +
> + if (!IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP) ||
> + !pgtable_uffd_wp_supported()) {
I wonder if we would want to have a helper for that like
static inline bool uffd_supports_wp_marker(void)
{
	return pgtable_uffd_wp_supported() && IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP);
}
That should clean all of this further up.
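With the helper, the feature-masking above could then shrink to something
like this (just a sketch, assuming uffd_supports_wp_marker() as suggested):

	if (!pgtable_uffd_wp_supported())
		uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;

	if (!uffd_supports_wp_marker()) {
		uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
		uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
		uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
	}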
> + uffdio_api.features &= ~UFFD_FEATURE_WP_HUGETLBFS_SHMEM;
> + uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
> + uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
> + }
>
> ret = -EINVAL;
> if (features & ~uffdio_api.features)
> diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
> index 828966d4c281..895d68ece0e7 100644
> --- a/include/asm-generic/pgtable_uffd.h
> +++ b/include/asm-generic/pgtable_uffd.h
> @@ -1,6 +1,17 @@
> #ifndef _ASM_GENERIC_PGTABLE_UFFD_H
> #define _ASM_GENERIC_PGTABLE_UFFD_H
>
> +/*
> + * Some platforms can customize the uffd-wp bit, making it unavailable
> + * even if the architecture provides the resource.
> + * Adding this API allows architectures to add their own checks for the
> + * devices on which the kernel is running.
> + * Note: When overiding it, please make sure the
s/overiding/overriding/
> + * CONFIG_HAVE_ARCH_USERFAULTFD_WP is part of this macro.
> + */
> +#ifndef pgtable_uffd_wp_supported
> +#define pgtable_uffd_wp_supported() IS_ENABLED(CONFIG_HAVE_ARCH_USERFAULTFD_WP)
> +#endif
> #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
> static __always_inline int pte_uffd_wp(pte_t pte)
> {
> diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
> index 89b518ff097e..38845b8b79ff 100644
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -571,6 +571,13 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
> pte_t *pte, pte_t pteval)
> {
> #ifdef CONFIG_PTE_MARKER_UFFD_WP
> + /*
> + * Some platforms can customize the PTE uffd-wp bit, making it unavailable
> + * even if the architecture allows providing the PTE resource.
> + */
> + if (!pgtable_uffd_wp_supported())
> + return false;
> +
Likely we could use the uffd_supports_wp_marker() wrapper here instead and
remove the #ifdef.
> bool arm_uffd_pte = false;
>
> /* The current status of the pte should be "cleared" before calling */
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index c0e716aec26a..6264b56ae961 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -228,15 +228,15 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
> if (wp_async && (vm_flags == VM_UFFD_WP))
> return true;
>
> -#ifndef CONFIG_PTE_MARKER_UFFD_WP
> /*
> * If user requested uffd-wp but not enabled pte markers for
> * uffd-wp, then shmem & hugetlbfs are not supported but only
> * anonymous.
> */
> - if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
> + if ((!IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP) ||
> + !pgtable_uffd_wp_supported()) &&
This would also use the helper.
> + (vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
> return false;
> -#endif
>
> /* By default, allow any of anon|shmem|hugetlb */
> return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
> @@ -437,8 +437,11 @@ static inline bool userfaultfd_wp_use_markers(struct vm_area_struct *vma)
> static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
> {
> #ifdef CONFIG_PTE_MARKER_UFFD_WP
> - return is_pte_marker_entry(entry) &&
> - (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> + if (pgtable_uffd_wp_supported())
> + return is_pte_marker_entry(entry) &&
> + (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> + else
> + return false;
	if (!uffd_supports_wp_marker())
		return false;
	return is_pte_marker_entry(entry) &&
	       (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
> #else
> return false;
> #endif
> @@ -447,14 +450,19 @@ static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry)
> static inline bool pte_marker_uffd_wp(pte_t pte)
> {
> #ifdef CONFIG_PTE_MARKER_UFFD_WP
Similarly here, just do a

	if (!uffd_supports_wp_marker())
		return false;

and remove the ifdef.
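Roughly, keeping the current body (recalled from memory, so double-check):

static inline bool pte_marker_uffd_wp(pte_t pte)
{
	swp_entry_t entry;

	if (!uffd_supports_wp_marker())
		return false;
	if (!is_swap_pte(pte))
		return false;

	entry = pte_to_swp_entry(pte);
	return pte_marker_entry_uffd_wp(entry);
}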
> #endif
> @@ -467,14 +475,18 @@ static inline bool pte_marker_uffd_wp(pte_t pte)
> static inline bool pte_swp_uffd_wp_any(pte_t pte)
> {
> #ifdef CONFIG_PTE_MARKER_UFFD_WP
Same here.
> - if (!is_swap_pte(pte))
> - return false;
> + if (pgtable_uffd_wp_supported()) {
> + if (!is_swap_pte(pte))
> + return false;
>
> - if (pte_swp_uffd_wp(pte))
> - return true;
> + if (pte_swp_uffd_wp(pte))
> + return true;
>
> - if (pte_marker_uffd_wp(pte))
> - return true;
> + if (pte_marker_uffd_wp(pte))
> + return true;
> + } else {
> + return false;
> + }
> #endif
> return false;
> }
> diff --git a/mm/memory.c b/mm/memory.c
> index 0ba4f6b71847..4eb05c5f487b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1465,7 +1465,9 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
> {
> bool was_installed = false;
>
> -#ifdef CONFIG_PTE_MARKER_UFFD_WP
> + if (!IS_ENABLED(CONFIG_PTE_MARKER_UFFD_WP) || !pgtable_uffd_wp_supported())
> + return false;
> +
Same here.
--
Cheers
David / dhildenb
* Re: [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported()
2025-09-12 8:41 ` David Hildenbrand
@ 2025-09-12 9:21 ` Chunyan Zhang
2025-09-12 13:32 ` David Hildenbrand
0 siblings, 1 reply; 12+ messages in thread
From: Chunyan Zhang @ 2025-09-12 9:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: Chunyan Zhang, linux-riscv, linux-fsdevel, linux-mm,
linux-kernel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Deepak Gupta, Ved Shanbhogue, Alexander Viro,
Christian Brauner, Jan Kara, Andrew Morton, Peter Xu,
Arnd Bergmann, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie
On Fri, 12 Sept 2025 at 16:41, David Hildenbrand <david@redhat.com> wrote:
>
> [...]
>
> >>> +/*
> >>> + * We should remove the VM_SOFTDIRTY flag if the soft-dirty bit is
> >>> + * unavailable on which the kernel is running, even if the architecture
> >>> + * provides the resource and soft-dirty is compiled in.
> >>> + */
> >>> +#ifdef CONFIG_MEM_SOFT_DIRTY
> >>> + if (!pgtable_soft_dirty_supported())
> >>> + mnemonics[ilog2(VM_SOFTDIRTY)][0] = 0;
> >>> +#endif
> >>
> >> You can now drop the ifdef.
> >
> > Ok, you mean define VM_SOFTDIRTY 0x08000000 no matter if
> > MEM_SOFT_DIRTY is compiled in, right?
> >
> > Then I need memcpy() to set mnemonics[ilog2(VM_SOFTDIRTY)] here.
>
> The whole hunk will not be required when we make sure VM_SOFTDIRTY never
> gets set, correct?
Oh no, this hunk doesn't set a VMA flag.
The mnemonics[ilog2(VM_SOFTDIRTY)] entry is for show_smap_vma_flags(),
something like below:
# cat /proc/1/smaps
5555605c7000-555560680000 r-xp 00000000 fe:00 19    /bin/busybox
...
VmFlags: rd ex mr mw me sd
'sd' is for soft-dirty
I think this is still needed, right?
>
> >
> >>
> >> But, I wonder if could we instead just stop setting the flag. Then we don't
> >> have to worry about any VM_SOFTDIRTY checks.
> >>
> >> Something like the following
> >>
> >> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >> index 892fe5dbf9de0..8b8bf63a32ef7 100644
> >> --- a/include/linux/mm.h
> >> +++ b/include/linux/mm.h
> >> @@ -783,6 +783,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
> >> static inline void vm_flags_init(struct vm_area_struct *vma,
> >> vm_flags_t flags)
> >> {
> >> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
> >> ACCESS_PRIVATE(vma, __vm_flags) = flags;
> >> }
> >>
> >> @@ -801,6 +802,7 @@ static inline void vm_flags_reset(struct vm_area_struct *vma,
> >> static inline void vm_flags_reset_once(struct vm_area_struct *vma,
> >> vm_flags_t flags)
> >> {
> >> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
> >> vma_assert_write_locked(vma);
> >> WRITE_ONCE(ACCESS_PRIVATE(vma, __vm_flags), flags);
> >> }
> >> @@ -808,6 +810,7 @@ static inline void vm_flags_reset_once(struct vm_area_struct *vma,
> >> static inline void vm_flags_set(struct vm_area_struct *vma,
> >> vm_flags_t flags)
> >> {
> >> + VM_WARN_ON_ONCE(!pgtable_soft_dirty_supported() && (flags & VM_SOFTDIRTY));
> >> vma_start_write(vma);
> >> ACCESS_PRIVATE(vma, __vm_flags) |= flags;
> >> }
> >> diff --git a/mm/mmap.c b/mm/mmap.c
> >> index 5fd3b80fda1d5..40cb3fbf9a247 100644
> >> --- a/mm/mmap.c
> >> +++ b/mm/mmap.c
> >> @@ -1451,8 +1451,10 @@ static struct vm_area_struct *__install_special_mapping(
> >> return ERR_PTR(-ENOMEM);
> >>
> >> vma_set_range(vma, addr, addr + len, 0);
> >> - vm_flags_init(vma, (vm_flags | mm->def_flags |
> >> - VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
> >> + vm_flags |= mm->def_flags | VM_DONTEXPAND;
> >
> > Why use '|=' rather than directly setting vm_flags, which is an
> > uninitialized variable?
>
> vm_flags is passed in by the caller?
>
Then the original code seems wrong.
> But just to clarify: this code was just a quick hack, adjust it as you need.
Got it.
>
> [...]
>
> >>>
> >>> + if (!pgtable_soft_dirty_supported())
> >>> + return;
> >>> +
> >>> if (pmd_present(pmd)) {
> >>> /* See comment in change_huge_pmd() */
> >>> old = pmdp_invalidate(vma, addr, pmdp);
> >>
> >> That would all be handled with the above never-set-VM_SOFTDIRTY.
>
> I meant that there is no need to add the pgtable_soft_dirty_supported()
> check.
Ok I will take a look.
>
> >
> > Sorry I'm not sure I understand here, you mean no longer need #ifdef
> > CONFIG_MEM_SOFT_DIRTY for these function definitions, right?
>
> Likely we could drop them. VM_SOFTDIRTY will never be set so the code
> will not be invoked.
The relationship between VM_SOFTDIRTY and clear_soft_dirty_pmd() is not
very obvious at first sight; let me take a further look.
>
> And for architectures where VM_SOFTDIRTY is never even possible
> (!CONFIG_MEM_SOFT_DIRTY) we keep it as 0.
Ok.
>
> That way, the compiler can even optimize out all of that code because
>
> "vma->vm_flags & VM_SOFTDIRTY" -> "vma->vm_flags & 0"
>
> will never be true.
>
> >
> >>
> >>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> >>> index 4c035637eeb7..2a3578a4ae4c 100644
> >>> --- a/include/linux/pgtable.h
> >>> +++ b/include/linux/pgtable.h
> >>> @@ -1537,6 +1537,18 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
> >>> #define arch_start_context_switch(prev) do {} while (0)
> >>> #endif
> >>>
> >>> +/*
> >>> + * Some platforms can customize the PTE soft-dirty bit making it unavailable
> >>> + * even if the architecture provides the resource.
> >>> + * Adding this API allows architectures to add their own checks for the
> >>> + * devices on which the kernel is running.
> >>> + * Note: When overiding it, please make sure the CONFIG_MEM_SOFT_DIRTY
> >>> + * is part of this macro.
> >>> + */
> >>> +#ifndef pgtable_soft_dirty_supported
> >>> +#define pgtable_soft_dirty_supported() IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
> >>> +#endif
> >>> +
> >>> #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
> >>> #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
> >>> static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
> >>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> >>> index 830107b6dd08..b32ce2b0b998 100644
> >>> --- a/mm/debug_vm_pgtable.c
> >>> +++ b/mm/debug_vm_pgtable.c
> >>> @@ -690,7 +690,7 @@ static void __init pte_soft_dirty_tests(struct pgtable_debug_args *args)
> >>> {
> >>> pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
> >>>
> >>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> >>> + if (!pgtable_soft_dirty_supported())
> >>> return;
> >>>
> >>> pr_debug("Validating PTE soft dirty\n");
> >>> @@ -702,7 +702,7 @@ static void __init pte_swap_soft_dirty_tests(struct pgtable_debug_args *args)
> >>> {
> >>> pte_t pte;
> >>>
> >>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> >>> + if (!pgtable_soft_dirty_supported())
> >>> return;
> >>>
> >>> pr_debug("Validating PTE swap soft dirty\n");
> >>> @@ -718,7 +718,7 @@ static void __init pmd_soft_dirty_tests(struct pgtable_debug_args *args)
> >>> {
> >>> pmd_t pmd;
> >>>
> >>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> >>> + if (!pgtable_soft_dirty_supported())
> >>> return;
> >>>
> >>> if (!has_transparent_hugepage())
> >>> @@ -734,8 +734,8 @@ static void __init pmd_swap_soft_dirty_tests(struct pgtable_debug_args *args)
> >>> {
> >>> pmd_t pmd;
> >>>
> >>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
> >>> - !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> >>> + if (!pgtable_soft_dirty_supported() ||
> >>> + !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
> >>> return;
> >>>
> >>> if (!has_transparent_hugepage())
> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>> index 9c38a95e9f09..218d430a2ec6 100644
> >>> --- a/mm/huge_memory.c
> >>> +++ b/mm/huge_memory.c
> >>> @@ -2271,12 +2271,13 @@ static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> >>>
> >>> static pmd_t move_soft_dirty_pmd(pmd_t pmd)
> >>> {
> >>> -#ifdef CONFIG_MEM_SOFT_DIRTY
> >>> - if (unlikely(is_pmd_migration_entry(pmd)))
> >>> - pmd = pmd_swp_mksoft_dirty(pmd);
> >>> - else if (pmd_present(pmd))
> >>> - pmd = pmd_mksoft_dirty(pmd);
> >>> -#endif
> >>> + if (pgtable_soft_dirty_supported()) {
> >>> + if (unlikely(is_pmd_migration_entry(pmd)))
> >>> + pmd = pmd_swp_mksoft_dirty(pmd);
> >>> + else if (pmd_present(pmd))
> >>> + pmd = pmd_mksoft_dirty(pmd);
> >>> + }
> >>> +
> >>
> >> Wondering, should the arch simply take care of that so we can just call
> >> pmd_swp_mksoft_dirty / pmd_mksoft_dirty?
> >
>
> I think we have that already in include/linux/pgtable.h:
>
> We have stubs that just don't do anything.
>
> For riscv support you would handle runtime-enablement in these helpers.
>
> >
> >>
> >>> return pmd;
> >>> }
> >>>
> >>> diff --git a/mm/internal.h b/mm/internal.h
> >>> index 45b725c3dc03..c6ca62f8ecf3 100644
> >>> --- a/mm/internal.h
> >>> +++ b/mm/internal.h
> >>> @@ -1538,7 +1538,7 @@ static inline bool vma_soft_dirty_enabled(struct vm_area_struct *vma)
> >>> * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
> >>> * will be constantly true.
> >>> */
> >>> - if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
> >>> + if (!pgtable_soft_dirty_supported())
> >>> return false;
> >>>
> >>
> >> That should be handled with the above never-set-VM_SOFTDIRTY.
> >
> > We don't need the if (!pgtable_soft_dirty_supported()) check here, if I
> > understand correctly.
> Hm, let me think about that. No, I think this has to stay as the comment
> says, so this case here is special.
I will cook a new version and then we can discuss further based on the
new patch.
Thanks for your review,
Chunyan
* Re: [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported()
2025-09-12 9:21 ` Chunyan Zhang
@ 2025-09-12 13:32 ` David Hildenbrand
0 siblings, 0 replies; 12+ messages in thread
From: David Hildenbrand @ 2025-09-12 13:32 UTC (permalink / raw)
To: Chunyan Zhang
Cc: Chunyan Zhang, linux-riscv, linux-fsdevel, linux-mm,
linux-kernel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Alexandre Ghiti, Deepak Gupta, Ved Shanbhogue, Alexander Viro,
Christian Brauner, Jan Kara, Andrew Morton, Peter Xu,
Arnd Bergmann, Lorenzo Stoakes, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
Axel Rasmussen, Yuanchu Xie
On 12.09.25 11:21, Chunyan Zhang wrote:
> On Fri, 12 Sept 2025 at 16:41, David Hildenbrand <david@redhat.com> wrote:
>>
>> [...]
>>
>>>>> +/*
>>>>> + * We should remove the VM_SOFTDIRTY flag if the soft-dirty bit is
>>>>> + * unavailable on which the kernel is running, even if the architecture
>>>>> + * provides the resource and soft-dirty is compiled in.
>>>>> + */
>>>>> +#ifdef CONFIG_MEM_SOFT_DIRTY
>>>>> + if (!pgtable_soft_dirty_supported())
>>>>> + mnemonics[ilog2(VM_SOFTDIRTY)][0] = 0;
>>>>> +#endif
>>>>
>>>> You can now drop the ifdef.
>>>
>>> Ok, you mean define VM_SOFTDIRTY 0x08000000 no matter if
>>> MEM_SOFT_DIRTY is compiled in, right?
>>>
>>> Then I need memcpy() to set mnemonics[ilog2(VM_SOFTDIRTY)] here.
>>
>> The whole hunk will not be required when we make sure VM_SOFTDIRTY never
>> gets set, correct?
>
> Oh no, this hunk doesn't set a VMA flag.
> The mnemonics[ilog2(VM_SOFTDIRTY)] entry is for show_smap_vma_flags(),
> something like below:
> # cat /proc/1/smaps
> 5555605c7000-555560680000 r-xp 00000000 fe:00 19    /bin/busybox
> ...
> VmFlags: rd ex mr mw me sd
>
> 'sd' is for soft-dirty
>
> I think this is still needed, right?
If nobody sets VM_SOFTDIRTY in vma->vm_flags, then we will never print it.
So you can just leave the "#ifdef CONFIG_MEM_SOFT_DIRTY" as is to handle
the VM_SOFTDIRTY=0 case.
So you should not have to change anything in show_smap_vma_flags().
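(The existing table entry there is roughly

#ifdef CONFIG_MEM_SOFT_DIRTY
		[ilog2(VM_SOFTDIRTY)] = "sd",
#endif

quoting from memory; since a mnemonic is only printed for bits that are
actually set in vma->vm_flags, it is harmless to leave it as is.)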
[...]
>>>> That should be handled with the above never-set-VM_SOFTDIRTY.
>>>
>>> We don't need the if (!pgtable_soft_dirty_supported()) check here, if I
>>> understand correctly.
>> Hm, let me think about that. No, I think this has to stay as the comment
>> says, so this case here is special.
>
> I will cook a new version and then we can discuss further based on the
> new patch.
Sounds good!
--
Cheers
David / dhildenb
Thread overview: 12+ messages
2025-09-11 9:55 [PATCH v11 0/5] riscv: mm: Add soft-dirty and uffd-wp support Chunyan Zhang
2025-09-11 9:55 ` [PATCH v11 1/5] mm: softdirty: Add pgtable_soft_dirty_supported() Chunyan Zhang
2025-09-11 13:09 ` David Hildenbrand
2025-09-12 8:22 ` Chunyan Zhang
2025-09-12 8:41 ` David Hildenbrand
2025-09-12 9:21 ` Chunyan Zhang
2025-09-12 13:32 ` David Hildenbrand
2025-09-11 9:55 ` [PATCH v11 2/5] mm: userfaultfd: Add pgtable_uffd_wp_supported() Chunyan Zhang
2025-09-12 8:54 ` David Hildenbrand
2025-09-11 9:56 ` [PATCH v11 3/5] riscv: Add RISC-V Svrsw60t59b extension support Chunyan Zhang
2025-09-11 9:56 ` [PATCH v11 4/5] riscv: mm: Add soft-dirty page tracking support Chunyan Zhang
2025-09-11 9:56 ` [PATCH v11 5/5] riscv: mm: Add userfaultfd write-protect support Chunyan Zhang