* [PATCH RFC v2 0/2] mm: add huge pfnmap support for remap_pfn_range()
@ 2025-10-16 11:27 Yin Tirui
  2025-10-16 11:27 ` [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv Yin Tirui
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Yin Tirui @ 2025-10-16 11:27 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
	catalin.marinas, will, paul.walmsley, palmer, aou, alex,
	anshuman.khandual, yangyicong, ardb, willy, apopple,
	samuel.holland, luxu.kernel, abrestic, yongxuan.wang, linux-mm,
	linux-kernel, linux-arm-kernel, linux-riscv
  Cc: wangkefeng.wang, chenjun102, yintirui

v2:
- remove "nohugepfnmap" boot option and "pfnmap_max_page_shift" variable.
- zap_deposited_table for non-special pmd.
- move set_pmd_at() inside pmd_lock.
- prevent PMD mapping creation when pgtable allocation fails.
- defer the refactor of pte_clrhuge() to a separate patch series. For now,
  add a TODO to track this.

v1: https://lore.kernel.org/linux-mm/20250923133104.926672-1-yintirui@huawei.com/

Overview
========
This patch series adds huge page support for remap_pfn_range(),
automatically creating huge mappings when prerequisites are satisfied
(size, alignment, architecture support, etc.) and falling back to
normal page mappings otherwise.
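
As a rough illustration of the size and alignment prerequisites, the
user-space C sketch below models the checks for one PMD-sized chunk.
It assumes 4 KiB base pages and 2 MiB PMD mappings; can_map_huge_pmd()
is a hypothetical name used only here, and the architecture-support and
page-table-state checks are left out.

```c
#include <stdbool.h>

/* Assumed geometry: 4 KiB base pages, 2 MiB PMD mappings. */
#define PAGE_SHIFT   12UL
#define PMD_SHIFT    21UL
#define PMD_SIZE     (1UL << PMD_SHIFT)
#define HPAGE_PMD_NR (1UL << (PMD_SHIFT - PAGE_SHIFT))	/* 512 pages */

#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/*
 * Hypothetical helper: returns true when the range [addr, end) backed
 * by pfn could be covered by a single huge PMD mapping.
 */
static inline bool can_map_huge_pmd(unsigned long addr, unsigned long end,
				    unsigned long pfn)
{
	if (end - addr != PMD_SIZE)		/* must span exactly one PMD */
		return false;
	if (!IS_ALIGNED(addr, PMD_SIZE))	/* virtual address alignment */
		return false;
	if (!IS_ALIGNED(pfn, HPAGE_PMD_NR))	/* physical frame alignment */
		return false;
	return true;
}
```

Under these assumptions, a range starting at 0x200000 with pfn 0x200
qualifies, while the same range with pfn 0x201 falls back to normal
page mappings.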

This work builds on Peter Xu's previous efforts on huge pfnmap
support [0].

TODO
====
- Add PUD-level huge page support. Currently, only PMD-level huge
  pages are supported.
- Consider the logic related to vmap_page_range() and extract
  reusable common code.
- Refactor pte_clrhuge() and related functions.

Tests Done
==========
- Cross-build tests.
- Performance tests with custom device driver implementing mmap()
  with remap_pfn_range():
    - lat_mem_rd benchmark modified to use mmap(device_fd) instead of
      malloc() shows 32% to 40% lower memory access latency with huge
      page support than with normal page mappings (39% to 40% for
      sizes of 256 MB and above).

      numactl -C 0 lat_mem_rd -t 4096M (stride=64)
      Memory Size (MB)    Without Huge Mapping    With Huge Mapping    Improvement
      ----------------    --------------------    -----------------    -----------
      64.00               148.858 ns              100.780 ns           32.3%
      128.00              164.745 ns              103.537 ns           37.2%
      256.00              169.907 ns              103.179 ns           39.3%
      512.00              171.285 ns              103.072 ns           39.8%
      1024.00             173.054 ns              103.055 ns           40.4%
      2048.00             172.820 ns              103.091 ns           40.3%
      4096.00             172.877 ns              103.115 ns           40.4%

    - Custom memory copy operations on mmap(device_fd) show around 18%
      performance improvement with huge page support compared to normal
      page mappings.

      numactl -C 0 memcpy_test (memory copy performance test)
      Memory Size (MB)    Without Huge Mapping    With Huge Mapping    Improvement
      ----------------    --------------------    -----------------    -----------
      1024.00             95.76 ms                77.91 ms             18.6%
      2048.00             190.87 ms               155.64 ms            18.5%
      4096.00             380.84 ms               311.45 ms            18.2%
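
The "Improvement" column in both tables is consistent with the relative
reduction (base - huge) / base. A minimal sketch of that arithmetic,
with improvement_pct() as a hypothetical helper name:

```c
/*
 * Relative improvement, in percent, of a measurement with huge
 * mappings over the baseline without them (lower values are better
 * for both latency and copy time).
 */
static inline double improvement_pct(double base, double huge)
{
	return (base - huge) / base * 100.0;
}
```

For the 64 MB latency row this gives (148.858 - 100.780) / 148.858,
roughly 32.3%; for the 1024 MB memcpy row, (95.76 - 77.91) / 95.76,
roughly 18.6%.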

[0] https://lore.kernel.org/all/20240826204353.2228736-2-peterx@redhat.com/T/#u

Yin Tirui (2):
  pgtable: add pte_clrhuge() implementation for arm64 and riscv
  mm: add PMD-level huge page support for remap_pfn_range()

 arch/arm64/include/asm/pgtable.h |  8 +++++++
 arch/riscv/include/asm/pgtable.h |  5 ++++
 include/linux/pgtable.h          |  6 ++++-
 mm/huge_memory.c                 | 26 +++++++++++++++------
 mm/memory.c                      | 40 ++++++++++++++++++++++++++++++++
 5 files changed, 77 insertions(+), 8 deletions(-)

-- 
2.43.0




* [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv
  2025-10-16 11:27 [PATCH RFC v2 0/2] mm: add huge pfnmap support for remap_pfn_range() Yin Tirui
@ 2025-10-16 11:27 ` Yin Tirui
  2025-10-16 18:22   ` Matthew Wilcox
  2025-10-16 11:27 ` [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range() Yin Tirui
  2025-10-16 16:23 ` [syzbot ci] Re: mm: add huge pfnmap " syzbot ci
  2 siblings, 1 reply; 7+ messages in thread
From: Yin Tirui @ 2025-10-16 11:27 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
	catalin.marinas, will, paul.walmsley, palmer, aou, alex,
	anshuman.khandual, yangyicong, ardb, willy, apopple,
	samuel.holland, luxu.kernel, abrestic, yongxuan.wang, linux-mm,
	linux-kernel, linux-arm-kernel, linux-riscv
  Cc: wangkefeng.wang, chenjun102, yintirui

Add pte_clrhuge() helper function for architectures that enable
ARCH_SUPPORTS_HUGE_PFNMAP to clear huge page attributes from PTE
entries.

This function provides the inverse operation of pte_mkhuge() and will
be needed for upcoming huge page splitting, where PTE entries derived
from huge page mappings need to have their huge page attributes cleared.

Future work will refactor pfn_pte() to automatically filter huge bits,
removing the need for pte_clrhuge() across all architectures.
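
To make the arm64 bit manipulation easier to follow, here is a
user-space model of the helper. The constant values are assumptions
based on the arm64 descriptor format (type field in bits [1:0], block
descriptor 0b01, page descriptor 0b11), and model_pte_clrhuge() is a
hypothetical name; it is a sketch, not the kernel implementation.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed arm64 descriptor type encoding, bits [1:0]. */
#define PTE_VALID	((pteval_t)1 << 0)
#define PTE_TYPE_MASK	((pteval_t)3 << 0)
#define PTE_TYPE_PAGE	((pteval_t)3 << 0)	/* page descriptor: 0b11 */
#define PMD_TYPE_SECT	((pteval_t)1 << 0)	/* block descriptor: 0b01 */

/*
 * Model of the helper: set the type field to "page" without touching
 * the valid bit, so an invalid entry stays invalid.
 */
static inline pteval_t model_pte_clrhuge(pteval_t pte)
{
	pteval_t mask = PTE_TYPE_MASK & ~PTE_VALID;	/* 0b10 */
	pteval_t val = PTE_TYPE_PAGE & ~PTE_VALID;	/* 0b10 */

	return (pte & ~mask) | val;
}
```

In this model a valid block entry such as 0x40000705 becomes 0x40000707
(page type, other bits unchanged), and an entry with the valid bit
clear keeps it clear. The RISC-V version in this patch returns the PTE
unchanged.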

Signed-off-by: Yin Tirui <yintirui@huawei.com>
---
 arch/arm64/include/asm/pgtable.h | 8 ++++++++
 arch/riscv/include/asm/pgtable.h | 5 +++++
 2 files changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index abd2dee416b3..244755bad46f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -366,6 +366,14 @@ static inline pte_t pte_mkinvalid(pte_t pte)
 	return pte;
 }
 
+static inline pte_t pte_clrhuge(pte_t pte)
+{
+	pteval_t mask = PTE_TYPE_MASK & ~PTE_VALID;
+	pteval_t val = PTE_TYPE_PAGE & ~PTE_VALID;
+
+	return __pte((pte_val(pte) & ~mask) | val);
+}
+
 static inline pmd_t pmd_mkcont(pmd_t pmd)
 {
 	return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 815067742939..b0a20ddf780a 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -455,6 +455,11 @@ static inline pte_t pte_mkhuge(pte_t pte)
 	return pte;
 }
 
+static inline pte_t pte_clrhuge(pte_t pte)
+{
+	return pte;
+}
+
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
 #define pte_leaf_size(pte)	(pte_napot(pte) ?				\
 					napot_cont_size(napot_cont_order(pte)) :\
-- 
2.43.0




* [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range()
  2025-10-16 11:27 [PATCH RFC v2 0/2] mm: add huge pfnmap support for remap_pfn_range() Yin Tirui
  2025-10-16 11:27 ` [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv Yin Tirui
@ 2025-10-16 11:27 ` Yin Tirui
  2025-10-16 16:23 ` [syzbot ci] Re: mm: add huge pfnmap " syzbot ci
  2 siblings, 0 replies; 7+ messages in thread
From: Yin Tirui @ 2025-10-16 11:27 UTC (permalink / raw)
  To: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
	catalin.marinas, will, paul.walmsley, palmer, aou, alex,
	anshuman.khandual, yangyicong, ardb, willy, apopple,
	samuel.holland, luxu.kernel, abrestic, yongxuan.wang, linux-mm,
	linux-kernel, linux-arm-kernel, linux-riscv
  Cc: wangkefeng.wang, chenjun102, yintirui

Add PMD-level huge page support to remap_pfn_range(), automatically
creating huge mappings when prerequisites are satisfied (size, alignment,
architecture support, etc.) and falling back to normal page mappings
otherwise.

Implement special huge PMD splitting by utilizing the pgtable deposit/
withdraw mechanism. When splitting is needed, the deposited pgtable is
withdrawn and populated with individual PTEs created from the original
huge mapping, using pte_clrhuge() to clear huge page attributes.
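
The population step can be pictured with a small user-space model.
This is only a sketch: struct toy_pte and toy_split_huge_pmd() are
hypothetical, a 2 MiB PMD over 4 KiB pages is assumed, and the locking,
barriers and TLB handling of the real split path are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define HPAGE_PMD_NR 512	/* assumed: 2 MiB PMD over 4 KiB pages */

/* Toy PTE: a page frame number plus opaque protection bits. */
struct toy_pte {
	uint64_t pfn;
	uint64_t prot;
};

/*
 * Fill a withdrawn page table from one huge mapping: entry i maps
 * base_pfn + i with the protection bits of the original huge entry,
 * so the PTEs cover the same physical range as the PMD did.
 */
static inline void toy_split_huge_pmd(struct toy_pte table[HPAGE_PMD_NR],
				      uint64_t base_pfn, uint64_t prot)
{
	for (size_t i = 0; i < HPAGE_PMD_NR; i++) {
		table[i].pfn = base_pfn + i;
		table[i].prot = prot;
	}
}
```

In the patch itself this is a single set_ptes() call with
HPAGE_PMD_NR entries, followed by pmd_populate() to publish the new
table.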

Update arch_needs_pgtable_deposit() to return true when PMD pfnmap
support is enabled, ensuring proper pgtable management for huge
pfnmap operations.

Signed-off-by: Yin Tirui <yintirui@huawei.com>
---
 include/linux/pgtable.h |  6 +++++-
 mm/huge_memory.c        | 26 +++++++++++++++++++-------
 mm/memory.c             | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 25a7257052ff..9ae015cb67a0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1025,7 +1025,11 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
 #endif
 
 #ifndef arch_needs_pgtable_deposit
-#define arch_needs_pgtable_deposit() (false)
+#define arch_needs_pgtable_deposit arch_needs_pgtable_deposit
+static inline bool arch_needs_pgtable_deposit(void)
+{
+	return IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP);
+}
 #endif
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c38a95e9f09..b5eecd8fc1bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2857,14 +2857,26 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 
 	if (!vma_is_anonymous(vma)) {
 		old_pmd = pmdp_huge_clear_flush(vma, haddr, pmd);
-		/*
-		 * We are going to unmap this huge page. So
-		 * just go ahead and zap it
-		 */
-		if (arch_needs_pgtable_deposit())
-			zap_deposited_table(mm, pmd);
-		if (!vma_is_dax(vma) && vma_is_special_huge(vma))
+		if (!vma_is_dax(vma) && vma_is_special_huge(vma)) {
+			pte_t entry;
+
+			pgtable = pgtable_trans_huge_withdraw(mm, pmd);
+			if (unlikely(!pgtable))
+				return;
+			pmd_populate(mm, &_pmd, pgtable);
+			pte = pte_offset_map(&_pmd, haddr);
+			entry = pte_clrhuge(pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd)));
+			set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
+			pte_unmap(pte);
+
+			smp_wmb(); /* make pte visible before pmd */
+			pmd_populate(mm, pmd, pgtable);
 			return;
+		} else if (arch_needs_pgtable_deposit()) {
+			/* Zap for the non-special mappings. */
+			zap_deposited_table(mm, pmd);
+		}
+
 		if (unlikely(is_pmd_migration_entry(old_pmd))) {
 			swp_entry_t entry;
 
diff --git a/mm/memory.c b/mm/memory.c
index 0ba4f6b71847..4e8f2248a86f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2705,6 +2705,40 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	return err;
 }
 
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+static int remap_try_huge_pmd(struct mm_struct *mm, pmd_t *pmd,
+			unsigned long addr, unsigned long end,
+			unsigned long pfn, pgprot_t prot)
+{
+	pgtable_t pgtable;
+	spinlock_t *ptl;
+
+	if ((end - addr) != PMD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(addr, PMD_SIZE))
+		return 0;
+
+	if (!IS_ALIGNED(pfn, HPAGE_PMD_NR))
+		return 0;
+
+	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+		return 0;
+
+	pgtable = pte_alloc_one(mm);
+	if (unlikely(!pgtable))
+		return 0;
+
+	mm_inc_nr_ptes(mm);
+	ptl = pmd_lock(mm, pmd);
+	set_pmd_at(mm, addr, pmd, pmd_mkspecial(pmd_mkhuge(pfn_pmd(pfn, prot))));
+	pgtable_trans_huge_deposit(mm, pmd, pgtable);
+	spin_unlock(ptl);
+
+	return 1;
+}
+#endif
+
 static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
@@ -2720,6 +2754,12 @@ static inline int remap_pmd_range(struct mm_struct *mm, pud_t *pud,
 	VM_BUG_ON(pmd_trans_huge(*pmd));
 	do {
 		next = pmd_addr_end(addr, end);
+#ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP
+		if (remap_try_huge_pmd(mm, pmd, addr, next,
+				pfn + (addr >> PAGE_SHIFT), prot)) {
+			continue;
+		}
+#endif
 		err = remap_pte_range(mm, pmd, addr, next,
 				pfn + (addr >> PAGE_SHIFT), prot);
 		if (err)
-- 
2.43.0




* [syzbot ci] Re: mm: add huge pfnmap support for remap_pfn_range()
  2025-10-16 11:27 [PATCH RFC v2 0/2] mm: add huge pfnmap support for remap_pfn_range() Yin Tirui
  2025-10-16 11:27 ` [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv Yin Tirui
  2025-10-16 11:27 ` [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range() Yin Tirui
@ 2025-10-16 16:23 ` syzbot ci
  2 siblings, 0 replies; 7+ messages in thread
From: syzbot ci @ 2025-10-16 16:23 UTC (permalink / raw)
  To: abrestic, akpm, alex, anshuman.khandual, aou, apopple, ardb,
	baohua, baolin.wang, catalin.marinas, chenjun102, david,
	dev.jain, liam.howlett, linux-arm-kernel, linux-kernel, linux-mm,
	linux-riscv, lorenzo.stoakes, luxu.kernel, mhocko, npache,
	palmer, paul.walmsley, rppt, ryan.roberts, samuel.holland,
	surenb, vbabka, wangkefeng.wang, will, willy, yangyicong,
	yintirui, yongxuan.wang, ziy
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v2] mm: add huge pfnmap support for remap_pfn_range()
https://lore.kernel.org/all/20251016112704.179280-1-yintirui@huawei.com
* [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv
* [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range()

and found the following issue:
stack segment fault in pgtable_trans_huge_withdraw

Full report is available here:
https://ci.syzbot.org/series/d04c2914-0d99-4132-89d4-899e22abf904

***

stack segment fault in pgtable_trans_huge_withdraw

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      3a8660878839faadb4f1a6dd72c3179c1df56787
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/9d7864e5-ad3a-4c0d-b21d-86cfc476792e/config
C repro:   https://ci.syzbot.org/findings/b9fca361-413d-4db1-b8b2-1849cd2c50dd/c_repro
syz repro: https://ci.syzbot.org/findings/b9fca361-413d-4db1-b8b2-1849cd2c50dd/syz_repro

Oops: stack segment: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 5968 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:pgtable_trans_huge_withdraw+0x115/0x310 mm/pgtable-generic.c:188
Code: c3 10 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 9d e9 13 00 48 8b 03 48 89 04 24 4c 8d 78 08 4c 89 fd 48 c1 ed 03 <42> 80 7c 2d 00 00 74 08 4c 89 ff e8 7b e9 13 00 49 8b 07 48 8d 48
RSP: 0018:ffffc90003717300 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea00044848d0 RCX: ffff88816c890000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff88816db66a23 R09: 1ffff1102db6cd44
R10: dffffc0000000000 R11: ffffed102db6cd45 R12: ffff888112123000
R13: dffffc0000000000 R14: ffff888112123000 R15: 0000000000000008
FS:  000055556cb5f500(0000) GS:ffff88818e70c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001102a4000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 zap_deposited_table mm/huge_memory.c:2169 [inline]
 zap_huge_pmd+0xa25/0xf50 mm/huge_memory.c:2197
 zap_pmd_range mm/memory.c:1926 [inline]
 zap_pud_range mm/memory.c:1975 [inline]
 zap_p4d_range mm/memory.c:1996 [inline]
 unmap_page_range+0x9fe/0x4370 mm/memory.c:2017
 unmap_single_vma mm/memory.c:2060 [inline]
 unmap_vmas+0x399/0x580 mm/memory.c:2104
 exit_mmap+0x240/0xb40 mm/mmap.c:1280
 __mmput+0x118/0x430 kernel/fork.c:1133
 copy_process+0x2910/0x3c00 kernel/fork.c:2460
 kernel_clone+0x21e/0x840 kernel/fork.c:2609
 __do_sys_clone kernel/fork.c:2750 [inline]
 __se_sys_clone kernel/fork.c:2734 [inline]
 __x64_sys_clone+0x18b/0x1e0 kernel/fork.c:2734
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f946958eec9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc41c94258 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f94697e5fa0 RCX: 00007f946958eec9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000002001000
RBP: 00007f9469611f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007f94697e5fa0 R14: 00007f94697e5fa0 R15: 0000000000000006
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:pgtable_trans_huge_withdraw+0x115/0x310 mm/pgtable-generic.c:188
Code: c3 10 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 9d e9 13 00 48 8b 03 48 89 04 24 4c 8d 78 08 4c 89 fd 48 c1 ed 03 <42> 80 7c 2d 00 00 74 08 4c 89 ff e8 7b e9 13 00 49 8b 07 48 8d 48
RSP: 0018:ffffc90003717300 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea00044848d0 RCX: ffff88816c890000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff88816db66a23 R09: 1ffff1102db6cd44
R10: dffffc0000000000 R11: ffffed102db6cd45 R12: ffff888112123000
R13: dffffc0000000000 R14: ffff888112123000 R15: 0000000000000008
FS:  000055556cb5f500(0000) GS:ffff88818e70c000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001102a4000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	c3                   	ret
   1:	10 48 89             	adc    %cl,-0x77(%rax)
   4:	d8 48 c1             	fmuls  -0x3f(%rax)
   7:	e8 03 42 80 3c       	call   0x3c80420f
   c:	28 00                	sub    %al,(%rax)
   e:	74 08                	je     0x18
  10:	48 89 df             	mov    %rbx,%rdi
  13:	e8 9d e9 13 00       	call   0x13e9b5
  18:	48 8b 03             	mov    (%rbx),%rax
  1b:	48 89 04 24          	mov    %rax,(%rsp)
  1f:	4c 8d 78 08          	lea    0x8(%rax),%r15
  23:	4c 89 fd             	mov    %r15,%rbp
  26:	48 c1 ed 03          	shr    $0x3,%rbp
* 2a:	42 80 7c 2d 00 00    	cmpb   $0x0,0x0(%rbp,%r13,1) <-- trapping instruction
  30:	74 08                	je     0x3a
  32:	4c 89 ff             	mov    %r15,%rdi
  35:	e8 7b e9 13 00       	call   0x13e9b5
  3a:	49 8b 07             	mov    (%r15),%rax
  3d:	48                   	rex.W
  3e:	8d                   	.byte 0x8d
  3f:	48                   	rex.W
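
One possible reading of the trapping instruction (an interpretation,
not part of the syzbot output): with R13 = 0xdffffc0000000000 and
RBP = R15 >> 3, the cmpb looks like a KASAN shadow-byte check for the
address held in R15 (0x8), which was computed as RAX (0) plus 8. That
would suggest a load at offset 8 through a NULL pointer. The sketch
below assumes the usual x86-64 generic KASAN shadow mapping;
kasan_shadow_addr() is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed x86-64 generic KASAN layout: 1 shadow byte per 8 bytes. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET	 0xdffffc0000000000ULL

/* Shadow byte address that KASAN would check for a given address. */
static inline uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

With addr = 0x8 this yields 0xdffffc0000000001, which matches
RBP + R13 in the register dump above.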


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.



* Re: [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv
  2025-10-16 11:27 ` [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv Yin Tirui
@ 2025-10-16 18:22   ` Matthew Wilcox
  2025-10-18  3:12     ` Yin Tirui
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2025-10-16 18:22 UTC (permalink / raw)
  To: Yin Tirui
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
	catalin.marinas, will, paul.walmsley, palmer, aou, alex,
	anshuman.khandual, yangyicong, ardb, apopple, samuel.holland,
	luxu.kernel, abrestic, yongxuan.wang, linux-mm, linux-kernel,
	linux-arm-kernel, linux-riscv, wangkefeng.wang, chenjun102

On Thu, Oct 16, 2025 at 07:27:03PM +0800, Yin Tirui wrote:
> Add pte_clrhuge() helper function for architectures that enable
> ARCH_SUPPORTS_HUGE_PFNMAP to clear huge page attributes from PTE
> entries.

I really would prefer to see pte_clrhuge() removed first.  Otherwise
this just goes onto the long list of "somebody should clean this up some
day" and nobody ever will.



* Re: [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv
  2025-10-16 18:22   ` Matthew Wilcox
@ 2025-10-18  3:12     ` Yin Tirui
  0 siblings, 0 replies; 7+ messages in thread
From: Yin Tirui @ 2025-10-18  3:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: akpm, david, lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb,
	mhocko, ziy, baolin.wang, npache, ryan.roberts, dev.jain, baohua,
	catalin.marinas, will, paul.walmsley, palmer, aou, alex,
	anshuman.khandual, yangyicong, ardb, apopple, samuel.holland,
	luxu.kernel, abrestic, yongxuan.wang, linux-mm, linux-kernel,
	linux-arm-kernel, linux-riscv, wangkefeng.wang, chenjun102



On 10/17/2025 2:22 AM, Matthew Wilcox wrote:
> On Thu, Oct 16, 2025 at 07:27:03PM +0800, Yin Tirui wrote:
>> Add pte_clrhuge() helper function for architectures that enable
>> ARCH_SUPPORTS_HUGE_PFNMAP to clear huge page attributes from PTE
>> entries.
> 
> I really would prefer to see pte_clrhuge() removed first.  Otherwise
> this just goes onto the long list of "somebody should clean this up some
> day" and nobody ever will.
> 

Understood. I'm working on it.
-- 
Best regards,
Yin Tirui




* [syzbot ci] Re: mm: add huge pfnmap support for remap_pfn_range()
  2025-09-23 13:31 [PATCH RFC 0/2] " Yin Tirui
@ 2025-09-23 22:53 ` syzbot ci
  0 siblings, 0 replies; 7+ messages in thread
From: syzbot ci @ 2025-09-23 22:53 UTC (permalink / raw)
  To: abrestic, akpm, alex, anshuman.khandual, aou, apopple, ardb,
	baohua, baolin.wang, catalin.marinas, chenjun102, david,
	dev.jain, liam.howlett, linux-arm-kernel, linux-kernel, linux-mm,
	linux-riscv, lorenzo.stoakes, luxu.kernel, mhocko, npache,
	palmer, paul.walmsley, rppt, ryan.roberts, samuel.holland,
	surenb, vbabka, wangkefeng.wang, will, willy, yangyicong,
	yintirui, yongxuan.wang, ziy
  Cc: syzbot, syzkaller-bugs

syzbot ci has tested the following series

[v1] mm: add huge pfnmap support for remap_pfn_range()
https://lore.kernel.org/all/20250923133104.926672-1-yintirui@huawei.com
* [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv
* [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range()

and found the following issues:
* BUG: non-zero pgtables_bytes on freeing mm: NUM
* stack segment fault in pgtable_trans_huge_withdraw

Full report is available here:
https://ci.syzbot.org/series/633cbff7-ef54-4f3a-9133-71cc271396ee

***

BUG: non-zero pgtables_bytes on freeing mm: NUM

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      07e27ad16399afcd693be20211b0dfae63e0615f
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/72b4b6cf-5400-40d6-94b6-1cfc0e85050d/config
C repro:   https://ci.syzbot.org/findings/3450ef75-3540-4c00-8b33-5625d4aa40ef/c_repro
syz repro: https://ci.syzbot.org/findings/3450ef75-3540-4c00-8b33-5625d4aa40ef/syz_repro

BUG: non-zero pgtables_bytes on freeing mm: 4096


***

stack segment fault in pgtable_trans_huge_withdraw

tree:      torvalds
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base:      07e27ad16399afcd693be20211b0dfae63e0615f
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/72b4b6cf-5400-40d6-94b6-1cfc0e85050d/config
C repro:   https://ci.syzbot.org/findings/dcfb72b5-c263-48da-830a-7f51aaa927db/c_repro
syz repro: https://ci.syzbot.org/findings/dcfb72b5-c263-48da-830a-7f51aaa927db/syz_repro

Oops: stack segment: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6000 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:pgtable_trans_huge_withdraw+0x115/0x310 mm/pgtable-generic.c:188
Code: c3 10 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 5d 38 13 00 48 8b 03 48 89 04 24 4c 8d 78 08 4c 89 fd 48 c1 ed 03 <42> 80 7c 2d 00 00 74 08 4c 89 ff e8 3b 38 13 00 49 8b 07 48 8d 48
RSP: 0018:ffffc90002d5f300 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea0000fb3dd0 RCX: ffff888107769cc0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888022b90843 R09: 1ffff11004572108
R10: dffffc0000000000 R11: ffffed1004572109 R12: ffff88803ecf7000
R13: dffffc0000000000 R14: ffff88803ecf7000 R15: 0000000000000008
FS:  0000555576e7a500(0000) GS:ffff8880b8612000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000107d74000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 zap_deposited_table mm/huge_memory.c:2177 [inline]
 zap_huge_pmd+0xa25/0xf50 mm/huge_memory.c:2205
 zap_pmd_range mm/memory.c:1798 [inline]
 zap_pud_range mm/memory.c:1847 [inline]
 zap_p4d_range mm/memory.c:1868 [inline]
 unmap_page_range+0x9fe/0x4370 mm/memory.c:1889
 unmap_single_vma mm/memory.c:1932 [inline]
 unmap_vmas+0x399/0x580 mm/memory.c:1976
 exit_mmap+0x248/0xb50 mm/mmap.c:1280
 __mmput+0x118/0x430 kernel/fork.c:1129
 copy_process+0x2910/0x3c00 kernel/fork.c:2454
 kernel_clone+0x21e/0x840 kernel/fork.c:2605
 __do_sys_clone kernel/fork.c:2748 [inline]
 __se_sys_clone kernel/fork.c:2732 [inline]
 __x64_sys_clone+0x18b/0x1e0 kernel/fork.c:2732
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f96b638ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc07e618c8 EFLAGS: 00000206 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007f96b65d5fa0 RCX: 00007f96b638ec29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000002001000
RBP: 00007f96b6411e41 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007f96b65d5fa0 R14: 00007f96b65d5fa0 R15: 0000000000000006
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:pgtable_trans_huge_withdraw+0x115/0x310 mm/pgtable-generic.c:188
Code: c3 10 48 89 d8 48 c1 e8 03 42 80 3c 28 00 74 08 48 89 df e8 5d 38 13 00 48 8b 03 48 89 04 24 4c 8d 78 08 4c 89 fd 48 c1 ed 03 <42> 80 7c 2d 00 00 74 08 4c 89 ff e8 3b 38 13 00 49 8b 07 48 8d 48
RSP: 0018:ffffc90002d5f300 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffea0000fb3dd0 RCX: ffff888107769cc0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffff888022b90843 R09: 1ffff11004572108
R10: dffffc0000000000 R11: ffffed1004572109 R12: ffff88803ecf7000
R13: dffffc0000000000 R14: ffff88803ecf7000 R15: 0000000000000008
FS:  0000555576e7a500(0000) GS:ffff8880b8612000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000107d74000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	c3                   	ret
   1:	10 48 89             	adc    %cl,-0x77(%rax)
   4:	d8 48 c1             	fmuls  -0x3f(%rax)
   7:	e8 03 42 80 3c       	call   0x3c80420f
   c:	28 00                	sub    %al,(%rax)
   e:	74 08                	je     0x18
  10:	48 89 df             	mov    %rbx,%rdi
  13:	e8 5d 38 13 00       	call   0x133875
  18:	48 8b 03             	mov    (%rbx),%rax
  1b:	48 89 04 24          	mov    %rax,(%rsp)
  1f:	4c 8d 78 08          	lea    0x8(%rax),%r15
  23:	4c 89 fd             	mov    %r15,%rbp
  26:	48 c1 ed 03          	shr    $0x3,%rbp
* 2a:	42 80 7c 2d 00 00    	cmpb   $0x0,0x0(%rbp,%r13,1) <-- trapping instruction
  30:	74 08                	je     0x3a
  32:	4c 89 ff             	mov    %r15,%rdi
  35:	e8 3b 38 13 00       	call   0x133875
  3a:	49 8b 07             	mov    (%r15),%rax
  3d:	48                   	rex.W
  3e:	8d                   	.byte 0x8d
  3f:	48                   	rex.W


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.



end of thread, other threads:[~2025-10-18  3:12 UTC | newest]

Thread overview: 7+ messages
2025-10-16 11:27 [PATCH RFC v2 0/2] mm: add huge pfnmap support for remap_pfn_range() Yin Tirui
2025-10-16 11:27 ` [PATCH RFC 1/2] pgtable: add pte_clrhuge() implementation for arm64 and riscv Yin Tirui
2025-10-16 18:22   ` Matthew Wilcox
2025-10-18  3:12     ` Yin Tirui
2025-10-16 11:27 ` [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range() Yin Tirui
2025-10-16 16:23 ` [syzbot ci] Re: mm: add huge pfnmap " syzbot ci
  -- strict thread matches above, loose matches on Subject: below --
2025-09-23 13:31 [PATCH RFC 0/2] " Yin Tirui
2025-09-23 22:53 ` [syzbot ci] " syzbot ci
