linux-mm.kvack.org archive mirror
* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
@ 2025-10-06 14:13 Deepanshu Kartikey
  0 siblings, 0 replies; 6+ messages in thread
From: Deepanshu Kartikey @ 2025-10-06 14:13 UTC (permalink / raw)
  To: osalvador, broonie
  Cc: muchun.song, david, akpm, lorenzo.stoakes, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, linux-mm, linux-kernel

Hi Oscar,

I agree - checking __vma_shareable_lock() directly in __unmap_hugepage_range() 
is cleaner and simpler. I'll send v4 with that approach shortly.

Thanks for the feedback.

Best regards,
Deepanshu


^ permalink raw reply	[flat|nested] 6+ messages in thread
* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
@ 2025-10-06  7:54 Deepanshu Kartikey
  2025-10-06 12:27 ` Oscar Salvador
  0 siblings, 1 reply; 6+ messages in thread
From: Deepanshu Kartikey @ 2025-10-06  7:54 UTC (permalink / raw)
  To: muchun.song, osalvador, david, akpm, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, broonie
  Cc: linux-mm, linux-kernel

Apologies for the confusion; please disregard this patch, as it has caused
quite a bit of confusion. I will send v4 with a proper commit message and a
link to the previous version.


Deepanshu


* [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
@ 2025-10-03 17:45 Deepanshu Kartikey
  2025-10-06  7:37 ` David Hildenbrand
  2025-10-06 13:28 ` Oscar Salvador
  0 siblings, 2 replies; 6+ messages in thread
From: Deepanshu Kartikey @ 2025-10-03 17:45 UTC (permalink / raw)
  To: muchun.song, osalvador, david, akpm, lorenzo.stoakes,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, broonie
  Cc: linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+f26d7c75c26ec19790e7

When hugetlb_vmdelete_list() cannot acquire the shareable lock for a VMA,
the previous fix (dd83609b8898) skipped the entire VMA to avoid lock
assertions in huge_pmd_unshare(). However, this prevented pages from being
unmapped and freed, causing a regression in fallocate(PUNCH_HOLE) operations
where pages were not freed immediately, as reported by Mark Brown.

The issue occurs because:
1. hugetlb_vmdelete_list() calls hugetlb_vma_trylock_write()
2. For shareable VMAs, this attempts to acquire the shareable lock
3. If successful, huge_pmd_unshare() expects the lock to be held
4. huge_pmd_unshare() asserts the lock via hugetlb_vma_assert_locked()

The v2 fix avoided calling code that requires locks, but this prevented
page unmapping entirely, breaking the expected behavior where pages are
freed during punch hole operations.

This v3 fix takes a different approach: instead of skipping the entire VMA,
we skip only the PMD unsharing operation when we don't have the required
lock, while still proceeding with page unmapping. This is safe because:

- PMD unsharing is an optimization to reduce shared page table overhead
- Page unmapping can proceed safely with just the VMA write lock
- Pages get freed immediately as expected by PUNCH_HOLE operations
- The PMD metadata will be cleaned up when the VMA is destroyed

We introduce a new ZAP_FLAG_NO_UNSHARE flag that communicates to
__unmap_hugepage_range() that it should skip huge_pmd_unshare() while
still clearing page table entries and freeing pages.

Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Reported-by: Mark Brown <broonie@kernel.org>
Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>

---
Changes in v3:
- Instead of skipping entire VMAs, skip only PMD unsharing operation
- Add ZAP_FLAG_NO_UNSHARE flag to communicate lock status
- Ensure pages are still unmapped and freed immediately
- Fixes regression in fallocate PUNCH_HOLE reported by Mark Brown

Changes in v2:
- Check for shareable lock before trylock to avoid lock leaks
- Add comment explaining why non-shareable VMAs are skipped
---
 fs/hugetlbfs/inode.c | 22 ++++++++++++----------
 include/linux/mm.h   |  2 ++
 mm/hugetlb.c         |  3 ++-
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9c94ed8c3ab0..519497bc1045 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -474,29 +474,31 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 	vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) {
 		unsigned long v_start;
 		unsigned long v_end;
+		bool have_shareable_lock;
+		zap_flags_t local_flags = zap_flags;
 
 		if (!hugetlb_vma_trylock_write(vma))
 			continue;
-
+
+		have_shareable_lock = __vma_shareable_lock(vma);
+
 		/*
-		 * Skip VMAs without shareable locks. Per the design in commit
-		 * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
-		 * called after this function with proper locking.
+		 * If we can't get the shareable lock, set ZAP_FLAG_NO_UNSHARE
+		 * to skip PMD unsharing. We still proceed with unmapping to
+		 * ensure pages are properly freed, which is critical for punch
+		 * hole operations that expect immediate page freeing.
 		 */
-		if (!__vma_shareable_lock(vma))
-			goto skip;
-
+		if (!have_shareable_lock)
+			local_flags |= ZAP_FLAG_NO_UNSHARE;
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
-
+		unmap_hugepage_range(vma, v_start, v_end, NULL, local_flags);
 		/*
 		 * Note that vma lock only exists for shared/non-private
 		 * vmas.  Therefore, lock is not held when calling
 		 * unmap_hugepage_range for private vmas.
 		 */
-skip:
 		hugetlb_vma_unlock_write(vma);
 	}
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 06978b4dbeb8..9126ab44320d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2395,6 +2395,8 @@ struct zap_details {
 #define  ZAP_FLAG_DROP_MARKER        ((__force zap_flags_t) BIT(0))
 /* Set in unmap_vmas() to indicate a final unmap call.  Only used by hugetlb */
 #define  ZAP_FLAG_UNMAP              ((__force zap_flags_t) BIT(1))
+/* Skip PMD unsharing when unmapping hugetlb ranges without shareable lock */
+#define  ZAP_FLAG_NO_UNSHARE         ((__force zap_flags_t) BIT(2))
 
 #ifdef CONFIG_SCHED_MM_CID
 void sched_mm_cid_before_execve(struct task_struct *t);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cac826cb61f..c4257aa568fe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5885,7 +5885,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		}
 
 		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		if (!(zap_flags & ZAP_FLAG_NO_UNSHARE) &&
+		      huge_pmd_unshare(mm, vma, address, ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
-- 
2.43.0




end of thread, other threads:[~2025-10-06 14:13 UTC | newest]

Thread overview: 6+ messages
2025-10-06 14:13 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
  -- strict thread matches above, loose matches on Subject: below --
2025-10-06  7:54 Deepanshu Kartikey
2025-10-06 12:27 ` Oscar Salvador
2025-10-03 17:45 Deepanshu Kartikey
2025-10-06  7:37 ` David Hildenbrand
2025-10-06 13:28 ` Oscar Salvador
