linux-mm.kvack.org archive mirror
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
@ 2025-10-03 11:18 Deepanshu Kartikey
  0 siblings, 0 replies; 8+ messages in thread
From: Deepanshu Kartikey @ 2025-10-03 11:18 UTC (permalink / raw)
  To: broonie
  Cc: muchun.song, osalvador, david, akpm, linux-mm, linux-kernel,
	syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV


Hi Mark,

Thank you for the report and bisection. I see the hugetlbfs-madvise 
test is failing with an unexpected free huge pages count.

I'm investigating why the VMAs skipped in my patch aren't getting 
their pages freed properly. I'll analyze the test code and work on 
a fix.

I'll follow up once I understand the root cause.

Deepanshu


* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
@ 2025-10-03 15:09 Deepanshu Kartikey
  0 siblings, 0 replies; 8+ messages in thread
From: Deepanshu Kartikey @ 2025-10-03 15:09 UTC (permalink / raw)
  To: akpm, broonie
  Cc: muchun.song, osalvador, david, linux-mm, linux-kernel,
	syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV

Hi,

My previous patch dd83609b8898 ("hugetlbfs: skip VMAs without shareable 
locks in hugetlb_vmdelete_list") fixed a WARNING but introduced a 
regression where pages aren't freed during punch hole operations, as 
reported by Mark Brown.

The issue is that skipping the entire VMA means pages don't get unmapped, 
so they can't be freed.

I'm considering the following fix approach:

1. Add a new ZAP_FLAG_NO_UNSHARE flag
2. In hugetlb_vmdelete_list(), try to get the shareable lock
3. If we can't get it, set ZAP_FLAG_NO_UNSHARE and proceed anyway
4. In __unmap_hugepage_range(), skip huge_pmd_unshare() if flag is set
5. But still clear page table entries so pages get freed

This way:
- For truncate: same behavior as before (might skip unsharing)
- For punch hole: pages get freed immediately (fixes regression)
- No WARNING (we don't call huge_pmd_unshare without lock)

The trade-off is that PMD metadata may not be cleaned up immediately 
when we can't get the shareable lock, but it will be freed when the 
VMA is destroyed.

Does this approach seem reasonable? Or is there a better way to handle 
this?

Thanks,
Deepanshu


* [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
@ 2025-09-26  3:32 Deepanshu Kartikey
  2025-10-03 10:57 ` Mark Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Deepanshu Kartikey @ 2025-09-26  3:32 UTC (permalink / raw)
  To: muchun.song, osalvador, david, akpm
  Cc: linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+f26d7c75c26ec19790e7

hugetlb_vmdelete_list() uses trylock to acquire VMA locks during truncate
operations. As per the original design in commit 40549ba8f8e0 ("hugetlb:
use new vma_lock for pmd sharing synchronization"), if the trylock fails
or the VMA has no lock, it should skip that VMA. Any remaining mapped
pages are handled by remove_inode_hugepages() which is called after
hugetlb_vmdelete_list() and uses proper lock ordering to guarantee
unmapping success.

Currently, when hugetlb_vma_trylock_write() returns success (1) for VMAs
without shareable locks, the code proceeds to call unmap_hugepage_range().
This causes assertion failures in huge_pmd_unshare() → hugetlb_vma_assert_locked()
because no lock is actually held:

  WARNING: CPU: 1 PID: 6594 Comm: syz.0.28 Not tainted
  Call Trace:
   hugetlb_vma_assert_locked+0x1dd/0x250
   huge_pmd_unshare+0x2c8/0x540
   __unmap_hugepage_range+0x6e3/0x1aa0
   unmap_hugepage_range+0x32e/0x410
   hugetlb_vmdelete_list+0x189/0x1f0

Fix by using goto to ensure locks acquired by trylock are always released, even
when skipping VMAs without shareable locks.

Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f26d7c75c26ec19790e7
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>

---
Changes in v2:
- Use goto to unlock after trylock, avoiding lock leaks (Andrew Morton)
- Add comment explaining why non-shareable VMAs are skipped (Andrew Morton)
---
 fs/hugetlbfs/inode.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9e0625167517..9fa7c72ac1a6 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -488,6 +488,14 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		if (!hugetlb_vma_trylock_write(vma))
 			continue;
 
+		/*
+		 * Skip VMAs without shareable locks. Per the design in commit
+		 * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
+		 * called after this function with proper locking.
+		 */
+		if (!__vma_shareable_lock(vma))
+			goto skip;
+
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
@@ -498,7 +506,8 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		 * vmas.  Therefore, lock is not held when calling
 		 * unmap_hugepage_range for private vmas.
 		 */
-		hugetlb_vma_unlock_write(vma);
+skip:
+		hugetlb_vma_unlock_write(vma);
 	}
 }
 
-- 
2.43.0



end of thread, other threads:[~2025-10-22 11:41 UTC | newest]

Thread overview: 8+ messages
2025-10-03 11:18 [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list Deepanshu Kartikey
  -- strict thread matches above, loose matches on Subject: below --
2025-10-03 15:09 Deepanshu Kartikey
2025-10-03 15:09 Deepanshu Kartikey
2025-09-26  3:32 Deepanshu Kartikey
2025-10-03 10:57 ` Mark Brown
2025-10-20 17:52   ` Mark Brown
2025-10-21 21:10     ` Andrew Morton
2025-10-22 11:40       ` Mark Brown
