* [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
@ 2025-10-03 17:45 Deepanshu Kartikey
2025-10-06 7:37 ` David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-10-03 17:45 UTC (permalink / raw)
To: muchun.song, osalvador, david, akpm, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, broonie
Cc: linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+f26d7c75c26ec19790e7
When hugetlb_vmdelete_list() cannot acquire the shareable lock for a VMA,
the previous fix (dd83609b8898) skipped the entire VMA to avoid lock
assertions in huge_pmd_unshare(). However, this prevented pages from being
unmapped and freed, causing a regression in fallocate(PUNCH_HOLE) operations
where pages were not freed immediately, as reported by Mark Brown.
The issue occurs because:
1. hugetlb_vmdelete_list() calls hugetlb_vma_trylock_write()
2. For shareable VMAs, this attempts to acquire the shareable lock
3. If successful, huge_pmd_unshare() expects the lock to be held
4. huge_pmd_unshare() asserts the lock via hugetlb_vma_assert_locked()
The v2 fix avoided calling code that requires locks, but this prevented
page unmapping entirely, breaking the expected behavior where pages are
freed during punch hole operations.
This v3 fix takes a different approach: instead of skipping the entire VMA,
we skip only the PMD unsharing operation when we don't have the required
lock, while still proceeding with page unmapping. This is safe because:
- PMD unsharing is an optimization to reduce shared page table overhead
- Page unmapping can proceed safely with just the VMA write lock
- Pages get freed immediately as expected by PUNCH_HOLE operations
- The PMD metadata will be cleaned up when the VMA is destroyed
We introduce a new ZAP_FLAG_NO_UNSHARE flag that communicates to
__unmap_hugepage_range() that it should skip huge_pmd_unshare() while
still clearing page table entries and freeing pages.
Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Reported-by: Mark Brown <broonie@kernel.org>
Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
Changes in v3:
- Instead of skipping entire VMAs, skip only PMD unsharing operation
- Add ZAP_FLAG_NO_UNSHARE flag to communicate lock status
- Ensure pages are still unmapped and freed immediately
- Fixes regression in fallocate PUNCH_HOLE reported by Mark Brown
Changes in v2:
- Check for shareable lock before trylock to avoid lock leaks
- Add comment explaining why non-shareable VMAs are skipped
---
fs/hugetlbfs/inode.c | 22 ++++++++++++----------
include/linux/mm.h | 2 ++
mm/hugetlb.c | 3 ++-
3 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9c94ed8c3ab0..519497bc1045 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -474,29 +474,31 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) {
unsigned long v_start;
unsigned long v_end;
+ bool have_shareable_lock;
+ zap_flags_t local_flags = zap_flags;
if (!hugetlb_vma_trylock_write(vma))
continue;
-
+
+ have_shareable_lock = __vma_shareable_lock(vma);
+
/*
- * Skip VMAs without shareable locks. Per the design in commit
- * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
- * called after this function with proper locking.
+ * If we can't get the shareable lock, set ZAP_FLAG_NO_UNSHARE
+ * to skip PMD unsharing. We still proceed with unmapping to
+ * ensure pages are properly freed, which is critical for punch
+ * hole operations that expect immediate page freeing.
*/
- if (!__vma_shareable_lock(vma))
- goto skip;
-
+ if (!have_shareable_lock)
+ local_flags |= ZAP_FLAG_NO_UNSHARE;
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
- unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
-
+ unmap_hugepage_range(vma, v_start, v_end, NULL, local_flags);
/*
* Note that vma lock only exists for shared/non-private
* vmas. Therefore, lock is not held when calling
* unmap_hugepage_range for private vmas.
*/
-skip:
hugetlb_vma_unlock_write(vma);
}
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 06978b4dbeb8..9126ab44320d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2395,6 +2395,8 @@ struct zap_details {
#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
+/* Skip PMD unsharing when unmapping hugetlb ranges without shareable lock */
+#define ZAP_FLAG_NO_UNSHARE ((__force zap_flags_t) BIT(2))
#ifdef CONFIG_SCHED_MM_CID
void sched_mm_cid_before_execve(struct task_struct *t);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cac826cb61f..c4257aa568fe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5885,7 +5885,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (!(zap_flags & ZAP_FLAG_NO_UNSHARE) &&
+ huge_pmd_unshare(mm, vma, address, ptep)) {
spin_unlock(ptl);
tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
force_flush = true;
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread

* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
2025-10-03 17:45 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
@ 2025-10-06 7:37 ` David Hildenbrand
2025-10-06 13:28 ` Oscar Salvador
2025-10-08 5:27 ` [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare() Deepanshu Kartikey
2 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-10-06 7:37 UTC (permalink / raw)
To: Deepanshu Kartikey, muchun.song, osalvador, akpm,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
broonie
Cc: linux-mm, linux-kernel, syzbot+f26d7c75c26ec19790e7
On 03.10.25 19:45, Deepanshu Kartikey wrote:
> When hugetlb_vmdelete_list() cannot acquire the shareable lock for a VMA,
> the previous fix (dd83609b8898) skipped the entire VMA to avoid lock
The proper way to mention a commit here is:
"... fix in commit dd83609b8898 ("hugetlbfs: skip VMAs without shareable
locks in hugetlb_vmdelete_list") skipped ..."
> assertions in huge_pmd_unshare(). However, this prevented pages from being
> unmapped and freed, causing a regression in fallocate(PUNCH_HOLE) operations
> where pages were not freed immediately, as reported by Mark Brown.
>
> The issue occurs because:
> 1. hugetlb_vmdelete_list() calls hugetlb_vma_trylock_write()
> 2. For shareable VMAs, this attempts to acquire the shareable lock
> 3. If successful, huge_pmd_unshare() expects the lock to be held
> 4. huge_pmd_unshare() asserts the lock via hugetlb_vma_assert_locked()
>
> The v2 fix avoided calling code that requires locks, but this prevented
> page unmapping entirely, breaking the expected behavior where pages are
> freed during punch hole operations.
>
> This v3 fix takes a different approach: instead of skipping the entire VMA,
> we skip only the PMD unsharing operation when we don't have the required
> lock, while still proceeding with page unmapping. This is safe because:
It's confusing to talk about fix versions. If you want to reference
previous discussions, rather link to them.
>
> - PMD unsharing is an optimization to reduce shared page table overhead
> - Page unmapping can proceed safely with just the VMA write lock
> - Pages get freed immediately as expected by PUNCH_HOLE operations
> - The PMD metadata will be cleaned up when the VMA is destroyed
>
> We introduce a new ZAP_FLAG_NO_UNSHARE flag that communicates to
> __unmap_hugepage_range() that it should skip huge_pmd_unshare() while
> still clearing page table entries and freeing pages.
>
> Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Reported-by: Mark Brown <broonie@kernel.org>
> Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
> Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
>
> ---
> Changes in v3:
> - Instead of skipping entire VMAs, skip only PMD unsharing operation
> - Add ZAP_FLAG_NO_UNSHARE flag to communicate lock status
> - Ensure pages are still unmapped and freed immediately
> - Fixes regression in fallocate PUNCH_HOLE reported by Mark Brown
>
> Changes in v2:
> - Check for shareable lock before trylock to avoid lock leaks
> - Add comment explaining why non-shareable VMAs are skipped
> ---
> fs/hugetlbfs/inode.c | 22 ++++++++++++----------
> include/linux/mm.h | 2 ++
> mm/hugetlb.c | 3 ++-
> 3 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9c94ed8c3ab0..519497bc1045 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -474,29 +474,31 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
> vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) {
> unsigned long v_start;
> unsigned long v_end;
> + bool have_shareable_lock;
> + zap_flags_t local_flags = zap_flags;
>
> if (!hugetlb_vma_trylock_write(vma))
> continue;
> -
> +
> + have_shareable_lock = __vma_shareable_lock(vma);
> +
> /*
> - * Skip VMAs without shareable locks. Per the design in commit
> - * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
> - * called after this function with proper locking.
> + * If we can't get the shareable lock, set ZAP_FLAG_NO_UNSHARE
What do you mean with "If we can't get the shareable lock"?
__vma_shareable_lock() doesn't tell us whether we grabbed the lock, but
whether we have to grab the lock?
I see no R-b/Ack from hugetlb maintainers, and this seems to be getting
rather complicated now; I cannot really easily judge what's right or
wrong anymore.
@Muchun, Oscar, can you take a look?
> + * to skip PMD unsharing. We still proceed with unmapping to
> + * ensure pages are properly freed, which is critical for punch
> + * hole operations that expect immediate page freeing.
> */
> - if (!__vma_shareable_lock(vma))
> - goto skip;
> -
> + if (!have_shareable_lock)
> + local_flags |= ZAP_FLAG_NO_UNSHARE;
> v_start = vma_offset_start(vma, start);
> v_end = vma_offset_end(vma, end);
>
> - unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
> -
> + unmap_hugepage_range(vma, v_start, v_end, NULL, local_flags);
> /*
> * Note that vma lock only exists for shared/non-private
> * vmas. Therefore, lock is not held when calling
> * unmap_hugepage_range for private vmas.
> */
> -skip:
> hugetlb_vma_unlock_write(vma);
> }
> }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 06978b4dbeb8..9126ab44320d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2395,6 +2395,8 @@ struct zap_details {
> #define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
> /* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
> #define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
> +/* Skip PMD unsharing when unmapping hugetlb ranges without shareable lock */
> +#define ZAP_FLAG_NO_UNSHARE ((__force zap_flags_t) BIT(2))
That's nasty: this is hugetlb-specific stuff in a generic mm.h header
using generic mm flags.
I'm sure we can find a way to communicate that differently within the
hugetlb code and leave the generic ZAP_* flags alone?
--
Cheers
David / dhildenb
* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
2025-10-03 17:45 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
2025-10-06 7:37 ` David Hildenbrand
@ 2025-10-06 13:28 ` Oscar Salvador
2025-10-08 5:27 ` [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare() Deepanshu Kartikey
2 siblings, 0 replies; 9+ messages in thread
From: Oscar Salvador @ 2025-10-06 13:28 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: muchun.song, david, akpm, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, broonie, linux-mm, linux-kernel,
syzbot+f26d7c75c26ec19790e7
On Fri, Oct 03, 2025 at 11:15:53PM +0530, Deepanshu Kartikey wrote:
> When hugetlb_vmdelete_list() cannot acquire the shareable lock for a VMA,
> the previous fix (dd83609b8898) skipped the entire VMA to avoid lock
> assertions in huge_pmd_unshare(). However, this prevented pages from being
> unmapped and freed, causing a regression in fallocate(PUNCH_HOLE) operations
> where pages were not freed immediately, as reported by Mark Brown.
>
> The issue occurs because:
> 1. hugetlb_vmdelete_list() calls hugetlb_vma_trylock_write()
> 2. For shareable VMAs, this attempts to acquire the shareable lock
> 3. If successful, huge_pmd_unshare() expects the lock to be held
> 4. huge_pmd_unshare() asserts the lock via hugetlb_vma_assert_locked()
>
> The v2 fix avoided calling code that requires locks, but this prevented
> page unmapping entirely, breaking the expected behavior where pages are
> freed during punch hole operations.
>
> This v3 fix takes a different approach: instead of skipping the entire VMA,
> we skip only the PMD unsharing operation when we don't have the required
> lock, while still proceeding with page unmapping. This is safe because:
>
> - PMD unsharing is an optimization to reduce shared page table overhead
> - Page unmapping can proceed safely with just the VMA write lock
> - Pages get freed immediately as expected by PUNCH_HOLE operations
> - The PMD metadata will be cleaned up when the VMA is destroyed
>
> We introduce a new ZAP_FLAG_NO_UNSHARE flag that communicates to
> __unmap_hugepage_range() that it should skip huge_pmd_unshare() while
> still clearing page table entries and freeing pages.
>
> Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Reported-by: Mark Brown <broonie@kernel.org>
> Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
> Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
>
> ---
...
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9c94ed8c3ab0..519497bc1045 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -474,29 +474,31 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
> vma_interval_tree_foreach(vma, root, start, end ? end - 1 : ULONG_MAX) {
> unsigned long v_start;
> unsigned long v_end;
> + bool have_shareable_lock;
> + zap_flags_t local_flags = zap_flags;
>
> if (!hugetlb_vma_trylock_write(vma))
> continue;
> -
> +
> + have_shareable_lock = __vma_shareable_lock(vma);
> +
> /*
> - * Skip VMAs without shareable locks. Per the design in commit
> - * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
> - * called after this function with proper locking.
> + * If we can't get the shareable lock, set ZAP_FLAG_NO_UNSHARE
> + * to skip PMD unsharing. We still proceed with unmapping to
> + * ensure pages are properly freed, which is critical for punch
> + * hole operations that expect immediate page freeing.
> */
> - if (!__vma_shareable_lock(vma))
> - goto skip;
> -
> + if (!have_shareable_lock)
> + local_flags |= ZAP_FLAG_NO_UNSHARE;
This is quite a head-spinning thing.
First of all, as David pointed out, that comment is misleading: it makes it
look like __vma_shareable_lock() performs a lock-taking action, which is not
true, so it should be reworded.
Now, the thing is:
- Prior to commit dd83609b8898("hugetlbfs: skip VMAs without shareable
locks in hugetlb_vmdelete_list"), we were unconditionally calling
huge_pmd_unshare(), which asserted the vma lock and we didn't hold it.
My question would be: Mike's vma-lock addition happened in 2022, so
how is it that we didn't see this sooner? It should be rather easy to
trigger. I'm a bit puzzled.
- Ok, since there's nothing to unshare, we skip the vma here and
remove_inode_hugepages() should take care of it.
But that seems to be troublesome because on punch-hole operation pages
don't get freed.
- So instead, we just skip the unsharing operation and carry on
with the unmapping/freeing in __unmap_hugepage_range().
I don't know, but to me it seems that we're going to great lengths to fix
an assertion.
So, the thing is, can't we check __vma_shareable_lock in
__unmap_hugepage_range() and only call huge_pmd_unshare() if we need to?
--
Oscar Salvador
SUSE Labs
* [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare()
2025-10-03 17:45 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
2025-10-06 7:37 ` David Hildenbrand
2025-10-06 13:28 ` Oscar Salvador
@ 2025-10-08 5:27 ` Deepanshu Kartikey
2025-10-13 8:09 ` Oscar Salvador
2025-10-13 8:27 ` David Hildenbrand
2 siblings, 2 replies; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-10-08 5:27 UTC (permalink / raw)
To: muchun.song, osalvador, david, akpm, broonie
Cc: linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+f26d7c75c26ec19790e7
When hugetlb_vmdelete_list() processes VMAs during truncate operations,
it may encounter VMAs where huge_pmd_unshare() is called without the
required shareable lock. This triggers an assertion failure in
hugetlb_vma_assert_locked().
The previous fix in commit dd83609b8898 ("hugetlbfs: skip VMAs without
shareable locks in hugetlb_vmdelete_list") skipped entire VMAs without
shareable locks to avoid the assertion. However, this prevented pages
from being unmapped and freed, causing a regression in fallocate(PUNCH_HOLE)
operations where pages were not freed immediately, as reported by Mark Brown.
Instead of skipping VMAs or adding new flags, check __vma_shareable_lock()
directly in __unmap_hugepage_range() right before calling huge_pmd_unshare().
This ensures PMD unsharing only happens when the VMA has a shareable lock
structure, while still allowing page unmapping and freeing to proceed for
all VMAs.
Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Reported-by: Mark Brown <broonie@kernel.org>
Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
Suggested-by: Oscar Salvador <osalvador@suse.de>
Suggested-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/mm-commits/20250925203504.7BE02C4CEF7@smtp.kernel.org/ [v1]
Link: https://lore.kernel.org/mm-commits/20250928185232.BEDB6C4CEF0@smtp.kernel.org/ [v2]
Link: https://lore.kernel.org/linux-mm/20251003174553.3078839-1-kartikey406@gmail.com/ [v3]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
Changes in v4:
- Simplified approach per Oscar's suggestion: check __vma_shareable_lock()
directly in __unmap_hugepage_range() before calling huge_pmd_unshare()
- Removed ZAP_FLAG_NO_UNSHARE flag per David's feedback to avoid polluting
generic mm.h header
- Reverted hugetlb_vmdelete_list() to not skip VMAs
Changes in v3:
- Added ZAP_FLAG_NO_UNSHARE to skip only PMD unsharing, not entire VMA
Changes in v2:
- Skip entire VMAs without shareable locks in hugetlb_vmdelete_list()
(caused PUNCH_HOLE regression)
Changes in v1:
- Initial fix attempt
---
fs/hugetlbfs/inode.c | 10 +---------
mm/hugetlb.c | 2 +-
2 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9c94ed8c3ab0..1e040db18b20 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -478,14 +478,6 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
if (!hugetlb_vma_trylock_write(vma))
continue;
- /*
- * Skip VMAs without shareable locks. Per the design in commit
- * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
- * called after this function with proper locking.
- */
- if (!__vma_shareable_lock(vma))
- goto skip;
-
v_start = vma_offset_start(vma, start);
v_end = vma_offset_end(vma, end);
@@ -496,7 +488,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
* vmas. Therefore, lock is not held when calling
* unmap_hugepage_range for private vmas.
*/
-skip:
+
hugetlb_vma_unlock_write(vma);
}
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6cac826cb61f..9ed85ab8420e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5885,7 +5885,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
}
ptl = huge_pte_lock(h, mm, ptep);
- if (huge_pmd_unshare(mm, vma, address, ptep)) {
+ if (__vma_shareable_lock(vma) && huge_pmd_unshare(mm, vma, address, ptep)) {
spin_unlock(ptl);
tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
force_flush = true;
--
2.43.0
* Re: [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare()
2025-10-08 5:27 ` [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare() Deepanshu Kartikey
@ 2025-10-13 8:09 ` Oscar Salvador
2025-10-13 8:27 ` David Hildenbrand
1 sibling, 0 replies; 9+ messages in thread
From: Oscar Salvador @ 2025-10-13 8:09 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: muchun.song, david, akpm, broonie, linux-mm, linux-kernel,
syzbot+f26d7c75c26ec19790e7
On Wed, Oct 08, 2025 at 10:57:59AM +0530, Deepanshu Kartikey wrote:
> When hugetlb_vmdelete_list() processes VMAs during truncate operations,
> it may encounter VMAs where huge_pmd_unshare() is called without the
> required shareable lock. This triggers an assertion failure in
> hugetlb_vma_assert_locked().
>
> The previous fix in commit dd83609b8898 ("hugetlbfs: skip VMAs without
> shareable locks in hugetlb_vmdelete_list") skipped entire VMAs without
> shareable locks to avoid the assertion. However, this prevented pages
> from being unmapped and freed, causing a regression in fallocate(PUNCH_HOLE)
> operations where pages were not freed immediately, as reported by Mark Brown.
>
> Instead of skipping VMAs or adding new flags, check __vma_shareable_lock()
> directly in __unmap_hugepage_range() right before calling huge_pmd_unshare().
> This ensures PMD unsharing only happens when the VMA has a shareable lock
> structure, while still allowing page unmapping and freeing to proceed for
> all VMAs.
>
> Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Reported-by: Mark Brown <broonie@kernel.org>
> Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
> Suggested-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Link: https://lore.kernel.org/mm-commits/20250925203504.7BE02C4CEF7@smtp.kernel.org/ [v1]
> Link: https://lore.kernel.org/mm-commits/20250928185232.BEDB6C4CEF0@smtp.kernel.org/ [v2]
> Link: https://lore.kernel.org/linux-mm/20251003174553.3078839-1-kartikey406@gmail.com/ [v3]
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs
2025-10-08 5:27 ` [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare() Deepanshu Kartikey
2025-10-13 8:09 ` Oscar Salvador
@ 2025-10-13 8:27 ` David Hildenbrand
1 sibling, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-10-13 8:27 UTC (permalink / raw)
To: Deepanshu Kartikey, muchun.song, osalvador, akpm, broonie
Cc: linux-mm, linux-kernel, syzbot+f26d7c75c26ec19790e7
On 08.10.25 07:27, Deepanshu Kartikey wrote:
> When hugetlb_vmdelete_list() processes VMAs during truncate operations,
> it may encounter VMAs where huge_pmd_unshare() is called without the
> required shareable lock. This triggers an assertion failure in
> hugetlb_vma_assert_locked().
>
> The previous fix in commit dd83609b8898 ("hugetlbfs: skip VMAs without
> shareable locks in hugetlb_vmdelete_list") skipped entire VMAs without
> shareable locks to avoid the assertion. However, this prevented pages
> from being unmapped and freed, causing a regression in fallocate(PUNCH_HOLE)
> operations where pages were not freed immediately, as reported by Mark Brown.
>
> Instead of skipping VMAs or adding new flags, check __vma_shareable_lock()
> directly in __unmap_hugepage_range() right before calling huge_pmd_unshare().
> This ensures PMD unsharing only happens when the VMA has a shareable lock
> structure, while still allowing page unmapping and freeing to proceed for
> all VMAs.
>
> Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Tested-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
> Reported-by: Mark Brown <broonie@kernel.org>
> Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
> Suggested-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Link: https://lore.kernel.org/mm-commits/20250925203504.7BE02C4CEF7@smtp.kernel.org/ [v1]
> Link: https://lore.kernel.org/mm-commits/20250928185232.BEDB6C4CEF0@smtp.kernel.org/ [v2]
> Link: https://lore.kernel.org/linux-mm/20251003174553.3078839-1-kartikey406@gmail.com/ [v3]
> Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
> ---
> Changes in v4:
> - Simplified approach per Oscar's suggestion: check __vma_shareable_lock()
> directly in __unmap_hugepage_range() before calling huge_pmd_unshare()
> - Removed ZAP_FLAG_NO_UNSHARE flag per David's feedback to avoid polluting
> generic mm.h header
> - Reverted hugetlb_vmdelete_list() to not skip VMAs
>
> Changes in v3:
> - Added ZAP_FLAG_NO_UNSHARE to skip only PMD unsharing, not entire VMA
>
> Changes in v2:
> - Skip entire VMAs without shareable locks in hugetlb_vmdelete_list()
> (caused PUNCH_HOLE regression)
>
> Changes in v1:
> - Initial fix attempt
> ---
> fs/hugetlbfs/inode.c | 10 +---------
> mm/hugetlb.c | 2 +-
> 2 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 9c94ed8c3ab0..1e040db18b20 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -478,14 +478,6 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
> if (!hugetlb_vma_trylock_write(vma))
> continue;
>
> - /*
> - * Skip VMAs without shareable locks. Per the design in commit
> - * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
> - * called after this function with proper locking.
> - */
> - if (!__vma_shareable_lock(vma))
> - goto skip;
> -
> v_start = vma_offset_start(vma, start);
> v_end = vma_offset_end(vma, end);
>
> @@ -496,7 +488,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
> * vmas. Therefore, lock is not held when calling
> * unmap_hugepage_range for private vmas.
> */
> -skip:
> +
> hugetlb_vma_unlock_write(vma);
> }
> }
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6cac826cb61f..9ed85ab8420e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5885,7 +5885,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> }
>
> ptl = huge_pte_lock(h, mm, ptep);
> - if (huge_pmd_unshare(mm, vma, address, ptep)) {
> + if (__vma_shareable_lock(vma) && huge_pmd_unshare(mm, vma, address, ptep)) {
> spin_unlock(ptl);
> tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
> force_flush = true;
Wondering, couldn't we handle that in huge_pmd_unshare()?
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eed59cfb5d218..f167cec4a5acc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7598,13 +7598,14 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
p4d_t *p4d = p4d_offset(pgd, addr);
pud_t *pud = pud_offset(p4d, addr);
- i_mmap_assert_write_locked(vma->vm_file->f_mapping);
- hugetlb_vma_assert_locked(vma);
if (sz != PMD_SIZE)
return 0;
if (!ptdesc_pmd_pts_count(virt_to_ptdesc(ptep)))
return 0;
+ i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+ hugetlb_vma_assert_locked(vma);
+
pud_clear(pud);
/*
* Once our caller drops the rmap lock, some other process might be
--
Cheers
David / dhildenb
* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
@ 2025-10-06 7:54 Deepanshu Kartikey
2025-10-06 12:27 ` Oscar Salvador
0 siblings, 1 reply; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-10-06 7:54 UTC (permalink / raw)
To: muchun.song, osalvador, david, akpm, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, broonie
Cc: linux-mm, linux-kernel
Apologies for the confusion. Please disregard this patch, as it is creating a lot of confusion.
I will create patch v4 with a proper commit message and links to the previous versions.
Deepanshu
* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
2025-10-06 7:54 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
@ 2025-10-06 12:27 ` Oscar Salvador
0 siblings, 0 replies; 9+ messages in thread
From: Oscar Salvador @ 2025-10-06 12:27 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: muchun.song, david, akpm, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, broonie, linux-mm, linux-kernel
On Mon, Oct 06, 2025 at 01:24:36PM +0530, Deepanshu Kartikey wrote:
> Apologies for the confusion. Please disregard this patch, as it is creating a lot of confusion.
> I will create patch v4 with a proper commit message and links to the previous versions.
Hold on, let us try to avoid more traffic than necessary :-)
I plan to have a look at this later today.
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable
@ 2025-10-06 14:13 Deepanshu Kartikey
0 siblings, 0 replies; 9+ messages in thread
From: Deepanshu Kartikey @ 2025-10-06 14:13 UTC (permalink / raw)
To: osalvador, broonie
Cc: muchun.song, david, akpm, lorenzo.stoakes, Liam.Howlett, vbabka,
rppt, surenb, mhocko, linux-mm, linux-kernel
Hi Oscar,
I agree - checking __vma_shareable_lock() directly in __unmap_hugepage_range()
is cleaner and simpler. I'll send v4 with that approach shortly.
Thanks for the feedback.
Best regards,
Deepanshu
end of thread, other threads: [~2025-10-13 8:27 UTC | newest]
Thread overview: 9+ messages
2025-10-03 17:45 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
2025-10-06 7:37 ` David Hildenbrand
2025-10-06 13:28 ` Oscar Salvador
2025-10-08 5:27 ` [PATCH v4] hugetlbfs: check for shareable lock before calling huge_pmd_unshare() Deepanshu Kartikey
2025-10-13 8:09 ` Oscar Salvador
2025-10-13 8:27 ` David Hildenbrand
2025-10-06 7:54 [PATCH v3] hugetlbfs: skip PMD unsharing when shareable lock unavailable Deepanshu Kartikey
2025-10-06 12:27 ` Oscar Salvador
2025-10-06 14:13 Deepanshu Kartikey