* [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
@ 2025-09-26 3:32 Deepanshu Kartikey
2025-10-03 10:57 ` Mark Brown
0 siblings, 1 reply; 8+ messages in thread
From: Deepanshu Kartikey @ 2025-09-26 3:32 UTC (permalink / raw)
To: muchun.song, osalvador, david, akpm
Cc: linux-mm, linux-kernel, Deepanshu Kartikey, syzbot+f26d7c75c26ec19790e7
hugetlb_vmdelete_list() uses trylock to acquire VMA locks during truncate
operations. As per the original design in commit 40549ba8f8e0 ("hugetlb:
use new vma_lock for pmd sharing synchronization"), if the trylock fails
or the VMA has no lock, it should skip that VMA. Any remaining mapped
pages are handled by remove_inode_hugepages() which is called after
hugetlb_vmdelete_list() and uses proper lock ordering to guarantee
unmapping success.
Currently, when hugetlb_vma_trylock_write() returns success (1) for VMAs
without shareable locks, the code proceeds to call unmap_hugepage_range().
This causes assertion failures in huge_pmd_unshare() → hugetlb_vma_assert_locked()
because no lock is actually held:
WARNING: CPU: 1 PID: 6594 Comm: syz.0.28 Not tainted
Call Trace:
hugetlb_vma_assert_locked+0x1dd/0x250
huge_pmd_unshare+0x2c8/0x540
__unmap_hugepage_range+0x6e3/0x1aa0
unmap_hugepage_range+0x32e/0x410
hugetlb_vmdelete_list+0x189/0x1f0
Fix by skipping VMAs without shareable locks, using a goto so that any lock
acquired by the trylock is still released on the skip path.
Reported-by: syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f26d7c75c26ec19790e7
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
---
Changes in v2:
- Use goto to unlock after trylock, avoiding lock leaks (Andrew Morton)
- Add comment explaining why non-shareable VMAs are skipped (Andrew Morton)
---
fs/hugetlbfs/inode.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9e0625167517..9fa7c72ac1a6 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -488,6 +488,14 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		if (!hugetlb_vma_trylock_write(vma))
 			continue;
+		/*
+		 * Skip VMAs without shareable locks. Per the design in commit
+		 * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
+		 * called after this function with proper locking.
+		 */
+		if (!__vma_shareable_lock(vma))
+			goto skip;
+
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
@@ -498,7 +506,8 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		 * vmas. Therefore, lock is not held when calling
 		 * unmap_hugepage_range for private vmas.
 		 */
-		hugetlb_vma_unlock_write(vma);
+skip:
+		hugetlb_vma_unlock_write(vma);
 	}
 }
--
2.43.0
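The control flow described in this changelog can be modelled in userspace. What follows is a minimal sketch, not kernel code: struct toy_vma and the toy_* helpers are hypothetical stand-ins, and the trylock model assumes only what the changelog states, namely that the trylock reports success (1) for a VMA that has no shareable lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA; only the state this sketch needs. */
struct toy_vma {
	bool has_shareable_lock;	/* models __vma_shareable_lock(vma) */
	bool lock_held;			/* models the per-VMA lock being write-held */
};

/* Models the trylock: returns 1 even when there is no lock to take. */
int toy_trylock_write(struct toy_vma *vma)
{
	if (!vma->has_shareable_lock)
		return 1;		/* nothing to lock, still "success" */
	if (vma->lock_held)
		return 0;		/* contended: caller must skip */
	vma->lock_held = true;
	return 1;
}

/* Models the unlock: a no-op when the VMA has no shareable lock. */
void toy_unlock_write(struct toy_vma *vma)
{
	if (!vma->has_shareable_lock)
		return;
	assert(vma->lock_held);		/* releasing an unheld lock is a bug */
	vma->lock_held = false;
}

/*
 * One loop iteration with the v2 logic. Returns true if the unmap
 * step (standing in for unmap_hugepage_range()) would have run.
 */
bool toy_vmdelete_one(struct toy_vma *vma)
{
	bool unmapped = false;

	if (!toy_trylock_write(vma))
		return false;		/* the kernel loop does "continue" here */
	if (!vma->has_shareable_lock)
		goto skip;		/* v2: skip, but still reach the unlock */
	unmapped = true;
skip:
	toy_unlock_write(vma);
	return unmapped;
}
```

In this model a VMA without a shareable lock is never unmapped by the loop, so the assertion path is never reached for it; whatever it still maps is left for a later step to deal with.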
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
2025-09-26 3:32 [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list Deepanshu Kartikey
@ 2025-10-03 10:57 ` Mark Brown
2025-10-20 17:52 ` Mark Brown
0 siblings, 1 reply; 8+ messages in thread
From: Mark Brown @ 2025-10-03 10:57 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: muchun.song, osalvador, david, akpm, linux-mm, linux-kernel,
syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV
On Fri, Sep 26, 2025 at 09:02:54AM +0530, Deepanshu Kartikey wrote:
> hugetlb_vmdelete_list() uses trylock to acquire VMA locks during truncate
> operations. As per the original design in commit 40549ba8f8e0 ("hugetlb:
> use new vma_lock for pmd sharing synchronization"), if the trylock fails
> or the VMA has no lock, it should skip that VMA. Any remaining mapped
> pages are handled by remove_inode_hugepages() which is called after
> hugetlb_vmdelete_list() and uses proper lock ordering to guarantee
> unmapping success.
For the past few days I've been seeing failures on Raspberry Pi 4 in
the hugetlbfs-madvise kselftest in -next which bisect to this patch.
The test reports:
# # -------------------------
# # running ./hugetlb-madvise
# # -------------------------
# # Unexpected number of free huge pages line 252
# # [FAIL]
# not ok 6 hugetlb-madvise # exit=1
Full log:
https://lava.sirena.org.uk/scheduler/job/1913276#L1803
Bisect log:
# bad: [7396732143a22b42bb97710173d598aaf50daa89] Add linux-next specific files for 20251002
# good: [9d3bc72cc0a9791bf4910ef854b2c3dd61af3bbf] Merge branch 'for-rc' of https://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl.git
# good: [d4ecae56a8c7d3287a5bcdb2d65f7102ee580ab6] clk: mediatek: Add MT8196 mcu clock support
# good: [4c134c2a5f3db29afe35b2d30e39bb6d867b08da] um: Indent time-travel help messages
# good: [bf1af4f6e62878e053d20cd71267aed8dfb3e715] perf arm-spe: Downsample all sample types equally
# good: [e414334883f4835058ca06f934bc4988eb9cd9e6] Merge branch 'next/dt' into for-next
# good: [54653bb3ec83d1f717adab6108db82a3966d19ee] clk: renesas: rzv2h: remove round_rate() in favor of determine_rate()
# good: [87a877de367d835b527d1086f75727123ef85fc4] KVM: x86: Rename handle_fastpath_set_msr_irqoff() to handle_fastpath_wrmsr()
# good: [c26675447faff8c4ddc1dc5d2cd28326b8181aaf] KVM: x86: Zero XSTATE components on INIT by iterating over supported features
# good: [6684aba0780da9f505c202f27e68ee6d18c0aa66] XArray: Add extra debugging check to xas_lock and friends
git bisect start '7396732143a22b42bb97710173d598aaf50daa89' '9d3bc72cc0a9791bf4910ef854b2c3dd61af3bbf' 'd4ecae56a8c7d3287a5bcdb2d65f7102ee580ab6' '4c134c2a5f3db29afe35b2d30e39bb6d867b08da' 'bf1af4f6e62878e053d20cd71267aed8dfb3e715' 'e414334883f4835058ca06f934bc4988eb9cd9e6' '54653bb3ec83d1f717adab6108db82a3966d19ee' '87a877de367d835b527d1086f75727123ef85fc4' 'c26675447faff8c4ddc1dc5d2cd28326b8181aaf' '6684aba0780da9f505c202f27e68ee6d18c0aa66'
# test job: [d4ecae56a8c7d3287a5bcdb2d65f7102ee580ab6] https://lava.sirena.org.uk/scheduler/job/1907306
# test job: [4c134c2a5f3db29afe35b2d30e39bb6d867b08da] https://lava.sirena.org.uk/scheduler/job/1903298
# test job: [bf1af4f6e62878e053d20cd71267aed8dfb3e715] https://lava.sirena.org.uk/scheduler/job/1900552
# test job: [e414334883f4835058ca06f934bc4988eb9cd9e6] https://lava.sirena.org.uk/scheduler/job/1904803
# test job: [54653bb3ec83d1f717adab6108db82a3966d19ee] https://lava.sirena.org.uk/scheduler/job/1900685
# test job: [87a877de367d835b527d1086f75727123ef85fc4] https://lava.sirena.org.uk/scheduler/job/1697972
# test job: [c26675447faff8c4ddc1dc5d2cd28326b8181aaf] https://lava.sirena.org.uk/scheduler/job/1698132
# test job: [6684aba0780da9f505c202f27e68ee6d18c0aa66] https://lava.sirena.org.uk/scheduler/job/1738722
# test job: [7396732143a22b42bb97710173d598aaf50daa89] https://lava.sirena.org.uk/scheduler/job/1913276
# bad: [7396732143a22b42bb97710173d598aaf50daa89] Add linux-next specific files for 20251002
git bisect bad 7396732143a22b42bb97710173d598aaf50daa89
# test job: [74fc450198cf792e3db35ea4d49197a467233373] https://lava.sirena.org.uk/scheduler/job/1913848
# bad: [74fc450198cf792e3db35ea4d49197a467233373] Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect bad 74fc450198cf792e3db35ea4d49197a467233373
# test job: [db484ff3fff1fafa0017cdd017795bec09ace5e4] https://lava.sirena.org.uk/scheduler/job/1913993
# bad: [db484ff3fff1fafa0017cdd017795bec09ace5e4] Merge branch 'docs-next' of git://git.lwn.net/linux.git
git bisect bad db484ff3fff1fafa0017cdd017795bec09ace5e4
# test job: [7d942c9d9660e6808dcd835c4c73ad5405cc5518] https://lava.sirena.org.uk/scheduler/job/1914055
# bad: [7d942c9d9660e6808dcd835c4c73ad5405cc5518] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git
git bisect bad 7d942c9d9660e6808dcd835c4c73ad5405cc5518
# test job: [db03d3c83bdb21667392d1596fafdfb38325c2a0] https://lava.sirena.org.uk/scheduler/job/1914176
# bad: [db03d3c83bdb21667392d1596fafdfb38325c2a0] Merge branch 'dma-mapping-for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux.git
git bisect bad db03d3c83bdb21667392d1596fafdfb38325c2a0
# test job: [84a7a9823e73fe3c0adcc4780fa7a091981048ef] https://lava.sirena.org.uk/scheduler/job/1914247
# good: [84a7a9823e73fe3c0adcc4780fa7a091981048ef] mm/shmem, swap: remove redundant error handling for replacing folio
git bisect good 84a7a9823e73fe3c0adcc4780fa7a091981048ef
# test job: [c7416f37e4d31fb28ac4ed584b13037e69a22dbe] https://lava.sirena.org.uk/scheduler/job/1914387
# bad: [c7416f37e4d31fb28ac4ed584b13037e69a22dbe] Merge branch 'mm-nonmm-stable' of https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad c7416f37e4d31fb28ac4ed584b13037e69a22dbe
# test job: [3dfd02c900379d209ac9dcac24b4a61d8478842a] https://lava.sirena.org.uk/scheduler/job/1914497
# good: [3dfd02c900379d209ac9dcac24b4a61d8478842a] hugetlb: increase number of reserving hugepages via cmdline
git bisect good 3dfd02c900379d209ac9dcac24b4a61d8478842a
# test job: [fe7a283b39160153b6d1bd7f61b0a9d5d44987a8] https://lava.sirena.org.uk/scheduler/job/1915206
# good: [fe7a283b39160153b6d1bd7f61b0a9d5d44987a8] ocfs2: add suballoc slot check in ocfs2_validate_inode_block()
git bisect good fe7a283b39160153b6d1bd7f61b0a9d5d44987a8
# test job: [74058c0a9fc8b2b4d5f4a0ef7ee2cfa66a9e49cf] https://lava.sirena.org.uk/scheduler/job/1916011
# good: [74058c0a9fc8b2b4d5f4a0ef7ee2cfa66a9e49cf] Squashfs: fix uninit-value in squashfs_get_parent
git bisect good 74058c0a9fc8b2b4d5f4a0ef7ee2cfa66a9e49cf
# test job: [9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b] https://lava.sirena.org.uk/scheduler/job/1916064
# good: [9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b] Squashfs: reject negative file sizes in squashfs_read_inode()
git bisect good 9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b
# test job: [fb552b2425cf8f16c9c72229a972d1744b24d855] https://lava.sirena.org.uk/scheduler/job/1916102
# good: [fb552b2425cf8f16c9c72229a972d1744b24d855] alloc_tag: fix boot failure due to NULL pointer dereference
git bisect good fb552b2425cf8f16c9c72229a972d1744b24d855
# test job: [81e78b7ec61e89e8bab9736551839f79b063614c] https://lava.sirena.org.uk/scheduler/job/1916193
# bad: [81e78b7ec61e89e8bab9736551839f79b063614c] mm: convert folio_page() back to a macro
git bisect bad 81e78b7ec61e89e8bab9736551839f79b063614c
# test job: [1acc369373008b9eeb930fbb47847c0693055553] https://lava.sirena.org.uk/scheduler/job/1916218
# bad: [1acc369373008b9eeb930fbb47847c0693055553] mm/khugepaged: use start_addr/addr for improved readability
git bisect bad 1acc369373008b9eeb930fbb47847c0693055553
# test job: [dd83609b88986f4add37c0871c3434310652ebd5] https://lava.sirena.org.uk/scheduler/job/1916225
# bad: [dd83609b88986f4add37c0871c3434310652ebd5] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
git bisect bad dd83609b88986f4add37c0871c3434310652ebd5
# first bad commit: [dd83609b88986f4add37c0871c3434310652ebd5] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
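For anyone wanting to watch the free huge page count by hand while reproducing this, here is a small sketch. It assumes the count of interest is the HugePages_Free field of /proc/meminfo; parse_hugepages_free() and read_hugepages_free() are hypothetical helper names and are not taken from the selftest.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract HugePages_Free from /proc/meminfo-style
 * text. Returns -1 if the field is missing or malformed.
 */
long parse_hugepages_free(const char *meminfo)
{
	const char *key = "HugePages_Free:";
	const char *p;
	long val;

	assert(meminfo != NULL);
	p = strstr(meminfo, key);
	if (!p)
		return -1;
	if (sscanf(p + strlen(key), "%ld", &val) != 1)
		return -1;
	return val;
}

/* Hypothetical helper: read the live value; returns -1 on any error. */
long read_hugepages_free(void)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_hugepages_free(buf);
}
```

Sampling read_hugepages_free() before and after a hole punch would show whether the pages come back to the pool immediately, which is the kind of mismatch the failure message above points at.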
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
2025-10-03 10:57 ` Mark Brown
@ 2025-10-20 17:52 ` Mark Brown
2025-10-21 21:10 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Mark Brown @ 2025-10-20 17:52 UTC (permalink / raw)
To: Deepanshu Kartikey
Cc: muchun.song, osalvador, david, akpm, linux-mm, linux-kernel,
syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV, torvalds
On Fri, Oct 03, 2025 at 11:57:35AM +0100, Mark Brown wrote:
> On Fri, Sep 26, 2025 at 09:02:54AM +0530, Deepanshu Kartikey wrote:
> > hugetlb_vmdelete_list() uses trylock to acquire VMA locks during truncate
> > operations. As per the original design in commit 40549ba8f8e0 ("hugetlb:
> > use new vma_lock for pmd sharing synchronization"), if the trylock fails
> > or the VMA has no lock, it should skip that VMA. Any remaining mapped
> > pages are handled by remove_inode_hugepages() which is called after
> > hugetlb_vmdelete_list() and uses proper lock ordering to guarantee
> > unmapping success.
>
> For the past few days I've been seeing failures on Raspberry Pi 4 in
> the hugetlbfs-madvise kselftest in -next which bisect to this patch.
> The test reports:
>
> # # -------------------------
> # # running ./hugetlb-madvise
> # # -------------------------
> # # Unexpected number of free huge pages line 252
> # # [FAIL]
> # not ok 6 hugetlb-madvise # exit=1
This issue is now present in mainline:
Raspberry Pi 4: https://lava.sirena.org.uk/scheduler/job/1976561#L1798
Orion O6: https://lava.sirena.org.uk/scheduler/job/1977081#L1779
and still bisects to this patch.
> Full log:
>
> https://lava.sirena.org.uk/scheduler/job/1913276#L1803
>
> Bisect log:
>
> # bad: [7396732143a22b42bb97710173d598aaf50daa89] Add linux-next specific files for 20251002
> # good: [9d3bc72cc0a9791bf4910ef854b2c3dd61af3bbf] Merge branch 'for-rc' of https://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl.git
> # good: [d4ecae56a8c7d3287a5bcdb2d65f7102ee580ab6] clk: mediatek: Add MT8196 mcu clock support
> # good: [4c134c2a5f3db29afe35b2d30e39bb6d867b08da] um: Indent time-travel help messages
> # good: [bf1af4f6e62878e053d20cd71267aed8dfb3e715] perf arm-spe: Downsample all sample types equally
> # good: [e414334883f4835058ca06f934bc4988eb9cd9e6] Merge branch 'next/dt' into for-next
> # good: [54653bb3ec83d1f717adab6108db82a3966d19ee] clk: renesas: rzv2h: remove round_rate() in favor of determine_rate()
> # good: [87a877de367d835b527d1086f75727123ef85fc4] KVM: x86: Rename handle_fastpath_set_msr_irqoff() to handle_fastpath_wrmsr()
> # good: [c26675447faff8c4ddc1dc5d2cd28326b8181aaf] KVM: x86: Zero XSTATE components on INIT by iterating over supported features
> # good: [6684aba0780da9f505c202f27e68ee6d18c0aa66] XArray: Add extra debugging check to xas_lock and friends
> git bisect start '7396732143a22b42bb97710173d598aaf50daa89' '9d3bc72cc0a9791bf4910ef854b2c3dd61af3bbf' 'd4ecae56a8c7d3287a5bcdb2d65f7102ee580ab6' '4c134c2a5f3db29afe35b2d30e39bb6d867b08da' 'bf1af4f6e62878e053d20cd71267aed8dfb3e715' 'e414334883f4835058ca06f934bc4988eb9cd9e6' '54653bb3ec83d1f717adab6108db82a3966d19ee' '87a877de367d835b527d1086f75727123ef85fc4' 'c26675447faff8c4ddc1dc5d2cd28326b8181aaf' '6684aba0780da9f505c202f27e68ee6d18c0aa66'
> # test job: [d4ecae56a8c7d3287a5bcdb2d65f7102ee580ab6] https://lava.sirena.org.uk/scheduler/job/1907306
> # test job: [4c134c2a5f3db29afe35b2d30e39bb6d867b08da] https://lava.sirena.org.uk/scheduler/job/1903298
> # test job: [bf1af4f6e62878e053d20cd71267aed8dfb3e715] https://lava.sirena.org.uk/scheduler/job/1900552
> # test job: [e414334883f4835058ca06f934bc4988eb9cd9e6] https://lava.sirena.org.uk/scheduler/job/1904803
> # test job: [54653bb3ec83d1f717adab6108db82a3966d19ee] https://lava.sirena.org.uk/scheduler/job/1900685
> # test job: [87a877de367d835b527d1086f75727123ef85fc4] https://lava.sirena.org.uk/scheduler/job/1697972
> # test job: [c26675447faff8c4ddc1dc5d2cd28326b8181aaf] https://lava.sirena.org.uk/scheduler/job/1698132
> # test job: [6684aba0780da9f505c202f27e68ee6d18c0aa66] https://lava.sirena.org.uk/scheduler/job/1738722
> # test job: [7396732143a22b42bb97710173d598aaf50daa89] https://lava.sirena.org.uk/scheduler/job/1913276
> # bad: [7396732143a22b42bb97710173d598aaf50daa89] Add linux-next specific files for 20251002
> git bisect bad 7396732143a22b42bb97710173d598aaf50daa89
> # test job: [74fc450198cf792e3db35ea4d49197a467233373] https://lava.sirena.org.uk/scheduler/job/1913848
> # bad: [74fc450198cf792e3db35ea4d49197a467233373] Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
> git bisect bad 74fc450198cf792e3db35ea4d49197a467233373
> # test job: [db484ff3fff1fafa0017cdd017795bec09ace5e4] https://lava.sirena.org.uk/scheduler/job/1913993
> # bad: [db484ff3fff1fafa0017cdd017795bec09ace5e4] Merge branch 'docs-next' of git://git.lwn.net/linux.git
> git bisect bad db484ff3fff1fafa0017cdd017795bec09ace5e4
> # test job: [7d942c9d9660e6808dcd835c4c73ad5405cc5518] https://lava.sirena.org.uk/scheduler/job/1914055
> # bad: [7d942c9d9660e6808dcd835c4c73ad5405cc5518] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git
> git bisect bad 7d942c9d9660e6808dcd835c4c73ad5405cc5518
> # test job: [db03d3c83bdb21667392d1596fafdfb38325c2a0] https://lava.sirena.org.uk/scheduler/job/1914176
> # bad: [db03d3c83bdb21667392d1596fafdfb38325c2a0] Merge branch 'dma-mapping-for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux.git
> git bisect bad db03d3c83bdb21667392d1596fafdfb38325c2a0
> # test job: [84a7a9823e73fe3c0adcc4780fa7a091981048ef] https://lava.sirena.org.uk/scheduler/job/1914247
> # good: [84a7a9823e73fe3c0adcc4780fa7a091981048ef] mm/shmem, swap: remove redundant error handling for replacing folio
> git bisect good 84a7a9823e73fe3c0adcc4780fa7a091981048ef
> # test job: [c7416f37e4d31fb28ac4ed584b13037e69a22dbe] https://lava.sirena.org.uk/scheduler/job/1914387
> # bad: [c7416f37e4d31fb28ac4ed584b13037e69a22dbe] Merge branch 'mm-nonmm-stable' of https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad c7416f37e4d31fb28ac4ed584b13037e69a22dbe
> # test job: [3dfd02c900379d209ac9dcac24b4a61d8478842a] https://lava.sirena.org.uk/scheduler/job/1914497
> # good: [3dfd02c900379d209ac9dcac24b4a61d8478842a] hugetlb: increase number of reserving hugepages via cmdline
> git bisect good 3dfd02c900379d209ac9dcac24b4a61d8478842a
> # test job: [fe7a283b39160153b6d1bd7f61b0a9d5d44987a8] https://lava.sirena.org.uk/scheduler/job/1915206
> # good: [fe7a283b39160153b6d1bd7f61b0a9d5d44987a8] ocfs2: add suballoc slot check in ocfs2_validate_inode_block()
> git bisect good fe7a283b39160153b6d1bd7f61b0a9d5d44987a8
> # test job: [74058c0a9fc8b2b4d5f4a0ef7ee2cfa66a9e49cf] https://lava.sirena.org.uk/scheduler/job/1916011
> # good: [74058c0a9fc8b2b4d5f4a0ef7ee2cfa66a9e49cf] Squashfs: fix uninit-value in squashfs_get_parent
> git bisect good 74058c0a9fc8b2b4d5f4a0ef7ee2cfa66a9e49cf
> # test job: [9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b] https://lava.sirena.org.uk/scheduler/job/1916064
> # good: [9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b] Squashfs: reject negative file sizes in squashfs_read_inode()
> git bisect good 9f1c14c1de1bdde395f6cc893efa4f80a2ae3b2b
> # test job: [fb552b2425cf8f16c9c72229a972d1744b24d855] https://lava.sirena.org.uk/scheduler/job/1916102
> # good: [fb552b2425cf8f16c9c72229a972d1744b24d855] alloc_tag: fix boot failure due to NULL pointer dereference
> git bisect good fb552b2425cf8f16c9c72229a972d1744b24d855
> # test job: [81e78b7ec61e89e8bab9736551839f79b063614c] https://lava.sirena.org.uk/scheduler/job/1916193
> # bad: [81e78b7ec61e89e8bab9736551839f79b063614c] mm: convert folio_page() back to a macro
> git bisect bad 81e78b7ec61e89e8bab9736551839f79b063614c
> # test job: [1acc369373008b9eeb930fbb47847c0693055553] https://lava.sirena.org.uk/scheduler/job/1916218
> # bad: [1acc369373008b9eeb930fbb47847c0693055553] mm/khugepaged: use start_addr/addr for improved readability
> git bisect bad 1acc369373008b9eeb930fbb47847c0693055553
> # test job: [dd83609b88986f4add37c0871c3434310652ebd5] https://lava.sirena.org.uk/scheduler/job/1916225
> # bad: [dd83609b88986f4add37c0871c3434310652ebd5] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
> git bisect bad dd83609b88986f4add37c0871c3434310652ebd5
> # first bad commit: [dd83609b88986f4add37c0871c3434310652ebd5] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
2025-10-20 17:52 ` Mark Brown
@ 2025-10-21 21:10 ` Andrew Morton
2025-10-22 11:40 ` Mark Brown
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2025-10-21 21:10 UTC (permalink / raw)
To: Mark Brown
Cc: Deepanshu Kartikey, muchun.song, osalvador, david, linux-mm,
linux-kernel, syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV,
torvalds
On Mon, 20 Oct 2025 18:52:11 +0100 Mark Brown <broonie@kernel.org> wrote:
> > For the past few days I've been seeing failures on Raspberry Pi 4 in
> > the hugetlbfs-madvise kselftest in -next which bisect to this patch.
> > The test reports:
> >
> > # # -------------------------
> > # # running ./hugetlb-madvise
> > # # -------------------------
> > # # Unexpected number of free huge pages line 252
> > # # [FAIL]
> > # not ok 6 hugetlb-madvise # exit=1
>
> This issue is now present in mainline:
>
> Raspberry Pi 4: https://lava.sirena.org.uk/scheduler/job/1976561#L1798
> Orion O6: https://lava.sirena.org.uk/scheduler/job/1977081#L1779
>
> and still bisects to this patch.
Thanks. Were you able to test the proposed fix?
From: Deepanshu Kartikey <kartikey406@gmail.com>
Subject: hugetlbfs: move lock assertions after early returns in huge_pmd_unshare()
Date: Tue, 14 Oct 2025 17:03:44 +0530
When hugetlb_vmdelete_list() processes VMAs during truncate operations, it
may encounter VMAs where huge_pmd_unshare() is called without the required
shareable lock. This triggers an assertion failure in
hugetlb_vma_assert_locked().
The previous fix in commit dd83609b8898 ("hugetlbfs: skip VMAs without
shareable locks in hugetlb_vmdelete_list") skipped entire VMAs without
shareable locks to avoid the assertion. However, this prevented pages
from being unmapped and freed, causing a regression in
fallocate(PUNCH_HOLE) operations where pages were not freed immediately,
as reported by Mark Brown.
Instead of checking locks in the caller or skipping VMAs, move the lock
assertions in huge_pmd_unshare() to after the early return checks. The
assertions are only needed when actual PMD unsharing work will be
performed. If the function returns early because sz != PMD_SIZE or the
PMD is not shared, no locks are required and assertions should not fire.
This approach reverts the VMA skipping logic from commit dd83609b8898
("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
while moving the assertions to avoid the assertion failure, keeping all
the logic within huge_pmd_unshare() itself and allowing page unmapping and
freeing to proceed for all VMAs.
Link: https://lkml.kernel.org/r/20251014113344.21194-1-kartikey406@gmail.com
Fixes: dd83609b8898 ("hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list")
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reported-by: <syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com>
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://syzkaller.appspot.com/bug?extid=f26d7c75c26ec19790e7
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: <syzbot+f26d7c75c26ec19790e7@syzkaller.appspotmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/hugetlbfs/inode.c | 9 ---------
mm/hugetlb.c | 5 ++---
2 files changed, 2 insertions(+), 12 deletions(-)
--- a/fs/hugetlbfs/inode.c~hugetlbfs-move-lock-assertions-after-early-returns-in-huge_pmd_unshare
+++ a/fs/hugetlbfs/inode.c
@@ -478,14 +478,6 @@ hugetlb_vmdelete_list(struct rb_root_cac
 		if (!hugetlb_vma_trylock_write(vma))
 			continue;
-		/*
-		 * Skip VMAs without shareable locks. Per the design in commit
-		 * 40549ba8f8e0, these will be handled by remove_inode_hugepages()
-		 * called after this function with proper locking.
-		 */
-		if (!__vma_shareable_lock(vma))
-			goto skip;
-
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
@@ -496,7 +488,6 @@ hugetlb_vmdelete_list(struct rb_root_cac
 		 * vmas. Therefore, lock is not held when calling
 		 * unmap_hugepage_range for private vmas.
 		 */
-skip:
 		hugetlb_vma_unlock_write(vma);
 	}
 }
--- a/mm/hugetlb.c~hugetlbfs-move-lock-assertions-after-early-returns-in-huge_pmd_unshare
+++ a/mm/hugetlb.c
@@ -7614,13 +7614,12 @@ int huge_pmd_unshare(struct mm_struct *m
 	p4d_t *p4d = p4d_offset(pgd, addr);
 	pud_t *pud = pud_offset(p4d, addr);
-	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
-	hugetlb_vma_assert_locked(vma);
 	if (sz != PMD_SIZE)
 		return 0;
 	if (!ptdesc_pmd_is_shared(virt_to_ptdesc(ptep)))
 		return 0;
-
+	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
+	hugetlb_vma_assert_locked(vma);
 	pud_clear(pud);
 	/*
 	 * Once our caller drops the rmap lock, some other process might be
_
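The effect of the reordering in the mm/hugetlb.c hunk above can be shown with a small userspace model. This is only a sketch under stated assumptions: struct toy_state, TOY_PMD_SIZE and the toy_* functions are hypothetical, and the lock assertion is modelled as a counter rather than a kernel warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PMD_SIZE 2097152UL	/* illustrative value only */

/* Hypothetical stand-in for the state huge_pmd_unshare() looks at. */
struct toy_state {
	bool locked;		/* caller holds the locks the assertions want */
	bool pmd_shared;	/* models ptdesc_pmd_is_shared() */
	int violations;		/* assertion hits are counted, not warned */
};

/* Models the lock assertions: record a hit if the lock is not held. */
void toy_assert_locked(struct toy_state *s)
{
	assert(s != NULL);
	if (!s->locked)
		s->violations++;
}

/* Old ordering: assert first, then take the early returns. */
int toy_unshare_old(struct toy_state *s, unsigned long sz)
{
	toy_assert_locked(s);
	if (sz != TOY_PMD_SIZE)
		return 0;
	if (!s->pmd_shared)
		return 0;
	return 1;		/* unsharing work would happen here */
}

/* New ordering: assert only once unsharing will really be done. */
int toy_unshare_new(struct toy_state *s, unsigned long sz)
{
	if (sz != TOY_PMD_SIZE)
		return 0;
	if (!s->pmd_shared)
		return 0;
	toy_assert_locked(s);
	return 1;		/* unsharing work would happen here */
}
```

With an unlocked, unshared state, toy_unshare_old() records a violation while toy_unshare_new() does not. Once pmd_shared is true, both orderings still insist on the lock, so the check is kept for the case the changelog says it is needed for.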
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
2025-10-21 21:10 ` Andrew Morton
@ 2025-10-22 11:40 ` Mark Brown
0 siblings, 0 replies; 8+ messages in thread
From: Mark Brown @ 2025-10-22 11:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Deepanshu Kartikey, muchun.song, osalvador, david, linux-mm,
linux-kernel, syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV,
torvalds
On Tue, Oct 21, 2025 at 02:10:47PM -0700, Andrew Morton wrote:
> On Mon, 20 Oct 2025 18:52:11 +0100 Mark Brown <broonie@kernel.org> wrote:
> > This issue is now present in mainline:
> > Raspberry Pi 4: https://lava.sirena.org.uk/scheduler/job/1976561#L1798
> > Orion O6: https://lava.sirena.org.uk/scheduler/job/1977081#L1779
> > and still bisects to this patch.
> Thanks. Were you able to test the proposed fix?
I didn't; there were a lot of new versions in quite a short period.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
@ 2025-10-03 11:18 Deepanshu Kartikey
0 siblings, 0 replies; 8+ messages in thread
From: Deepanshu Kartikey @ 2025-10-03 11:18 UTC (permalink / raw)
To: broonie
Cc: muchun.song, osalvador, david, akpm, linux-mm, linux-kernel,
syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV
Hi Mark,
Thank you for the report and bisection. I see the hugetlbfs-madvise
test is failing with an unexpected free huge page count.
I'm investigating why the VMAs skipped in my patch aren't getting
their pages freed properly. I'll analyze the test code and work on
a fix.
I'll follow up once I understand the root cause.
Deepanshu
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
@ 2025-10-03 15:09 Deepanshu Kartikey
0 siblings, 0 replies; 8+ messages in thread
From: Deepanshu Kartikey @ 2025-10-03 15:09 UTC (permalink / raw)
To: akpm, broonie
Cc: muchun.song, osalvador, david, linux-mm, linux-kernel,
syzbot+f26d7c75c26ec19790e7, Aishwarya.TCV
Hi,
My previous patch dd83609b8898 ("hugetlbfs: skip VMAs without shareable
locks in hugetlb_vmdelete_list") fixed a WARNING but introduced a
regression where pages aren't freed during punch hole operations, as
reported by Mark Brown.
The issue is that skipping the entire VMA means pages don't get unmapped,
so they can't be freed.
I'm considering the following fix approach:
1. Add a new ZAP_FLAG_NO_UNSHARE flag
2. In hugetlb_vmdelete_list(), try to get the shareable lock
3. If we can't get it, set ZAP_FLAG_NO_UNSHARE and proceed anyway
4. In __unmap_hugepage_range(), skip huge_pmd_unshare() if flag is set
5. But still clear page table entries so pages get freed
This way:
- For truncate: same behavior as before (might skip unsharing)
- For punch hole: pages get freed immediately (fixes regression)
- No WARNING (we don't call huge_pmd_unshare without lock)
The trade-off is that PMD metadata may not be cleaned up immediately
when we can't get the shareable lock, but it will be freed when the
VMA is destroyed.
Does this approach seem reasonable? Or is there a better way to handle
this?
Thanks,
Deepanshu
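The five steps proposed in this message can be sketched as a userspace model. ZAP_FLAG_NO_UNSHARE exists only as a proposal here, so the flag value, struct toy_result and the toy_* functions below are all hypothetical; this shows the intended behaviour and is not an implementation against the real zap flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ZAP_FLAG_NO_UNSHARE 0x1u	/* hypothetical flag value */

/* Hypothetical record of what one unmap pass did. */
struct toy_result {
	bool unshare_called;	/* huge_pmd_unshare() would have been called */
	bool ptes_cleared;	/* page table entries would have been cleared */
};

/* Models steps 4 and 5: skip unsharing if flagged, always clear entries. */
void toy_unmap_range(unsigned int zap_flags, struct toy_result *res)
{
	assert(res != NULL);
	res->unshare_called = !(zap_flags & TOY_ZAP_FLAG_NO_UNSHARE);
	res->ptes_cleared = true;
}

/* Models steps 2 and 3: set the flag when the shareable lock is not held. */
void toy_vmdelete_flagged(bool got_shareable_lock, struct toy_result *res)
{
	unsigned int zap_flags = 0;

	if (!got_shareable_lock)
		zap_flags |= TOY_ZAP_FLAG_NO_UNSHARE;
	toy_unmap_range(zap_flags, res);
}
```

In this model the entries are cleared on both paths, so pages can be freed either way, and unsharing is attempted only when the shareable lock was taken.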
^ permalink raw reply [flat|nested] 8+ messages in thread