Subject: [PATCH 1/2] mm: add filemap_lock_folio_nowait helper
From: Jinchao Wang
Date: 2026-01-08 12:39 UTC
To: Muchun Song, Oscar Salvador, David Hildenbrand, Matthew Wilcox (Oracle),
	Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel,
	linux-fsdevel
Cc: Jinchao Wang

Introduce filemap_lock_folio_nowait() to allow non-blocking folio lock
attempts using FGP_NOWAIT. This allows callers to avoid AB-BA deadlocks
by dropping higher-level locks when a folio is already locked.

Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
 include/linux/pagemap.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 31a848485ad9..b9d818a9409b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -825,6 +825,12 @@ static inline struct folio *filemap_lock_folio(struct address_space *mapping,
 	return __filemap_get_folio(mapping, index, FGP_LOCK, 0);
 }
 
+static inline struct folio *filemap_lock_folio_nowait(struct address_space *mapping,
+		pgoff_t index)
+{
+	return __filemap_get_folio(mapping, index, FGP_LOCK | FGP_NOWAIT, 0);
+}
+
 /**
  * filemap_grab_folio - grab a folio from the page cache
  * @mapping: The address space to search
-- 
2.43.0
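
For illustration, a caller that must not sleep on the folio lock while
holding a higher-level lock such as i_mmap_rwsem could use the helper
roughly as follows. This is a minimal sketch, not part of the patch: the
retry label and the surrounding locking are placeholders, and it assumes
__filemap_get_folio() returns ERR_PTR(-EAGAIN) when FGP_NOWAIT is set and
the folio lock cannot be taken immediately (and ERR_PTR(-ENOENT) when the
folio is not in the page cache).

retry:
	i_mmap_lock_write(mapping);

	folio = filemap_lock_folio_nowait(mapping, index);
	if (IS_ERR(folio)) {
		i_mmap_unlock_write(mapping);
		if (PTR_ERR(folio) == -EAGAIN) {
			/* Folio lock held elsewhere: back off so the other
			 * task can take i_mmap_rwsem, then try again. */
			cond_resched();
			goto retry;
		}
		return PTR_ERR(folio);		/* e.g. -ENOENT */
	}

	/* ... operate on the locked folio ... */

	folio_unlock(folio);
	folio_put(folio);
	i_mmap_unlock_write(mapping);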
Subject: [PATCH 2/2] Fix an AB-BA deadlock in hugetlbfs_punch_hole() involving page migration
From: Jinchao Wang
Date: 2026-01-08 12:39 UTC
To: Muchun Song, Oscar Salvador, David Hildenbrand, Matthew Wilcox (Oracle),
	Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel,
	linux-fsdevel
Cc: Jinchao Wang, syzbot+2d9c96466c978346b55f

The deadlock occurs due to the following lock ordering:

Task A (punch_hole):             Task B (migration):
--------------------             -------------------
1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
   (blocks waiting for B)           (blocks waiting for A)

Task A is blocked in the punch-hole path:
  hugetlbfs_fallocate
    hugetlbfs_punch_hole
      hugetlbfs_zero_partial_page
        filemap_lock_hugetlb_folio
          filemap_lock_folio
            __filemap_get_folio
              folio_lock

Task B is blocked in the migration path:
  migrate_pages
    migrate_hugetlbs
      unmap_and_move_huge_page
        remove_migration_ptes
          __rmap_walk_file
            i_mmap_lock_read

To break this circular dependency, use filemap_lock_folio_nowait() in
the punch-hole path. If the folio is already locked, Task A drops the
i_mmap_rwsem and retries. This allows Task B to finish its rmap walk
and release the folio lock.

Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com
Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
 fs/hugetlbfs/inode.c    | 34 +++++++++++++++++++++++-----------
 include/linux/hugetlb.h |  2 +-
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3b4c152c5c73..e903344aa0ec 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -653,17 +653,16 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
 
-static void hugetlbfs_zero_partial_page(struct hstate *h,
-					struct address_space *mapping,
-					loff_t start,
-					loff_t end)
+static int hugetlbfs_zero_partial_page(struct hstate *h,
+				       struct address_space *mapping,
+				       loff_t start, loff_t end)
 {
 	pgoff_t idx = start >> huge_page_shift(h);
 	struct folio *folio;
 
 	folio = filemap_lock_hugetlb_folio(h, mapping, idx);
 	if (IS_ERR(folio))
-		return;
+		return PTR_ERR(folio);
 
 	start = start & ~huge_page_mask(h);
 	end = end & ~huge_page_mask(h);
@@ -674,6 +673,7 @@ static void hugetlbfs_zero_partial_page(struct hstate *h,
 
 	folio_unlock(folio);
 	folio_put(folio);
+	return 0;
 }
 
 static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
@@ -683,6 +683,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	struct hstate *h = hstate_inode(inode);
 	loff_t hpage_size = huge_page_size(h);
 	loff_t hole_start, hole_end;
+	int rc;
 
 	/*
 	 * hole_start and hole_end indicate the full pages within the hole.
@@ -698,12 +699,18 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		return -EPERM;
 	}
 
+repeat:
 	i_mmap_lock_write(mapping);
 
 	/* If range starts before first full page, zero partial page. */
-	if (offset < hole_start)
-		hugetlbfs_zero_partial_page(h, mapping,
-				offset, min(offset + len, hole_start));
+	if (offset < hole_start) {
+		rc = hugetlbfs_zero_partial_page(h, mapping, offset,
+				min(offset + len, hole_start));
+		if (rc == -EAGAIN) {
+			i_mmap_unlock_write(mapping);
+			goto repeat;
+		}
+	}
 
 	/* Unmap users of full pages in the hole. */
 	if (hole_end > hole_start) {
@@ -714,9 +721,14 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	}
 
 	/* If range extends beyond last full page, zero partial page. */
-	if ((offset + len) > hole_end && (offset + len) > hole_start)
-		hugetlbfs_zero_partial_page(h, mapping,
-				hole_end, offset + len);
+	if ((offset + len) > hole_end && (offset + len) > hole_start) {
+		rc = hugetlbfs_zero_partial_page(h, mapping, hole_end,
+				offset + len);
+		if (rc == -EAGAIN) {
+			i_mmap_unlock_write(mapping);
+			goto repeat;
+		}
+	}
 
 	i_mmap_unlock_write(mapping);
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 019a1c5281e4..ad55b9dada0a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -814,7 +814,7 @@ static inline unsigned int blocks_per_huge_page(struct hstate *h)
 static inline struct folio *filemap_lock_hugetlb_folio(struct hstate *h,
 				struct address_space *mapping, pgoff_t idx)
 {
-	return filemap_lock_folio(mapping, idx << huge_page_order(h));
+	return filemap_lock_folio_nowait(mapping, idx << huge_page_order(h));
 }
 
 #include <asm/hugetlb.h>
-- 
2.43.0
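
The -EAGAIN check above relies on the FGP_NOWAIT behaviour of
__filemap_get_folio(): with FGP_LOCK | FGP_NOWAIT it only tries the folio
lock and bails out instead of sleeping. Roughly (a simplified sketch of
the relevant branch, not a verbatim copy of mm/filemap.c):

	if (fgp_flags & FGP_LOCK) {
		if (fgp_flags & FGP_NOWAIT) {
			if (!folio_trylock(folio)) {
				/* Already locked: report contention to the caller. */
				folio_put(folio);
				return ERR_PTR(-EAGAIN);
			}
		} else {
			folio_lock(folio);
		}
		/* ... truncation checks, etc. ... */
	}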
Subject: Re: [PATCH 2/2] Fix an AB-BA deadlock in hugetlbfs_punch_hole() involving page migration
From: Matthew Wilcox
Date: 2026-01-08 14:09 UTC
To: Jinchao Wang
Cc: Muchun Song, Oscar Salvador, David Hildenbrand, Andrew Morton,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel,
	linux-fsdevel, syzbot+2d9c96466c978346b55f, Zi Yan

On Thu, Jan 08, 2026 at 08:39:25PM +0800, Jinchao Wang wrote:
> The deadlock occurs due to the following lock ordering:
> 
> Task A (punch_hole):             Task B (migration):
> --------------------             -------------------
> 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
>    (blocks waiting for B)           (blocks waiting for A)
> 
> Task A is blocked in the punch-hole path:
>   hugetlbfs_fallocate
>     hugetlbfs_punch_hole
>       hugetlbfs_zero_partial_page
>         filemap_lock_hugetlb_folio
>           filemap_lock_folio
>             __filemap_get_folio
>               folio_lock
> 
> Task B is blocked in the migration path:
>   migrate_pages
>     migrate_hugetlbs
>       unmap_and_move_huge_page
>         remove_migration_ptes
>           __rmap_walk_file
>             i_mmap_lock_read
> 
> To break this circular dependency, use filemap_lock_folio_nowait() in
> the punch-hole path. If the folio is already locked, Task A drops the
> i_mmap_rwsem and retries. This allows Task B to finish its rmap walk
> and release the folio lock.

It looks like you didn't read the lock ordering at the top of mm/rmap.c
carefully enough:

 * hugetlbfs PageHuge() take locks in this order:
 *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
 *   vma_lock (hugetlb specific lock for pmd_sharing)
 *     mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
 *       folio_lock

So page migration is the one taking locks in the wrong order, not
holepunch. Maybe something like this instead?

diff --git a/mm/migrate.c b/mm/migrate.c
index 5169f9717f60..4688b9e38cd2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1458,6 +1458,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
+	enum ttu_flags ttu = 0;
 
 	if (folio_ref_count(src) == 1) {
 		/* page was freed from under us. So we are done. */
@@ -1498,8 +1499,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 		goto put_anon;
 
 	if (folio_mapped(src)) {
-		enum ttu_flags ttu = 0;
-
 		if (!folio_test_anon(src)) {
 			/*
 			 * In shared mappings, try_to_unmap could potentially
@@ -1516,16 +1515,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 
 		try_to_migrate(src, ttu);
 		page_was_mapped = 1;
-
-		if (ttu & TTU_RMAP_LOCKED)
-			i_mmap_unlock_write(mapping);
 	}
 
 	if (!folio_mapped(src))
 		rc = move_to_new_folio(dst, src, mode);
 
 	if (page_was_mapped)
-		remove_migration_ptes(src, !rc ? dst : src, 0);
+		remove_migration_ptes(src, !rc ? dst : src,
+					ttu ? RMP_LOCKED : 0);
+
+	if (ttu & TTU_RMAP_LOCKED)
+		i_mmap_unlock_write(mapping);
 
 unlock_put_anon:
 	folio_unlock(dst);
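
The RMP_LOCKED flag is what lets the suggested diff keep i_mmap_rwsem held
across the rmap walk: remove_migration_ptes() dispatches to the locked walk
variant, which expects the caller to already hold the rmap lock instead of
taking i_mmap_rwsem itself. Roughly (a simplified sketch with the argument
plumbing reduced; details may differ from the actual mm/migrate.c):

void remove_migration_ptes(struct folio *src, struct folio *dst, int flags)
{
	struct rmap_walk_control rwc = {
		.rmap_one	= remove_migration_pte,
		.arg		= src,			/* simplified */
	};

	if (flags & RMP_LOCKED)
		rmap_walk_locked(dst, &rwc);	/* caller holds the rmap lock */
	else
		rmap_walk(dst, &rwc);		/* walk takes i_mmap_rwsem itself */
}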
Subject: Re: [PATCH 2/2] Fix an AB-BA deadlock in hugetlbfs_punch_hole() involving page migration
From: Jinchao Wang
Date: 2026-01-09 2:17 UTC
To: Matthew Wilcox
Cc: Muchun Song, Oscar Salvador, David Hildenbrand, Andrew Morton,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, linux-mm, linux-kernel,
	linux-fsdevel, syzbot+2d9c96466c978346b55f, Zi Yan

On Thu, Jan 08, 2026 at 02:09:19PM +0000, Matthew Wilcox wrote:
> On Thu, Jan 08, 2026 at 08:39:25PM +0800, Jinchao Wang wrote:
> > The deadlock occurs due to the following lock ordering:
> > 
> > Task A (punch_hole):             Task B (migration):
> > --------------------             -------------------
> > 1. i_mmap_lock_write(mapping)    1. folio_lock(folio)
> > 2. folio_lock(folio)             2. i_mmap_lock_read(mapping)
> >    (blocks waiting for B)           (blocks waiting for A)
> > 
> > Task A is blocked in the punch-hole path:
> >   hugetlbfs_fallocate
> >     hugetlbfs_punch_hole
> >       hugetlbfs_zero_partial_page
> >         filemap_lock_hugetlb_folio
> >           filemap_lock_folio
> >             __filemap_get_folio
> >               folio_lock
> > 
> > Task B is blocked in the migration path:
> >   migrate_pages
> >     migrate_hugetlbs
> >       unmap_and_move_huge_page
> >         remove_migration_ptes
> >           __rmap_walk_file
> >             i_mmap_lock_read
> > 
> > To break this circular dependency, use filemap_lock_folio_nowait() in
> > the punch-hole path. If the folio is already locked, Task A drops the
> > i_mmap_rwsem and retries. This allows Task B to finish its rmap walk
> > and release the folio lock.
> 
> It looks like you didn't read the lock ordering at the top of mm/rmap.c
> carefully enough:
> 
>  * hugetlbfs PageHuge() take locks in this order:
>  *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
>  *   vma_lock (hugetlb specific lock for pmd_sharing)
>  *     mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
>  *       folio_lock
> 

Thanks for the correction, Matthew.

> So page migration is the one taking locks in the wrong order, not
> holepunch. Maybe something like this instead?
> 

I will test your suggested change and resend the fix.

> diff --git a/mm/migrate.c b/mm/migrate.c
> index 5169f9717f60..4688b9e38cd2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1458,6 +1458,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
>  	int page_was_mapped = 0;
>  	struct anon_vma *anon_vma = NULL;
>  	struct address_space *mapping = NULL;
> +	enum ttu_flags ttu = 0;
>  
>  	if (folio_ref_count(src) == 1) {
>  		/* page was freed from under us. So we are done. */
> @@ -1498,8 +1499,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
>  		goto put_anon;
>  
>  	if (folio_mapped(src)) {
> -		enum ttu_flags ttu = 0;
> -
>  		if (!folio_test_anon(src)) {
>  			/*
>  			 * In shared mappings, try_to_unmap could potentially
> @@ -1516,16 +1515,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
>  
>  		try_to_migrate(src, ttu);
>  		page_was_mapped = 1;
> -
> -		if (ttu & TTU_RMAP_LOCKED)
> -			i_mmap_unlock_write(mapping);
>  	}
>  
>  	if (!folio_mapped(src))
>  		rc = move_to_new_folio(dst, src, mode);
>  
>  	if (page_was_mapped)
> -		remove_migration_ptes(src, !rc ? dst : src, 0);
> +		remove_migration_ptes(src, !rc ? dst : src,
> +					ttu ? RMP_LOCKED : 0);
> +
> +	if (ttu & TTU_RMAP_LOCKED)
> +		i_mmap_unlock_write(mapping);
>  
>  unlock_put_anon:
>  	folio_unlock(dst);