* [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Jinchao Wang @ 2026-01-09 3:47 UTC
To: Matthew Wilcox, Andrew Morton, David Hildenbrand, Zi Yan,
Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
Gregory Price, Ying Huang, Alistair Popple, linux-mm,
linux-kernel
Cc: Jinchao Wang, syzbot+2d9c96466c978346b55f
Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
The deadlock occurs because migration violates the lock ordering defined
in mm/rmap.c for hugetlbfs:
 * hugetlbfs PageHuge() take locks in this order:
 *   hugetlb_fault_mutex
 *     vma_lock
 *       mapping->i_mmap_rwsem
 *         folio_lock
The following trace illustrates the inversion:
Task A (punch_hole):              Task B (migration):
--------------------              -------------------
1. i_mmap_lock_write(mapping)     1. folio_lock(folio)
2. folio_lock(folio)              2. i_mmap_lock_read(mapping)
   (blocks waiting for B)            (blocks waiting for A)
Task A is blocked in the punch-hole path:
  hugetlbfs_fallocate
    hugetlbfs_punch_hole
      hugetlbfs_zero_partial_page
        folio_lock

Task B is blocked in the migration path:
  migrate_pages
    unmap_and_move_huge_page
      remove_migration_ptes
        __rmap_walk_file
          i_mmap_lock_read
To fix this, adjust unmap_and_move_huge_page() to respect the established
hierarchy: if i_mmap_rwsem was acquired for try_to_migrate(), hold it until
remove_migration_ptes() completes.

This relies on the existing retry logic, which unlocks the folio and
returns -EAGAIN if hugetlb_folio_mapping_lock_write() fails.
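For reference, a condensed sketch of the locking flow after this change;
this is illustrative pseudocode, not the literal kernel code:

	/* unmap_and_move_huge_page(), simplified */
	folio_lock(src);
	if (folio_mapped(src) && !folio_test_anon(src)) {
		/* try-lock; on failure, unlock the folio and retry (-EAGAIN) */
		mapping = hugetlb_folio_mapping_lock_write(src);
		ttu = TTU_RMAP_LOCKED;
	}
	try_to_migrate(src, ttu);
	/* the rmap walk now runs while i_mmap_rwsem is still held */
	remove_migration_ptes(src, dst, ttu ? RMP_LOCKED : 0);
	if (ttu & TTU_RMAP_LOCKED)
		i_mmap_unlock_write(mapping);	/* released only after the walk */
	folio_unlock(src);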
Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
---
mm/migrate.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 5169f9717f60..bcaa13541acc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1458,6 +1458,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 	int page_was_mapped = 0;
 	struct anon_vma *anon_vma = NULL;
 	struct address_space *mapping = NULL;
+	enum ttu_flags ttu = 0;
 
 	if (folio_ref_count(src) == 1) {
 		/* page was freed from under us. So we are done. */
@@ -1498,8 +1499,6 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 			goto put_anon;
 
 	if (folio_mapped(src)) {
-		enum ttu_flags ttu = 0;
-
 		if (!folio_test_anon(src)) {
 			/*
 			 * In shared mappings, try_to_unmap could potentially
@@ -1516,16 +1515,17 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 
 		try_to_migrate(src, ttu);
 		page_was_mapped = 1;
-
-		if (ttu & TTU_RMAP_LOCKED)
-			i_mmap_unlock_write(mapping);
 	}
 
 	if (!folio_mapped(src))
 		rc = move_to_new_folio(dst, src, mode);
 
 	if (page_was_mapped)
-		remove_migration_ptes(src, !rc ? dst : src, 0);
+		remove_migration_ptes(src, !rc ? dst : src,
+				      ttu ? RMP_LOCKED : 0);
+
+	if (ttu & TTU_RMAP_LOCKED)
+		i_mmap_unlock_write(mapping);
 
 unlock_put_anon:
 	folio_unlock(dst);
--
2.43.0
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Matthew Wilcox @ 2026-01-09 4:06 UTC
To: Jinchao Wang
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
Ying Huang, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On Fri, Jan 09, 2026 at 11:47:16AM +0800, Jinchao Wang wrote:
> Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
... and by "Suggested-by", you mean "completely written by", right?
Or did you change it in some way?
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Jinchao Wang @ 2026-01-09 5:17 UTC
To: Matthew Wilcox
Cc: Andrew Morton, David Hildenbrand, Zi Yan, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
Ying Huang, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On Fri, Jan 09, 2026 at 04:06:22AM +0000, Matthew Wilcox wrote:
> On Fri, Jan 09, 2026 at 11:47:16AM +0800, Jinchao Wang wrote:
> > Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> > Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> > Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
>
> ... and by "Suggested-by", you mean "completely written by", right?
>
> Or did you change it in some way?
Yes, it was completely written by you. I verified it against the syzkaller
reproducer and reviewed the code logic.
If you prefer, I am happy to update the attribution, for example by replacing
Suggested-by with Co-developed-by, or by listing you as the author instead. I
can also drop my patch if that is more appropriate.
Please let me know what you prefer.
Thanks.
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Huang, Ying @ 2026-01-09 6:37 UTC
To: Jinchao Wang
Cc: Matthew Wilcox, Andrew Morton, David Hildenbrand, Zi Yan,
Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
Gregory Price, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
Jinchao Wang <wangjinchao600@gmail.com> writes:
> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>
> The deadlock occurs because migration violates the lock ordering defined
> in mm/rmap.c for hugetlbfs:
>
> * hugetlbfs PageHuge() take locks in this order:
> * hugetlb_fault_mutex
> * vma_lock
> * mapping->i_mmap_rwsem
> * folio_lock
>
> The following trace illustrates the inversion:
>
> Task A (punch_hole): Task B (migration):
> -------------------- -------------------
> 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
> 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
> (blocks waiting for B) (blocks waiting for A)
>
> Task A is blocked in the punch-hole path:
> hugetlbfs_fallocate
> hugetlbfs_punch_hole
> hugetlbfs_zero_partial_page
> folio_lock
>
> Task B is blocked in the migration path:
> migrate_pages
> unmap_and_move_huge_page
> remove_migration_ptes
> __rmap_walk_file
> i_mmap_lock_read
>
> To fix this, adjust unmap_and_move_huge_page() to respect the established
> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> until remove_migration_ptes() completes.
>
> This utilizes the existing retry logic, which unlocks the folio and
> returns -EAGAIN if hugetlb_folio_mapping_lock_write() fails.
>
> Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
Can you provide a "Fixes:" tag? That is helpful for backporting the bug
fix.
---
Best Regards,
Huang, Ying
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Jinchao Wang @ 2026-01-09 8:08 UTC
To: Huang, Ying
Cc: Matthew Wilcox, Andrew Morton, David Hildenbrand, Zi Yan,
Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
Gregory Price, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On Fri, Jan 09, 2026 at 02:37:28PM +0800, Huang, Ying wrote:
> Jinchao Wang <wangjinchao600@gmail.com> writes:
>
> > Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> >
> > The deadlock occurs because migration violates the lock ordering defined
> > in mm/rmap.c for hugetlbfs:
> >
> > * hugetlbfs PageHuge() take locks in this order:
> > * hugetlb_fault_mutex
> > * vma_lock
> > * mapping->i_mmap_rwsem
> > * folio_lock
> >
> > The following trace illustrates the inversion:
> >
> > Task A (punch_hole): Task B (migration):
> > -------------------- -------------------
> > 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
> > 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
> > (blocks waiting for B) (blocks waiting for A)
> >
> > Task A is blocked in the punch-hole path:
> > hugetlbfs_fallocate
> > hugetlbfs_punch_hole
> > hugetlbfs_zero_partial_page
> > folio_lock
> >
> > Task B is blocked in the migration path:
> > migrate_pages
> > unmap_and_move_huge_page
> > remove_migration_ptes
> > __rmap_walk_file
> > i_mmap_lock_read
> >
> > To fix this, adjust unmap_and_move_huge_page() to respect the established
> > hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> > until remove_migration_ptes() completes.
> >
> > This utilizes the existing retry logic, which unlocks the folio and
> > returns -EAGAIN if hugetlb_folio_mapping_lock_write() fails.
> >
> > Link: https://lore.kernel.org/all/68e9715a.050a0220.1186a4.000d.GAE@google.com/
> > Link: https://lore.kernel.org/all/20260108123957.1123502-2-wangjinchao600@gmail.com
> > Reported-by: syzbot+2d9c96466c978346b55f@syzkaller.appspotmail.com
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Jinchao Wang <wangjinchao600@gmail.com>
>
> Can you provide a "Fixes:" tag? That is helpful for backporting the bug
> fix.
Thanks for the suggestion.

The deadlock appears to stem from the lock-ordering violation introduced in
commit 336bf30eb765 ("hugetlbfs: fix anon huge page migration race").
Commit 68d32527d340 ("hugetlbfs: zero partial pages during fallocate hole
punch") was the one that first exposed the crash, but I believe
336bf30eb765 is the root cause.

I will add the following tag in v2:

Fixes: 336bf30eb765 ("hugetlbfs: fix anon huge page migration race")
>
> ---
> Best Regards,
> Huang, Ying
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: David Hildenbrand (Red Hat) @ 2026-01-09 13:39 UTC
To: Jinchao Wang, Matthew Wilcox, Andrew Morton, Zi Yan,
Matthew Brost, Joshua Hahn, Rakie Kim, Byungchul Park,
Gregory Price, Ying Huang, Alistair Popple, linux-mm,
linux-kernel
Cc: syzbot+2d9c96466c978346b55f
On 1/9/26 04:47, Jinchao Wang wrote:
> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>
> The deadlock occurs because migration violates the lock ordering defined
> in mm/rmap.c for hugetlbfs:
>
> * hugetlbfs PageHuge() take locks in this order:
> * hugetlb_fault_mutex
> * vma_lock
> * mapping->i_mmap_rwsem
> * folio_lock
>
> The following trace illustrates the inversion:
>
> Task A (punch_hole): Task B (migration):
> -------------------- -------------------
> 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
> 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
> (blocks waiting for B) (blocks waiting for A)
>
> Task A is blocked in the punch-hole path:
> hugetlbfs_fallocate
> hugetlbfs_punch_hole
> hugetlbfs_zero_partial_page
> folio_lock
>
> Task B is blocked in the migration path:
> migrate_pages
> unmap_and_move_huge_page
> remove_migration_ptes
> __rmap_walk_file
> i_mmap_lock_read
>
> To fix this, adjust unmap_and_move_huge_page() to respect the established
> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
try-lock)?
We now handle file-backed folios correctly, I think. Could we somehow
also be in trouble for anon folios? Because there, we'd still take the
rmap lock after grabbing the folio lock.
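(Roughly, the anon walk would be: remove_migration_ptes() ->
rmap_walk_anon() -> anon_vma_lock_read(), i.e. the anon_vma rwsem is
taken after the folio lock.)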
--
Cheers
David
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Jinchao Wang @ 2026-01-09 14:16 UTC
To: David Hildenbrand (Red Hat)
Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
Ying Huang, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/9/26 04:47, Jinchao Wang wrote:
> > Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> >
> > The deadlock occurs because migration violates the lock ordering defined
> > in mm/rmap.c for hugetlbfs:
> >
> > * hugetlbfs PageHuge() take locks in this order:
> > * hugetlb_fault_mutex
> > * vma_lock
> > * mapping->i_mmap_rwsem
> > * folio_lock
> >
> > The following trace illustrates the inversion:
> >
> > Task A (punch_hole): Task B (migration):
> > -------------------- -------------------
> > 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
> > 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
> > (blocks waiting for B) (blocks waiting for A)
> >
> > Task A is blocked in the punch-hole path:
> > hugetlbfs_fallocate
> > hugetlbfs_punch_hole
> > hugetlbfs_zero_partial_page
> > folio_lock
> >
> > Task B is blocked in the migration path:
> > migrate_pages
> > unmap_and_move_huge_page
> > remove_migration_ptes
> > __rmap_walk_file
> > i_mmap_lock_read
> >
> > To fix this, adjust unmap_and_move_huge_page() to respect the established
> > hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
>
>
> I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
> i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
> try-lock)?
Yes, but the lock is released before remove_migration_ptes().
Task A can enter the race window between
i_mmap_unlock_write(mapping)
and
remove_migration_ptes() -> i_mmap_lock_read(mapping).
This window was introduced by the change below:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765
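To make the window concrete, here is a simplified interleaving (a sketch
of the pre-patch flow, not the exact code):

	/* Task B, pre-patch unmap_and_move_huge_page(): */
	folio_lock(src);
	mapping = hugetlb_folio_mapping_lock_write(src);	/* try-lock */
	try_to_migrate(src, TTU_RMAP_LOCKED);
	i_mmap_unlock_write(mapping);		/* <-- window opens here */
	/*
	 * Task A: i_mmap_lock_write(mapping) succeeds, then its
	 * folio_lock(folio) blocks on B, which still holds the folio lock.
	 */
	remove_migration_ptes(src, dst, 0);	/* B: i_mmap_lock_read() blocks
						 * behind writer A -> deadlock */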
>
>
> We now handle file-backed folios correctly I think. Could we somehow also be
> in trouble for anon folios? Because there, we'd still take the rmap lock
> after grabbing the folio lock.
>
>
> --
> Cheers
>
> David
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: David Hildenbrand (Red Hat) @ 2026-01-09 14:18 UTC
To: Jinchao Wang
Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
Ying Huang, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On 1/9/26 15:16, Jinchao Wang wrote:
> On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 1/9/26 04:47, Jinchao Wang wrote:
>>> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>>>
>>> The deadlock occurs because migration violates the lock ordering defined
>>> in mm/rmap.c for hugetlbfs:
>>>
>>> * hugetlbfs PageHuge() take locks in this order:
>>> * hugetlb_fault_mutex
>>> * vma_lock
>>> * mapping->i_mmap_rwsem
>>> * folio_lock
>>>
>>> The following trace illustrates the inversion:
>>>
>>> Task A (punch_hole): Task B (migration):
>>> -------------------- -------------------
>>> 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
>>> 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
>>> (blocks waiting for B) (blocks waiting for A)
>>>
>>> Task A is blocked in the punch-hole path:
>>> hugetlbfs_fallocate
>>> hugetlbfs_punch_hole
>>> hugetlbfs_zero_partial_page
>>> folio_lock
>>>
>>> Task B is blocked in the migration path:
>>> migrate_pages
>>> unmap_and_move_huge_page
>>> remove_migration_ptes
>>> __rmap_walk_file
>>> i_mmap_lock_read
>>>
>>> To fix this, adjust unmap_and_move_huge_page() to respect the established
>>> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
>>
>>
>> I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
>> i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
>> try-lock)?
> Yes, but the lock is released before remove_migration_ptes().
>
> Task A can enter the race window between
> i_mmap_unlock_write(mapping)
> and
> remove_migration_ptes() -> i_mmap_lock_read(mapping).
>
> This window was introduced by the change below:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765
So try_to_migrate() is not the problem, but remove_migration_ptes() is?
Anyhow, I saw that Willy sent out a version.
--
Cheers
David
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: Jinchao Wang @ 2026-01-09 15:32 UTC
To: David Hildenbrand (Red Hat)
Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
Ying Huang, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On Fri, Jan 09, 2026 at 03:18:37PM +0100, David Hildenbrand (Red Hat) wrote:
> On 1/9/26 15:16, Jinchao Wang wrote:
> > On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
> > > On 1/9/26 04:47, Jinchao Wang wrote:
> > > > Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
> > > >
> > > > The deadlock occurs because migration violates the lock ordering defined
> > > > in mm/rmap.c for hugetlbfs:
> > > >
> > > > * hugetlbfs PageHuge() take locks in this order:
> > > > * hugetlb_fault_mutex
> > > > * vma_lock
> > > > * mapping->i_mmap_rwsem
> > > > * folio_lock
> > > >
> > > > The following trace illustrates the inversion:
> > > >
> > > > Task A (punch_hole): Task B (migration):
> > > > -------------------- -------------------
> > > > 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
> > > > 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
> > > > (blocks waiting for B) (blocks waiting for A)
> > > >
> > > > Task A is blocked in the punch-hole path:
> > > > hugetlbfs_fallocate
> > > > hugetlbfs_punch_hole
> > > > hugetlbfs_zero_partial_page
> > > > folio_lock
> > > >
> > > > Task B is blocked in the migration path:
> > > > migrate_pages
> > > > unmap_and_move_huge_page
> > > > remove_migration_ptes
> > > > __rmap_walk_file
> > > > i_mmap_lock_read
> > > >
> > > > To fix this, adjust unmap_and_move_huge_page() to respect the established
> > > > hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
> > >
> > >
> > > I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
> > > i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
> > > try-lock)?
> > Yes, but the lock is released before remove_migration_ptes().
> >
> > Task A can enter the race window between
> > i_mmap_unlock_write(mapping)
> > and
> > remove_migration_ptes() -> i_mmap_lock_read(mapping).
> >
> > This window was introduced by the change below:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765
>
> try_to_migrate() is not the problem, but remove_migration_ptes() ?
>
> Anyhow, I saw that Willy sent out a version.
Thank you for letting me know.
>
> --
> Cheers
>
> David
* Re: [PATCH] mm/migrate: fix hugetlbfs deadlock by respecting lock ordering
From: David Hildenbrand (Red Hat) @ 2026-01-09 15:41 UTC
To: Jinchao Wang
Cc: Matthew Wilcox, Andrew Morton, Zi Yan, Matthew Brost,
Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price,
Ying Huang, Alistair Popple, linux-mm, linux-kernel,
syzbot+2d9c96466c978346b55f
On 1/9/26 16:32, Jinchao Wang wrote:
> On Fri, Jan 09, 2026 at 03:18:37PM +0100, David Hildenbrand (Red Hat) wrote:
>> On 1/9/26 15:16, Jinchao Wang wrote:
>>> On Fri, Jan 09, 2026 at 02:39:08PM +0100, David Hildenbrand (Red Hat) wrote:
>>>> On 1/9/26 04:47, Jinchao Wang wrote:
>>>>> Fix an AB-BA deadlock between hugetlbfs_punch_hole() and page migration.
>>>>>
>>>>> The deadlock occurs because migration violates the lock ordering defined
>>>>> in mm/rmap.c for hugetlbfs:
>>>>>
>>>>> * hugetlbfs PageHuge() take locks in this order:
>>>>> * hugetlb_fault_mutex
>>>>> * vma_lock
>>>>> * mapping->i_mmap_rwsem
>>>>> * folio_lock
>>>>>
>>>>> The following trace illustrates the inversion:
>>>>>
>>>>> Task A (punch_hole): Task B (migration):
>>>>> -------------------- -------------------
>>>>> 1. i_mmap_lock_write(mapping) 1. folio_lock(folio)
>>>>> 2. folio_lock(folio) 2. i_mmap_lock_read(mapping)
>>>>> (blocks waiting for B) (blocks waiting for A)
>>>>>
>>>>> Task A is blocked in the punch-hole path:
>>>>> hugetlbfs_fallocate
>>>>> hugetlbfs_punch_hole
>>>>> hugetlbfs_zero_partial_page
>>>>> folio_lock
>>>>>
>>>>> Task B is blocked in the migration path:
>>>>> migrate_pages
>>>>> unmap_and_move_huge_page
>>>>> remove_migration_ptes
>>>>> __rmap_walk_file
>>>>> i_mmap_lock_read
>>>>>
>>>>> To fix this, adjust unmap_and_move_huge_page() to respect the established
>>>>> hierarchy. If i_mmap_rwsem is acquired during try_to_migrate(), hold it
>>>>
>>>>
>>>> I'm confused. Isn't it unmap_and_move_huge_page() that grabs the
>>>> i_mmap_rwsem during hugetlb_page_mapping_lock_write() (where we do a
>>>> try-lock)?
>>> Yes, but the lock is released before remove_migration_ptes().
>>>
>>> Task A can enter the race window between
>>> i_mmap_unlock_write(mapping)
>>> and
>>> remove_migration_ptes() -> i_mmap_lock_read(mapping).
>>>
>>> This window was introduced by the change below:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/mm/migrate.c?id=336bf30eb765
>>
>> try_to_migrate() is not the problem, but remove_migration_ptes() ?
>>
>> Anyhow, I saw that Willy sent out a version.
> Thank you for letting me know.
For reference:
https://lkml.kernel.org/r/20260109041345.3863089-1-willy@infradead.org
--
Cheers
David