* [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
@ 2025-10-02 8:16 Byungchul Park
2025-10-02 11:38 ` David Hildenbrand
2025-10-02 11:42 ` Yeoreum Yun
0 siblings, 2 replies; 16+ messages in thread
From: Byungchul Park @ 2025-10-02 8:16 UTC (permalink / raw)
To: akpm
Cc: david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry,
ying.huang, apopple, clameter, kravetz, linux-mm, linux-kernel,
max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun,
yeoreum.yun, syzkaller, ysk
DEPT(Dependency Tracker) reported a deadlock:
===================================================
DEPT: Circular dependency has been detected.
6.15.11-00046-g2c223fa7bd9a-dirty #13 Not tainted
---------------------------------------------------
summary
---------------------------------------------------
*** DEADLOCK ***
context A
[S] (unknown)(pg_locked_map:0)
[W] dept_page_wait_on_bit(pg_writeback_map:0)
[E] dept_page_clear_bit(pg_locked_map:0)
context B
[S] (unknown)(pg_writeback_map:0)
[W] dept_page_wait_on_bit(pg_locked_map:0)
[E] dept_page_clear_bit(pg_writeback_map:0)
[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
[S] (unknown)(pg_locked_map:0)
[W] dept_page_wait_on_bit(pg_writeback_map:0)
[E] dept_page_clear_bit(pg_locked_map:0)
[S] (unknown)(pg_locked_map:0):
(N/A)
[W] dept_page_wait_on_bit(pg_writeback_map:0):
[<ffff800080589c94>] folio_wait_bit+0x2c/0x38
stacktrace:
folio_wait_bit_common+0x824/0x8b8
folio_wait_bit+0x2c/0x38
folio_wait_writeback+0x5c/0xa4
migrate_pages_batch+0x5e4/0x1788
migrate_pages+0x15c4/0x1840
compact_zone+0x9c8/0x1d20
compact_node+0xd4/0x27c
sysctl_compaction_handler+0x104/0x194
proc_sys_call_handler+0x25c/0x3f8
proc_sys_write+0x20/0x2c
do_iter_readv_writev+0x350/0x448
vfs_writev+0x1ac/0x44c
do_pwritev+0x100/0x15c
__arm64_sys_pwritev2+0x6c/0xcc
invoke_syscall.constprop.0+0x64/0x18c
el0_svc_common.constprop.0+0x80/0x198
[E] dept_page_clear_bit(pg_locked_map:0):
[<ffff800080700914>] migrate_folio_undo_src+0x1b4/0x200
stacktrace:
migrate_folio_undo_src+0x1b4/0x200
migrate_pages_batch+0x1578/0x1788
migrate_pages+0x15c4/0x1840
compact_zone+0x9c8/0x1d20
compact_node+0xd4/0x27c
sysctl_compaction_handler+0x104/0x194
proc_sys_call_handler+0x25c/0x3f8
proc_sys_write+0x20/0x2c
do_iter_readv_writev+0x350/0x448
vfs_writev+0x1ac/0x44c
do_pwritev+0x100/0x15c
__arm64_sys_pwritev2+0x6c/0xcc
invoke_syscall.constprop.0+0x64/0x18c
el0_svc_common.constprop.0+0x80/0x198
do_el0_svc+0x28/0x3c
el0_svc+0x50/0x220
---------------------------------------------------
context B's detail
---------------------------------------------------
context B
[S] (unknown)(pg_writeback_map:0)
[W] dept_page_wait_on_bit(pg_locked_map:0)
[E] dept_page_clear_bit(pg_writeback_map:0)
[S] (unknown)(pg_writeback_map:0):
(N/A)
[W] dept_page_wait_on_bit(pg_locked_map:0):
[<ffff80008081e478>] bdev_getblk+0x58/0x120
stacktrace:
find_get_block_common+0x224/0xbc4
bdev_getblk+0x58/0x120
__ext4_get_inode_loc+0x194/0x98c
ext4_get_inode_loc+0x4c/0xcc
ext4_reserve_inode_write+0x74/0x158
__ext4_mark_inode_dirty+0xd4/0x4e0
__ext4_ext_dirty+0x118/0x164
ext4_ext_map_blocks+0x1578/0x2ca8
ext4_map_blocks+0x2a4/0xa60
ext4_convert_unwritten_extents+0x1b0/0x3c0
ext4_convert_unwritten_io_end_vec+0x90/0x1a0
ext4_end_io_end+0x58/0x194
ext4_end_io_rsv_work+0xc4/0x150
process_one_work+0x3b4/0xac0
worker_thread+0x2b0/0x53c
kthread+0x1a0/0x33c
[E] dept_page_clear_bit(pg_writeback_map:0):
[<ffff8000809dfc5c>] ext4_finish_bio+0x638/0x820
stacktrace:
folio_end_writeback+0x140/0x488
ext4_finish_bio+0x638/0x820
ext4_release_io_end+0x74/0x188
ext4_end_io_end+0xa0/0x194
ext4_end_io_rsv_work+0xc4/0x150
process_one_work+0x3b4/0xac0
worker_thread+0x2b0/0x53c
kthread+0x1a0/0x33c
ret_from_fork+0x10/0x20
To simplify the scenario:
   context X (wq worker)                  context Y (process context)

                                          migrate_pages_batch()
   ext4_end_io_end()                         ...
      ...                                    migrate_folio_unmap()
      ext4_get_inode_loc()                      ...
      ...                                       folio_lock() // hold the folio lock
         bdev_getblk()                          ...
         ...                                    folio_wait_writeback() // wait forever
            __find_get_block_slow()
            ...                                 ...
            folio_lock() // wait forever
         folio_unlock()                      migrate_folio_undo_src()
      ...
      ...                                       folio_unlock() // never reachable
      ext4_finish_bio()
      ...
      folio_end_writeback() // never reachable
context X is waiting for the folio lock to be released by context Y,
while context Y is waiting for the writeback to end in context X.
Ultimately, the two contexts are each waiting for an event that will
never happen, that is, a deadlock.
*Only one* of the following two conditions should be allowed, or we
cannot avoid this kind of deadlock:
1. while holding a folio lock (and heading for folio_unlock()),
waiting for a writeback to end,
2. while heading for the writeback end, waiting for the folio lock to
be released,
Since allowing 2 while avoiding 1 sounds more sensible than the other
way around, remove the first condition by making sure folio_unlock() is
called before folio_wait_writeback() in migrate_folio_unmap().
Fixes: 49d2e9cc45443 ("[PATCH] Swap Migration V5: migrate_pages() function")
Reported-by: Yunseong Kim <ysk@kzalloc.com>
Signed-off-by: Byungchul Park <byungchul@sk.com>
Tested-by: Yunseong Kim <ysk@kzalloc.com>
---
Hi,
Thanks to Yunseong for reporting the issue, testing, and confirming
that this patch resolves it. We used the latest version of DEPT to
detect the issue:
https://lore.kernel.org/all/20251002081247.51255-1-byungchul@sk.com/
I mentioned in the commit message above like:
*Only one* of the following two conditions should be allowed, or we
cannot avoid this kind of deadlock:
1. while holding a folio lock (and heading for folio_unlock()),
waiting for a writeback to end,
2. while heading for the writeback end, waiting for the folio lock
to be released,
Honestly, I'm not convinced which of the two we should choose; I chose
'allowing 2 and avoiding 1' to resolve this issue though. However,
please let me know if I was wrong and we should go for 'allowing 1 and
avoiding 2' instead. If so, I should try a different approach, for
example, preventing folio_lock() or using folio_trylock() while heading
for the writeback end in ext4_end_io_end() or something similar.
To Yunseong,
The link you shared for a system hang is:
https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
I think an important stacktrace for this issue, that is, the one
waiting for PG_writeback, was missed in the log.
Byungchul
---
mm/migrate.c | 57 +++++++++++++++++++++++++++++++++++++---------------
1 file changed, 41 insertions(+), 16 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 9e5ef39ce73a..60b0b054f27a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1215,6 +1215,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
dst->private = NULL;
+retry_wait_writeback:
+ /*
+ * Only in the case of a full synchronous migration is it
+ * necessary to wait for PageWriteback. In the async case, the
+ * retry loop is too short and in the sync-light case, the
+ * overhead of stalling is too much. Plus, do not write-back if
+ * it's in the middle of direct compaction
+ */
+ if (folio_test_writeback(src) && mode == MIGRATE_SYNC)
+ folio_wait_writeback(src);
+
if (!folio_trylock(src)) {
if (mode == MIGRATE_ASYNC)
goto out;
@@ -1245,27 +1256,41 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
folio_lock(src);
}
- locked = true;
- if (folio_test_mlocked(src))
- old_page_state |= PAGE_WAS_MLOCKED;
if (folio_test_writeback(src)) {
- /*
- * Only in the case of a full synchronous migration is it
- * necessary to wait for PageWriteback. In the async case,
- * the retry loop is too short and in the sync-light case,
- * the overhead of stalling is too much
- */
- switch (mode) {
- case MIGRATE_SYNC:
- break;
- default:
- rc = -EBUSY;
- goto out;
+ if (mode == MIGRATE_SYNC) {
+ /*
+ * folio_unlock() is required before trying
+ * folio_wait_writeback(). Otherwise it leads
+ * to a deadlock like:
+ *
+ *   context x                  context y
+ *   in XXX_io_end()            in migrate_folio_unmap()
+ *
+ *   ...                        ...
+ *   bdev_getblk();             folio_lock();
+ *
+ *   // wait forever            // wait forever
+ *   folio_lock();              folio_wait_writeback();
+ *
+ *   ...                        ...
+ *                              folio_unlock();
+ *   ...                        // never reachable
+ *   folio_unlock();
+ *   // never reachable
+ *   folio_end_writeback();
+ */
+ folio_unlock(src);
+ goto retry_wait_writeback;
}
- folio_wait_writeback(src);
+ rc = -EBUSY;
+ goto out;
}
+ locked = true;
+ if (folio_test_mlocked(src))
+ old_page_state |= PAGE_WAS_MLOCKED;
+
/*
* By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
* we cannot notice that anon_vma is freed while we migrate a page.
base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
--
2.17.1
^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: David Hildenbrand @ 2025-10-02 11:38 UTC (permalink / raw)
To: Byungchul Park, akpm
Cc: ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry, ying.huang,
    apopple, clameter, kravetz, linux-mm, linux-kernel,
    max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun,
    yeoreum.yun, syzkaller, ysk, Matthew Wilcox

> To simplify the scenario:
>

Just curious, where is the __folio_start_writeback() to complete the
picture?

>    context X (wq worker)                  context Y (process context)
>
>                                           migrate_pages_batch()
>    ext4_end_io_end()                         ...
>       ...                                    migrate_folio_unmap()
>       ext4_get_inode_loc()                      ...
>       ...                                       folio_lock() // hold the folio lock
>          bdev_getblk()                          ...
>          ...                                    folio_wait_writeback() // wait forever
>             __find_get_block_slow()
>             ...                                 ...
>             folio_lock() // wait forever
>          folio_unlock()                      migrate_folio_undo_src()
>       ...
>       ...                                       folio_unlock() // never reachable
>       ext4_finish_bio()
>       ...
>       folio_end_writeback() // never reachable
>

But aren't you implying that it should from this point on be disallowed
to call folio_wait_writeback() with the folio lock held? That sounds ...
a bit wrong.

Note that it is currently explicitly allowed: folio_wait_writeback()
documents "If the folio is not locked, writeback may start again after
writeback has finished.". So there is no way to prevent writeback from
immediately starting again.

In particular, wouldn't we have to fixup other callsites to make this
consistent and then VM_WARN_ON_ONCE() assert that in
folio_wait_writeback()?

Of course, as we've never seen this deadlock before in practice, I do
wonder if something else prevents it?

If it's a real issue, I wonder if a trylock on the writeback path could
be an option.

--
Cheers

David / dhildenb
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Hillf Danton @ 2025-10-02 22:02 UTC (permalink / raw)
To: David Hildenbrand; +Cc: Byungchul Park, akpm, linux-mm, linux-kernel

On Thu, 2 Oct 2025 13:38:59 +0200 David Hildenbrand wrote:
>
> If it's a real issue, I wonder if a trylock on the writeback path could
> be an option.
>
Given "Thanks to Yunseong for reporting the issue, testing, and
confirming if this patch can resolve the issue", could you share your
reproducer with reviewers, Byungchul?
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Byungchul Park @ 2025-10-03  0:48 UTC (permalink / raw)
To: Hillf Danton; +Cc: David Hildenbrand, akpm, linux-mm, linux-kernel, kernel_team

On Fri, Oct 03, 2025 at 06:02:10AM +0800, Hillf Danton wrote:
> On Thu, 2 Oct 2025 13:38:59 +0200 David Hildenbrand wrote:
> >
> > If it's a real issue, I wonder if a trylock on the writeback path could
> > be an option.
> >
> Given "Thanks to Yunseong for reporting the issue, testing, and
> confirming if this patch can resolve the issue", could you share your
> reproducer with reviewers, Byungchul?

Sure. Yunseong told me it's 100% reproducible. Yunseong, can you help
him reproduce the issue, please?

	Byungchul
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Byungchul Park @ 2025-10-03  0:52 UTC (permalink / raw)
To: Hillf Danton, ysk
Cc: David Hildenbrand, akpm, linux-mm, linux-kernel, kernel_team

On Fri, Oct 03, 2025 at 09:48:28AM +0900, Byungchul Park wrote:
> On Fri, Oct 03, 2025 at 06:02:10AM +0800, Hillf Danton wrote:
> > On Thu, 2 Oct 2025 13:38:59 +0200 David Hildenbrand wrote:
> > >
> > > If it's a real issue, I wonder if a trylock on the writeback path could
> > > be an option.
> > >
> > Given "Thanks to Yunseong for reporting the issue, testing, and
> > confirming if this patch can resolve the issue", could you share your
> > reproducer with reviewers, Byungchul?
>
> Sure. Yunseong told me it's 100% reproducible. Yunseong, can you help
> him reproduce the issue, please?

+to ysk@kzalloc.com

>
> 	Byungchul
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Yunseong Kim @ 2025-10-07  6:32 UTC (permalink / raw)
To: Byungchul Park, Hillf Danton
Cc: David Hildenbrand, akpm, linux-mm, linux-kernel, kernel_team,
    Yeoreum Yun

Hi Hillf,

Here are the syzlang and kernel log, and you can also find the gist
snippet in the body of the first RFC mail:

  https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5

I am reviewing this issue again on v6.17. The issue is always
reproducible, usually occurring within about 10k attempts with 8 procs.

On 10/3/25 9:52 AM, Byungchul Park wrote:
> On Fri, Oct 03, 2025 at 09:48:28AM +0900, Byungchul Park wrote:
>> On Fri, Oct 03, 2025 at 06:02:10AM +0800, Hillf Danton wrote:
>>> On Thu, 2 Oct 2025 13:38:59 +0200 David Hildenbrand wrote:
>>>>
>>>> If it's a real issue, I wonder if a trylock on the writeback path could
>>>> be an option.
>>>>
>>> Given "Thanks to Yunseong for reporting the issue, testing, and
>>> confirming if this patch can resolve the issue", could you share your
>>> reproducer with reviewers, Byungchul?
>>
>> Sure. Yunseong told me it's 100% reproducible. Yunseong, can you help
>> him reproduce the issue, please?
>
> +to ysk@kzalloc.com
>
>>
>> 	Byungchul

Thank you!

Yunseong
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: David Hildenbrand @ 2025-10-07  7:04 UTC (permalink / raw)
To: Yunseong Kim, Byungchul Park, Hillf Danton
Cc: akpm, linux-mm, linux-kernel, kernel_team, Yeoreum Yun

On 07.10.25 08:32, Yunseong Kim wrote:
> Hi Hillf,
>
> Here are the syzlang and kernel log, and you can also find the gist
> snippet in the body of the first RFC mail:
>
>   https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
>
> I am reviewing this issue again on v6.17. The issue is always
> reproducible, usually occurring within about 10k attempts with 8 procs.

I can see a DEPT splat and I wonder what happens if DEPT is disabled.

Will the machine actually deadlock or is this just DEPT complaining (and
probably getting something wrong)?

--
Cheers

David / dhildenb
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Yeoreum Yun @ 2025-10-07  7:53 UTC (permalink / raw)
To: David Hildenbrand
Cc: Yunseong Kim, Byungchul Park, Hillf Danton, akpm, linux-mm,
    linux-kernel, kernel_team

Hi David,

> On 07.10.25 08:32, Yunseong Kim wrote:
> > Hi Hillf,
> >
> > Here are the syzlang and kernel log, and you can also find the gist
> > snippet in the body of the first RFC mail:
> >
> >   https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
> >
> > I am reviewing this issue again on v6.17. The issue is always
> > reproducible, usually occurring within about 10k attempts with 8 procs.
>
> I can see a DEPT splat and I wonder what happens if DEPT is disabled.
>
> Will the machine actually deadlock or is this just DEPT complaining (and
> probably getting something wrong)?
>

As Pedro mentioned[0], I believe this DEPT splat is a false positive.

The folio targeted by __find_get_block_slow() belongs to bd_mapping,
which is not the same folio whose writeback flag gets cleared in
ext4_end_io_end(). Since DEPT currently does not distinguish
regular-file data folios from the corresponding block-device folios,
such false positives are a known issue, and we plan to fix it.

Also, looking at the log Yunseong shared (hung.log), I can see the
migration is stuck waiting on a buffer_head lock:

...
[ 3123.713542][   T89] INFO: task syz.4.2628:42733 blocked for more than 143 seconds.
[ 3123.713550][   T89]       Not tainted 6.15.11-00046-g2c223fa7bd9a-dirty #13
[ 3123.713557][   T89] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3123.713562][   T89] task:syz.4.2628      state:D stack:0     pid:42733 tgid:42732 ppid:41804  task_flags:0x400040 flags:0x00000009
[ 3123.713577][   T89] Call trace:
[ 3123.713582][   T89]  __switch_to+0x19c/0x2c0 (T)
[ 3123.713598][   T89]  __schedule+0x514/0x1208
[ 3123.713614][   T89]  schedule+0x40/0x164
[ 3123.713629][   T89]  io_schedule+0x3c/0x5c
[ 3123.713644][   T89]  bit_wait_io+0x14/0x70
[ 3123.713662][   T89]  __wait_on_bit_lock+0xa0/0x120
[ 3123.713678][   T89]  out_of_line_wait_on_bit_lock+0x8c/0xc0
[ 3123.713695][   T89]  __lock_buffer+0x74/0xb8
[ 3123.713720][   T89]  __buffer_migrate_folio+0x190/0x504
[ 3123.713747][   T89]  buffer_migrate_folio_norefs+0x30/0x3c
[ 3123.713764][   T89]  move_to_new_folio+0xe4/0x528
[ 3123.713779][   T89]  migrate_pages_batch+0xee0/0x1788
[ 3123.713795][   T89]  migrate_pages+0x15c4/0x1840
[ 3123.713810][   T89]  compact_zone+0x9c8/0x1d20
[ 3123.713822][   T89]  compact_node+0xd4/0x27c
[ 3123.713832][   T89]  sysctl_compaction_handler+0x104/0x194
[ 3123.713843][   T89]  proc_sys_call_handler+0x25c/0x3f8
[ 3123.713865][   T89]  proc_sys_write+0x20/0x2c
[ 3123.713878][   T89]  do_iter_readv_writev+0x350/0x448
[ 3123.713897][   T89]  vfs_writev+0x1ac/0x44c
[ 3123.713913][   T89]  do_pwritev+0x100/0x15c
[ 3123.713929][   T89]  __arm64_sys_pwritev2+0x6c/0xcc
[ 3123.713945][   T89]  invoke_syscall.constprop.0+0x64/0x18c
[ 3123.713961][   T89]  el0_svc_common.constprop.0+0x80/0x198
[ 3123.713978][   T89]  do_el0_svc+0x28/0x3c
[ 3123.713993][   T89]  el0_svc+0x50/0x220
[ 3123.714004][   T89]  el0t_64_sync_handler+0x10c/0x140
[ 3123.714017][   T89]  el0t_64_sync+0x1b8/0x1bc
...

which is different from the description "stuck on writeback".
Unfortunately, I couldn't analyse further with the log he shared since
it was truncated.

@Yunseong, could you reproduce without DEPT and share the full log for
further analysis?

Thanks.

[0] https://lore.kernel.org/all/dglxbwe2i5ubofefdxwo5jvyhdfjov37z5jzc5guedhe4dl6ia@pmkjkec3isb4/

--
Sincerely,
Yeoreum Yun
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Byungchul Park @ 2025-10-13  4:36 UTC (permalink / raw)
To: David Hildenbrand
Cc: Yunseong Kim, Hillf Danton, akpm, linux-mm, linux-kernel,
    kernel_team, Yeoreum Yun

On Tue, Oct 07, 2025 at 09:04:59AM +0200, David Hildenbrand wrote:
> On 07.10.25 08:32, Yunseong Kim wrote:
> > Hi Hillf,
> >
> > Here are the syzlang and kernel log, and you can also find the gist
> > snippet in the body of the first RFC mail:
> >
> >   https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
> >
> > I am reviewing this issue again on v6.17. The issue is always
> > reproducible, usually occurring within about 10k attempts with 8 procs.
>
> I can see a DEPT splat and I wonder what happens if DEPT is disabled.
>
> Will the machine actually deadlock or is this just DEPT complaining (and

Of course, it was an actual deadlock, not just a DEPT splat. However,
even though this patch resolved the actual hang issue, the watchdog
hang report and the DEPT report looked mismatched. We are now
re-checking it using the reproducer.

	Byungchul

> probably getting something wrong)?
>
> --
> Cheers
>
> David / dhildenb
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: David Hildenbrand @ 2025-10-13  8:08 UTC (permalink / raw)
To: Byungchul Park
Cc: Yunseong Kim, Hillf Danton, akpm, linux-mm, linux-kernel,
    kernel_team, Yeoreum Yun

On 13.10.25 06:36, Byungchul Park wrote:
> On Tue, Oct 07, 2025 at 09:04:59AM +0200, David Hildenbrand wrote:
> > I can see a DEPT splat and I wonder what happens if DEPT is disabled.
> >
> > Will the machine actually deadlock or is this just DEPT complaining (and
>
> Of course, it was an actual deadlock, not just a DEPT splat.

Okay, so some deadlock was triggered, but not the one described by
DEPT, thanks.

--
Cheers

David / dhildenb
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Byungchul Park @ 2025-10-03  1:02 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry,
    ying.huang, apopple, clameter, kravetz, linux-mm, linux-kernel,
    max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun,
    yeoreum.yun, syzkaller, ysk, Matthew Wilcox

On Thu, Oct 02, 2025 at 01:38:59PM +0200, David Hildenbrand wrote:
> > To simplify the scenario:
> >
>
> Just curious, where is the __folio_start_writeback() to complete the
> picture?
>
> [...]
>
> But aren't you implying that it should from this point on be disallowed
> to call folio_wait_writeback() with the folio lock held? That sounds ...
> a bit wrong.
>
> Note that it is currently explicitly allowed: folio_wait_writeback()
> documents "If the folio is not locked, writeback may start again after
> writeback has finished.". So there is no way to prevent writeback from

Thank you for the information. I was wrong then. That means we
shouldn't allow folio_lock() but trylock, while heading for the
writeback end, in terms of dependency.

	Byungchul

> immediately starting again.
>
> In particular, wouldn't we have to fixup other callsites to make this
> consistent and then VM_WARN_ON_ONCE() assert that in
> folio_wait_writeback()?
>
> Of course, as we've never seen this deadlock before in practice, I do
> wonder if something else prevents it?
>
> If it's a real issue, I wonder if a trylock on the writeback path could
> be an option.
>
> --
> Cheers
>
> David / dhildenb
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Byungchul Park @ 2025-10-03  2:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: akpm, ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry,
    ying.huang, apopple, clameter, kravetz, linux-mm, linux-kernel,
    max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun,
    yeoreum.yun, syzkaller, ysk, Matthew Wilcox

On Thu, Oct 02, 2025 at 01:38:59PM +0200, David Hildenbrand wrote:
> > To simplify the scenario:
> >
>
> Just curious, where is the __folio_start_writeback() to complete the
> picture?

ext4_end_io_end() was running as a wq worker after the io completion.
The DEPT report can tell that the following scenario happened with
__folio_start_writeback() called far earlier, at least, before
folio_test_writeback() was seen as true, but unfortunately DEPT doesn't
capture the exact location of __folio_start_writeback().

	Byungchul

> [...]
>
> Note that it is currently explicitly allowed: folio_wait_writeback()
> documents "If the folio is not locked, writeback may start again after
> writeback has finished.". So there is no way to prevent writeback from
> immediately starting again.
>
> In particular, wouldn't we have to fixup other callsites to make this
> consistent and then VM_WARN_ON_ONCE() assert that in
> folio_wait_writeback()?
>
> Of course, as we've never seen this deadlock before in practice, I do
> wonder if something else prevents it?
>
> If it's a real issue, I wonder if a trylock on the writeback path could
> be an option.
>
> --
> Cheers
>
> David / dhildenb
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
From: Pedro Falcato @ 2025-10-03 14:04 UTC (permalink / raw)
To: David Hildenbrand
Cc: Byungchul Park, akpm, ziy, matthew.brost, joshua.hahnjy,
    rakie.kim, gourry, ying.huang, apopple, clameter, kravetz,
    linux-mm, linux-kernel, max.byungchul.park, kernel_team,
    harry.yoo, gwan-gyeong.mun, yeoreum.yun, syzkaller, ysk,
    Matthew Wilcox, linux-ext4

(Adding ext4 list to CC)

On Thu, Oct 02, 2025 at 01:38:59PM +0200, David Hildenbrand wrote:
> > To simplify the scenario:
> >
>
> Just curious, where is the __folio_start_writeback() to complete the
> picture?
>
> [...]
>
> But aren't you implying that it should from this point on be disallowed
> to call folio_wait_writeback() with the folio lock held? That sounds ...
> a bit wrong.
>
> Note that it is currently explicitly allowed: folio_wait_writeback()
> documents "If the folio is not locked, writeback may start again after
> writeback has finished.". So there is no way to prevent writeback from
> immediately starting again.
>
> In particular, wouldn't we have to fixup other callsites to make this
> consistent and then VM_WARN_ON_ONCE() assert that in
> folio_wait_writeback()?
>
> Of course, as we've never seen this deadlock before in practice, I do
> wonder if something else prevents it?

As far as I can tell, the folio under writeback and the folio that
__find_get_block() finds will _never_ be the same. ext4_end_io_end() is
called for pages in an inode's address_space, and bdev_getblk() is
called for metadata blocks in block cache.

Having an actual deadlock here would mean that the folio is somehow
both in an inode's address_space, and in the block cache, I think?

Also, AFAIK there is no way a folio can be removed from the page cache
while under writeback.

In any case, I added linux-ext4 so they can tell me how right/wrong I am.

--
Pedro
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback() 2025-10-02 8:16 [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback() Byungchul Park 2025-10-02 11:38 ` David Hildenbrand @ 2025-10-02 11:42 ` Yeoreum Yun 2025-10-02 11:49 ` Yeoreum Yun 1 sibling, 1 reply; 16+ messages in thread From: Yeoreum Yun @ 2025-10-02 11:42 UTC (permalink / raw) To: Byungchul Park Cc: akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry, ying.huang, apopple, clameter, kravetz, linux-mm, linux-kernel, max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun, syzkaller, ysk Hi Byoungchul, > DEPT(Dependency Tracker) reported a deadlock: > > =================================================== > DEPT: Circular dependency has been detected. > 6.15.11-00046-g2c223fa7bd9a-dirty #13 Not tainted > --------------------------------------------------- > summary > --------------------------------------------------- > *** DEADLOCK *** > > context A > [S] (unknown)(pg_locked_map:0) > [W] dept_page_wait_on_bit(pg_writeback_map:0) > [E] dept_page_clear_bit(pg_locked_map:0) > > context B > [S] (unknown)(pg_writeback_map:0) > [W] dept_page_wait_on_bit(pg_locked_map:0) > [E] dept_page_clear_bit(pg_writeback_map:0) > > [S]: start of the event context > [W]: the wait blocked > [E]: the event not reachable > --------------------------------------------------- > context A's detail > --------------------------------------------------- > context A > [S] (unknown)(pg_locked_map:0) > [W] dept_page_wait_on_bit(pg_writeback_map:0) > [E] dept_page_clear_bit(pg_locked_map:0) > > [S] (unknown)(pg_locked_map:0): > (N/A) > > [W] dept_page_wait_on_bit(pg_writeback_map:0): > [<ffff800080589c94>] folio_wait_bit+0x2c/0x38 > stacktrace: > folio_wait_bit_common+0x824/0x8b8 > folio_wait_bit+0x2c/0x38 > folio_wait_writeback+0x5c/0xa4 > migrate_pages_batch+0x5e4/0x1788 > migrate_pages+0x15c4/0x1840 > compact_zone+0x9c8/0x1d20 > compact_node+0xd4/0x27c > 
>       sysctl_compaction_handler+0x104/0x194
>       proc_sys_call_handler+0x25c/0x3f8
>       proc_sys_write+0x20/0x2c
>       do_iter_readv_writev+0x350/0x448
>       vfs_writev+0x1ac/0x44c
>       do_pwritev+0x100/0x15c
>       __arm64_sys_pwritev2+0x6c/0xcc
>       invoke_syscall.constprop.0+0x64/0x18c
>       el0_svc_common.constprop.0+0x80/0x198
>
> [E] dept_page_clear_bit(pg_locked_map:0):
> [<ffff800080700914>] migrate_folio_undo_src+0x1b4/0x200
> stacktrace:
>       migrate_folio_undo_src+0x1b4/0x200
>       migrate_pages_batch+0x1578/0x1788
>       migrate_pages+0x15c4/0x1840
>       compact_zone+0x9c8/0x1d20
>       compact_node+0xd4/0x27c
>       sysctl_compaction_handler+0x104/0x194
>       proc_sys_call_handler+0x25c/0x3f8
>       proc_sys_write+0x20/0x2c
>       do_iter_readv_writev+0x350/0x448
>       vfs_writev+0x1ac/0x44c
>       do_pwritev+0x100/0x15c
>       __arm64_sys_pwritev2+0x6c/0xcc
>       invoke_syscall.constprop.0+0x64/0x18c
>       el0_svc_common.constprop.0+0x80/0x198
>       do_el0_svc+0x28/0x3c
>       el0_svc+0x50/0x220
> ---------------------------------------------------
> context B's detail
> ---------------------------------------------------
> context B
>     [S] (unknown)(pg_writeback_map:0)
>     [W] dept_page_wait_on_bit(pg_locked_map:0)
>     [E] dept_page_clear_bit(pg_writeback_map:0)
>
> [S] (unknown)(pg_writeback_map:0):
> (N/A)
>
> [W] dept_page_wait_on_bit(pg_locked_map:0):
> [<ffff80008081e478>] bdev_getblk+0x58/0x120
> stacktrace:
>       find_get_block_common+0x224/0xbc4
>       bdev_getblk+0x58/0x120
>       __ext4_get_inode_loc+0x194/0x98c
>       ext4_get_inode_loc+0x4c/0xcc
>       ext4_reserve_inode_write+0x74/0x158
>       __ext4_mark_inode_dirty+0xd4/0x4e0
>       __ext4_ext_dirty+0x118/0x164
>       ext4_ext_map_blocks+0x1578/0x2ca8
>       ext4_map_blocks+0x2a4/0xa60
>       ext4_convert_unwritten_extents+0x1b0/0x3c0
>       ext4_convert_unwritten_io_end_vec+0x90/0x1a0
>       ext4_end_io_end+0x58/0x194
>       ext4_end_io_rsv_work+0xc4/0x150
>       process_one_work+0x3b4/0xac0
>       worker_thread+0x2b0/0x53c
>       kthread+0x1a0/0x33c
>
> [E] dept_page_clear_bit(pg_writeback_map:0):
> [<ffff8000809dfc5c>] ext4_finish_bio+0x638/0x820
> stacktrace:
>       folio_end_writeback+0x140/0x488
>       ext4_finish_bio+0x638/0x820
>       ext4_release_io_end+0x74/0x188
>       ext4_end_io_end+0xa0/0x194
>       ext4_end_io_rsv_work+0xc4/0x150
>       process_one_work+0x3b4/0xac0
>       worker_thread+0x2b0/0x53c
>       kthread+0x1a0/0x33c
>       ret_from_fork+0x10/0x20
>
> To simplify the scenario:
>
>    context X (wq worker)          context Y (process context)
>
>                                   migrate_pages_batch()
>    ext4_end_io_end()                 ...
>       ...                           migrate_folio_unmap()
>       ext4_get_inode_loc()             ...
>          ...                           folio_lock() // hold the folio lock
>          bdev_getblk()                 ...
>             ...                        folio_wait_writeback() // wait forever
>             __find_get_block_slow()
>                ...
>                folio_lock() // wait forever
>                folio_unlock()          migrate_folio_undo_src()
>       ...                                 ...
>       ...                                 folio_unlock() // never reachable
>       ext4_finish_bio()
>          ...
>          folio_end_writeback() // never reachable
>
> context X is waiting for the folio lock to be released by context Y,
> while context Y is waiting for the writeback to end in context X.
> Ultimately, the two contexts are waiting for events that will never
> happen, that is, a deadlock.
>
> *Only one* of the following two conditions should be allowed, or we
> cannot avoid this kind of deadlock:
>
>    1. while holding a folio lock (and heading for folio_unlock()),
>       waiting for a writeback to end,
>    2. while heading for the writeback end, waiting for the folio lock
>       to be released,
>
> Since allowing 2 and avoiding 1 sounds more sensible than the other
> way around, remove the first condition by making sure folio_unlock()
> comes before folio_wait_writeback() in migrate_folio_unmap().
>
> Fixes: 49d2e9cc45443 ("[PATCH] Swap Migration V5: migrate_pages() function")
> Reported-by: Yunseong Kim <ysk@kzalloc.com>
> Signed-off-by: Byungchul Park <byungchul@sk.com>
> Tested-by: Yunseong Kim <ysk@kzalloc.com>
> ---
>
> Hi,
>
> Thanks to Yunseong for reporting the issue, testing, and confirming
> that this patch can resolve the issue.
> We used the latest version of DEPT to detect the issue:
>
> https://lore.kernel.org/all/20251002081247.51255-1-byungchul@sk.com/
>
> As I mentioned in the commit message above:
>
>    *Only one* of the following two conditions should be allowed, or we
>    cannot avoid this kind of deadlock:
>
>    1. while holding a folio lock (and heading for folio_unlock()),
>       waiting for a writeback to end,
>    2. while heading for the writeback end, waiting for the folio lock
>       to be released,
>
> Honestly, I'm not convinced which one of the two we should choose,
> though I chose 'allowing 2 and avoiding 1' to resolve this issue.
>
> However, please let me know if I was wrong and we should go for
> 'allowing 1 and avoiding 2'. If so, I should try a different approach,
> for example, fixing it by preventing folio_lock() or using
> folio_trylock() while heading for the writeback end in
> ext4_end_io_end() or something.
>
> To Yunseong,
>
> The link you shared for a system hang is:
>
> https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
>
> I think an important stacktrace for this issue, that is, waiting for
> PG_writeback, was missed in the log.
>
> 	Byungchul
>
> ---
>  mm/migrate.c | 57 +++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 16 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 9e5ef39ce73a..60b0b054f27a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1215,6 +1215,17 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>
>  	dst->private = NULL;
>
> +retry_wait_writeback:
> +	/*
> +	 * Only in the case of a full synchronous migration is it
> +	 * necessary to wait for PageWriteback. In the async case, the
> +	 * retry loop is too short and in the sync-light case, the
> +	 * overhead of stalling is too much.
Plus, do not write-back if
> +	 * it's in the middle of direct compaction
> +	 */
> +	if (folio_test_writeback(src) && mode == MIGRATE_SYNC)
> +		folio_wait_writeback(src);
> +
>  	if (!folio_trylock(src)) {
>  		if (mode == MIGRATE_ASYNC)
>  			goto out;
> @@ -1245,27 +1256,41 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>
>  		folio_lock(src);
>  	}
> -	locked = true;
> -	if (folio_test_mlocked(src))
> -		old_page_state |= PAGE_WAS_MLOCKED;
>
>  	if (folio_test_writeback(src)) {
> -		/*
> -		 * Only in the case of a full synchronous migration is it
> -		 * necessary to wait for PageWriteback. In the async case,
> -		 * the retry loop is too short and in the sync-light case,
> -		 * the overhead of stalling is too much
> -		 */
> -		switch (mode) {
> -		case MIGRATE_SYNC:
> -			break;
> -		default:
> -			rc = -EBUSY;
> -			goto out;
> +		if (mode == MIGRATE_SYNC) {
> +			/*
> +			 * folio_unlock() is required before trying
> +			 * folio_wait_writeback(). Or it leads to a
> +			 * deadlock like:
> +			 *
> +			 * context x              context y
> +			 * in XXX_io_end()        in migrate_folio_unmap()
> +			 *
> +			 * ...                    ...
> +			 * bdev_getblk();         folio_lock();
> +			 *
> +			 * // wait forever        // wait forever
> +			 * folio_lock();          folio_wait_writeback();
> +			 *
> +			 * ...                    ...
> +			 *                        folio_unlock();
> +			 * ...                    // never reachable
> +			 * folio_unlock();
> +			 * // never reachable
> +			 * folio_end_writeback();
> +			 */
> +			folio_unlock(src);
> +			goto retry_wait_writeback;
>  		}
> -		folio_wait_writeback(src);
> +		rc = -EBUSY;
> +		goto out;
>  	}
>
> +	locked = true;
> +	if (folio_test_mlocked(src))
> +		old_page_state |= PAGE_WAS_MLOCKED;
> +
>  	/*
>  	 * By try_to_migrate(), src->mapcount goes down to 0 here. In this case,
>  	 * we cannot notice that anon_vma is freed while we migrate a page.

Hmm, I still have concerns about this change.

(1) seems to imply that the use of WB_SYNC_ALL by mpage_writepages() is
also incorrect. In addition, this change could introduce another
theoretical livelock when the folio enters writeback frequently.
AFAIK, while a folio is under writeback, its related buffers won't be
freed by migration, and since try_to_free_buffers() checks the writeback
state first, taking folio_lock() shouldn't be necessary in
bdev_getblk().

Therefore, it seems sufficient to check whether the folio is under
writeback in __find_get_block_slow(), e.g.:

diff --git a/fs/buffer.c b/fs/buffer.c
index 6a8752f7bbed..804d33df6b0f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -194,6 +194,9 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
 	if (IS_ERR(folio))
 		goto out;

+	if (folio_test_writeback(folio))
+		return true;
+
 	/*
 	 * Folio lock protects the buffers. Callers that cannot block
 	 * will fallback to serializing vs try_to_free_buffers() via

Am I missing something?

--
Sincerely,
Yeoreum Yun
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
  2025-10-02 11:42 ` Yeoreum Yun
@ 2025-10-02 11:49   ` Yeoreum Yun
  2025-10-03  2:08     ` Byungchul Park
  0 siblings, 1 reply; 16+ messages in thread
From: Yeoreum Yun @ 2025-10-02 11:49 UTC (permalink / raw)
To: Byungchul Park
Cc: akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry,
	ying.huang, apopple, clameter, kravetz, linux-mm, linux-kernel,
	max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun,
	syzkaller, ysk

Sorry code was wrong.

[...]

> Am I missing something?

Sorry, the code was wrong.
the suggestion is:

diff --git a/fs/buffer.c b/fs/buffer.c
index 6a8752f7bbed..804d33df6b0f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -194,6 +194,9 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
 	if (IS_ERR(folio))
 		goto out;

+	if (folio_test_writeback(folio))
+		atomic = true;
+
 	/*
 	 * Folio lock protects the buffers. Callers that cannot block
 	 * will fallback to serializing vs try_to_free_buffers() via

--
Sincerely,
Yeoreum Yun
* Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback()
  2025-10-02 11:49 ` Yeoreum Yun
@ 2025-10-03  2:08   ` Byungchul Park
  0 siblings, 0 replies; 16+ messages in thread
From: Byungchul Park @ 2025-10-03  2:08 UTC (permalink / raw)
To: Yeoreum Yun
Cc: akpm, david, ziy, matthew.brost, joshua.hahnjy, rakie.kim, gourry,
	ying.huang, apopple, clameter, kravetz, linux-mm, linux-kernel,
	max.byungchul.park, kernel_team, harry.yoo, gwan-gyeong.mun,
	syzkaller, ysk

On Thu, Oct 02, 2025 at 12:49:23PM +0100, Yeoreum Yun wrote:
> Sorry code was wrong.

[...]

> Sorry, the code was wrong.
> the suggestion is:
>
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 6a8752f7bbed..804d33df6b0f 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -194,6 +194,9 @@ __find_get_block_slow(struct block_device *bdev, sector_t block, bool atomic)
>  	if (IS_ERR(folio))
>  		goto out;
>
> +	if (folio_test_writeback(folio))
> +		atomic = true;
> +

Looks much better to me. Or make sure atomic(= true) is passed in if
folio_test_writeback(folio).

	Byungchul

>  	/*
>  	 * Folio lock protects the buffers. Callers that cannot block
>  	 * will fallback to serializing vs try_to_free_buffers() via
>
> --
> Sincerely,
> Yeoreum Yun
end of thread, other threads:[~2025-10-13  8:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-02  8:16 [RFC] mm/migrate: make sure folio_unlock() before folio_wait_writeback() Byungchul Park
2025-10-02 11:38 ` David Hildenbrand
2025-10-02 22:02 ` Hillf Danton
2025-10-03  0:48 ` Byungchul Park
2025-10-03  0:52 ` Byungchul Park
2025-10-07  6:32 ` Yunseong Kim
2025-10-07  7:04 ` David Hildenbrand
2025-10-07  7:53 ` Yeoreum Yun
2025-10-13  4:36 ` Byungchul Park
2025-10-13  8:08 ` David Hildenbrand
2025-10-03  1:02 ` Byungchul Park
2025-10-03  2:31 ` Byungchul Park
2025-10-03 14:04 ` Pedro Falcato
2025-10-02 11:42 ` Yeoreum Yun
2025-10-02 11:49 ` Yeoreum Yun
2025-10-03  2:08 ` Byungchul Park