* [PATCH 0/2] mm/swap: fix missing locks in swap_reclaim_work()
From: Hui Zhu @ 2026-03-06 11:50 UTC (permalink / raw)
To: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, linux-mm, linux-kernel
Cc: Hui Zhu
From: Hui Zhu <zhuhui@kylinos.cn>
swap_cluster_alloc_table() assumes that the caller holds the following
locks:
ci->lock
percpu_swap_cluster.lock
si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
There are five call paths leading to swap_cluster_alloc_table():
swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
->isolate_lock_cluster->swap_cluster_alloc_table
swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table
swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
->isolate_lock_cluster->swap_cluster_alloc_table
swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table
The other four paths correctly acquire the necessary locks before
calling swap_cluster_alloc_table(). However, the swap_reclaim_work()
path fails to acquire percpu_swap_cluster.lock and, for
non-SWP_SOLIDSTATE devices, si->global_cluster_lock.
The first patch ensures swap_reclaim_work() correctly acquires
percpu_swap_cluster.lock and si->global_cluster_lock before calling
swap_reclaim_full_clusters(). Without these locks, the preconditions
for swap_cluster_alloc_table() are not met.
The second patch adds lockdep assertions in swap_cluster_alloc_table()
to help catch such locking inconsistencies early.
I tried to reproduce this issue under natural workloads, but the
swap_reclaim_work() path rarely hits the
!cluster_table_is_alloced(found) condition. To verify the fix, I used
GDB to force found->table to NULL, which triggered the following
warning due to the missing locks:
[ 554.388797] ------------[ cut here ]------------
[ 554.388932] WARNING: mm/swapfile.c:480 at isolate_lock_cluster+0x199/0x470, CPU#6: kworker/6:2/656
[ 554.388947] Modules linked in:
[ 554.388990] CPU: 6 UID: 0 PID: 656 Comm: kworker/6:2 Not tainted 7.0.0-rc2+ #28 PREEMPT(full)
[ 554.388995] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 554.389013] Workqueue: events swap_reclaim_work
[ 554.389020] RIP: 0010:isolate_lock_cluster+0x199/0x470
[ 554.389025] Code: 02 0f 0b 8b 35 dc 69 57 02 85 f6 74 b0 65 48 8b 05 f4 20 af 02 be ff ff ff ff 48 8d b8 60 98 31 84 e8 2b 0e f5 00 85 c0 75 93 <0f> 0b eb 8f 48 89 df e8 0b 78 f6 00 41 f6 45 10 10 0f 84 0b 01 00
[ 554.389028] RSP: 0018:ffffc9000183bd68 EFLAGS: 00010246
[ 554.389033] RAX: 0000000000000000 RBX: ffff88810a410060 RCX: 0000000000000000
[ 554.389037] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 554.389046] RBP: ffffc9000183bd88 R08: 0000000000000000 R09: 0000000000000000
[ 554.389048] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88811108e878
[ 554.389049] R13: ffff88811108e800 R14: ffff88811108ea90 R15: ffff888101e41e40
[ 554.389051] FS: 0000000000000000(0000) GS:ffff8881b7812000(0000) knlGS:0000000000000000
[ 554.389053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 554.389054] CR2: 000000c000637f80 CR3: 000000010cfd5006 CR4: 0000000000770ef0
[ 554.389065] PKRU: 55555554
[ 554.389067] Call Trace:
[ 554.389068] <TASK>
[ 554.389080] swap_reclaim_full_clusters+0x6b/0x350
[ 554.389083] ? __pfx_swap_reclaim_work+0x10/0x10
[ 554.389090] ? swap_reclaim_full_clusters+0x52/0x350
[ 554.389094] swap_reclaim_work+0x1a/0x30
[ 554.389097] process_one_work+0x223/0x770
[ 554.389106] worker_thread+0x1c6/0x3b0
[ 554.389110] ? __pfx_worker_thread+0x10/0x10
[ 554.389113] kthread+0xfe/0x140
[ 554.389117] ? __pfx_kthread+0x10/0x10
[ 554.389121] ret_from_fork+0x3d4/0x480
[ 554.389125] ? __pfx_kthread+0x10/0x10
[ 554.389129] ret_from_fork_asm+0x1a/0x30
[ 554.389141] </TASK>
[ 554.389142] irq event stamp: 9775
[ 554.389144] hardirqs last enabled at (9781): [<ffffffff8148ca99>] __up_console_sem+0x79/0xa0
[ 554.389150] hardirqs last disabled at (9786): [<ffffffff8148ca7e>] __up_console_sem+0x5e/0xa0
[ 554.389153] softirqs last enabled at (8676): [<ffffffff813b3aff>] __irq_exit_rcu+0x13f/0x160
[ 554.389156] softirqs last disabled at (8615): [<ffffffff813b3aff>] __irq_exit_rcu+0x13f/0x160
[ 554.389159] ---[ end trace 0000000000000000 ]---
[ 554.477105] ------------[ cut here ]------------
[ 554.477253] WARNING: mm/swapfile.c:480 at isolate_lock_cluster+0x199/0x470, CPU#6: kworker/6:2/656
[ 554.477264] Modules linked in:
[ 554.477277] CPU: 6 UID: 0 PID: 656 Comm: kworker/6:2 Tainted: G W 7.0.0-rc2+ #28 PREEMPT(full)
[ 554.477284] Tainted: [W]=WARN
[ 554.477288] Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 554.477291] Workqueue: events swap_reclaim_work
[ 554.477294] RIP: 0010:isolate_lock_cluster+0x199/0x470
[ 554.477296] Code: 02 0f 0b 8b 35 dc 69 57 02 85 f6 74 b0 65 48 8b 05 f4 20 af 02 be ff ff ff ff 48 8d b8 60 98 31 84 e8 2b 0e f5 00 85 c0 75 93 <0f> 0b eb 8f 48 89 df e8 0b 78 f6 00 41 f6 45 10 10 0f
Hui Zhu (2):
mm/swap: fix missing locks in swap_reclaim_work()
mm/swap: add lockdep for si->global_cluster_lock in
swap_cluster_alloc_table()
mm/swapfile.c | 10 ++++++++++
1 file changed, 10 insertions(+)
--
2.43.0
* [PATCH 1/2] mm/swap: fix missing locks in swap_reclaim_work()
From: Hui Zhu @ 2026-03-06 11:50 UTC (permalink / raw)
To: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, linux-mm, linux-kernel
Cc: Hui Zhu
From: Hui Zhu <zhuhui@kylinos.cn>
swap_cluster_alloc_table() assumes that the caller holds the following
locks:
ci->lock
percpu_swap_cluster.lock
si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
There are five call paths leading to swap_cluster_alloc_table():
swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
->isolate_lock_cluster->swap_cluster_alloc_table
swap_alloc_hibernation_slot->cluster_alloc_swap_entry
->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table
swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
->isolate_lock_cluster->swap_cluster_alloc_table
swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
->swap_cluster_alloc_table
The other four paths correctly acquire the necessary locks before
calling swap_cluster_alloc_table(). However, the swap_reclaim_work()
path fails to acquire percpu_swap_cluster.lock and, for
non-SWP_SOLIDSTATE devices, si->global_cluster_lock.
This patch fixes the issue by ensuring swap_reclaim_work() properly
acquires the required locks before proceeding with the swap cluster
allocation.
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
mm/swapfile.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 94af29d1de88..2e8717f84ba3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1031,7 +1031,15 @@ static void swap_reclaim_work(struct work_struct *work)
si = container_of(work, struct swap_info_struct, reclaim_work);
+ local_lock(&percpu_swap_cluster.lock);
+ if (!(si->flags & SWP_SOLIDSTATE))
+ spin_lock(&si->global_cluster_lock);
+
swap_reclaim_full_clusters(si, true);
+
+ if (!(si->flags & SWP_SOLIDSTATE))
+ spin_unlock(&si->global_cluster_lock);
+ local_unlock(&percpu_swap_cluster.lock);
}
/*
--
2.43.0
* [PATCH 2/2] mm/swap: add lockdep for si->global_cluster_lock in swap_cluster_alloc_table()
From: Hui Zhu @ 2026-03-06 11:50 UTC (permalink / raw)
To: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, linux-mm, linux-kernel
Cc: Hui Zhu
From: Hui Zhu <zhuhui@kylinos.cn>
Add a lockdep_assert_held(&si->global_cluster_lock) in
swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
The function already requires the caller to hold both ci->lock
and percpu_swap_cluster.lock.
It also requires si->global_cluster_lock when the device is not
SWP_SOLIDSTATE.
Adding this assertion ensures locking consistency and helps catch
potential synchronization issues during development.
Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
---
mm/swapfile.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2e8717f84ba3..1400a1585033 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -477,6 +477,8 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
* Swap allocator uses percpu clusters and holds the local lock.
*/
lockdep_assert_held(&ci->lock);
+ if (!(si->flags & SWP_SOLIDSTATE))
+ lockdep_assert_held(&si->global_cluster_lock);
lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
/* The cluster must be free and was just isolated from the free list. */
--
2.43.0
* Re: [PATCH 1/2] mm/swap: fix missing locks in swap_reclaim_work()
From: YoungJun Park @ 2026-03-06 13:52 UTC (permalink / raw)
To: Hui Zhu
Cc: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, linux-mm, linux-kernel, Hui Zhu
On Fri, Mar 06, 2026 at 07:50:36PM +0800, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@kylinos.cn>
Hello Hui Zhu! :)
>
> swap_cluster_alloc_table() assumes that the caller holds the following
> locks:
> ci->lock
> percpu_swap_cluster.lock
> si->global_cluster_lock (required for non-SWP_SOLIDSTATE devices)
>
> There are five call paths leading to swap_cluster_alloc_table():
> swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> ->alloc_swap_scan_list->isolate_lock_cluster->swap_cluster_alloc_table
>
> swap_alloc_slow->cluster_alloc_swap_entry->alloc_swap_scan_list
> ->isolate_lock_cluster->swap_cluster_alloc_table
>
> swap_alloc_hibernation_slot->cluster_alloc_swap_entry
> ->swap_reclaim_full_clusters->isolate_lock_cluster
> ->swap_cluster_alloc_table
>
> swap_alloc_slow->cluster_alloc_swap_entry->swap_reclaim_full_clusters
> ->isolate_lock_cluster->swap_cluster_alloc_table
>
> swap_reclaim_work->swap_reclaim_full_clusters->isolate_lock_cluster
> ->swap_cluster_alloc_table
Can isolate_lock_cluster() actually invoke swap_cluster_alloc_table()
on a full cluster? My understanding is that full clusters already have
a swap_table allocated, and swap_cluster_alloc_table() is only called
for free clusters that need a new allocation. If isolate_lock_cluster()
checks !cluster_table_is_alloced() before calling swap_cluster_alloc_table(),
wouldn't the full-cluster reclaim path skip that allocation entirely?
> Other paths correctly acquire the necessary locks before calling
> swap_cluster_alloc_table().
> But the swap_reclaim_work() path fails to acquire
> percpu_swap_cluster.lock and, for non-SWP_SOLIDSTATE devices,
> si->global_cluster_lock.
If my assumption is right, the table allocation does not happen on this
path, so the synchronization is not needed.
Also, percpu_swap_cluster.lock and si->global_cluster_lock appear to
protect the percpu cluster cache and the global cluster state, not the
allocation table itself, I think.
Best Regards
Youngjun Park
> This patch fixes the issue by ensuring swap_reclaim_work() properly
> acquires the required locks before proceeding with the swap cluster
> allocation.
>
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> ---
> mm/swapfile.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 94af29d1de88..2e8717f84ba3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1031,7 +1031,15 @@ static void swap_reclaim_work(struct work_struct *work)
>
> si = container_of(work, struct swap_info_struct, reclaim_work);
>
> + local_lock(&percpu_swap_cluster.lock);
> + if (!(si->flags & SWP_SOLIDSTATE))
> + spin_lock(&si->global_cluster_lock);
> +
> swap_reclaim_full_clusters(si, true);
> +
> + if (!(si->flags & SWP_SOLIDSTATE))
> + spin_unlock(&si->global_cluster_lock);
> + local_unlock(&percpu_swap_cluster.lock);
> }
>
> /*
> --
> 2.43.0
>
>
* Re: [PATCH 2/2] mm/swap: add lockdep for si->global_cluster_lock in swap_cluster_alloc_table()
From: YoungJun Park @ 2026-03-06 14:08 UTC (permalink / raw)
To: Hui Zhu
Cc: Andrew Morton, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, linux-mm, linux-kernel, Hui Zhu
On Fri, Mar 06, 2026 at 07:50:37PM +0800, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@kylinos.cn>
>
> Add a lockdep_assert_held(&si->global_cluster_lock) in
> swap_cluster_alloc_table() for non-SWP_SOLIDSTATE devices.
>
> The function already requires the caller to hold both ci->lock
> and percpu_swap_cluster.lock.
> And it also necessitates si->global_cluster_lock when the device is not
> SWP_SOLIDSTATE.
> Adding this assertion ensures locking consistency and helps catch
> potential synchronization issues during development.
>
> Signed-off-by: Hui Zhu <zhuhui@kylinos.cn>
> ---
> mm/swapfile.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2e8717f84ba3..1400a1585033 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -477,6 +477,8 @@ swap_cluster_alloc_table(struct swap_info_struct *si,
> * Swap allocator uses percpu clusters and holds the local lock.
> */
> lockdep_assert_held(&ci->lock);
> + if (!(si->flags & SWP_SOLIDSTATE))
> + lockdep_assert_held(&si->global_cluster_lock);
> lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
The addition looks fine to me.
If others agree, one minor suggestion: it might be slightly cleaner to
order the lockdep_assert_held() calls to match the actual lock
acquisition order:
lockdep_assert_held(&this_cpu_ptr(&percpu_swap_cluster)->lock);
if (!(si->flags & SWP_SOLIDSTATE))
lockdep_assert_held(&si->global_cluster_lock);
lockdep_assert_held(&ci->lock);
No strong opinion on this, just a thought. :)
Best regards,
Youngjun Park
> /* The cluster must be free and was just isolated from the free list. */
> --
> 2.43.0
>
>