* [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c
@ 2025-02-22 16:08 Kemeng Shi
2025-02-22 16:08 ` [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters() Kemeng Shi
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
This series includes several random fixes and cleanups to the mm/swapfile.c
code. The issues were discovered during code review and can only manifest
in extremely rare race-condition scenarios. Further details about each fix
can be found in the respective patches. Thanks for your attention!
Kemeng Shi (6):
mm: swap: avoid losting cluster in swap_reclaim_full_clusters()
mm: swap: use correct step in loop to wait all clusters in
wait_for_allocation()
mm, swap: avoid BUG_ON in relocate_cluster()
mm, swap: remove setting SWAP_MAP_BAD for discard cluster
mm, swap: correct comment in swap_usage_sub()
mm: swap: remove stale comment of swap_reclaim_full_clusters()
mm/swapfile.c | 22 +++++++---------------
1 file changed, 7 insertions(+), 15 deletions(-)
--
2.30.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters()
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
@ 2025-02-22 16:08 ` Kemeng Shi
2025-02-22 17:19 ` Kairui Song
2025-02-22 16:08 ` [PATCH 2/6] mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() Kemeng Shi
` (5 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
If no swap cache is reclaimed, a cluster taken off the full_clusters list
is not put back on any list and may not be reused. Call relocate_cluster()
on such a cluster to fix the issue.
Fixes: 3b644773eefda ("mm, swap: reduce contention on device lock")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 34baefb000b5..e5f58ab86329 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -861,6 +861,10 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
offset++;
}
+ /* in case no swap cache is reclaimed */
+ if (ci->flags == CLUSTER_FLAG_NONE)
+ relocate_cluster(si, ci);
+
unlock_cluster(ci);
if (to_scan <= 0)
break;
--
2.30.0
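The effect of the added check can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_cluster`, `toy_relocate()` and `toy_reclaim_full_cluster()` are hypothetical names, not kernel APIs, and the model assumes (as the review of this patch describes) that a cluster is normally put back on a list only when one of its slots is freed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CLUSTER_SLOTS 4U	/* hypothetical stand-in for SWAPFILE_CLUSTER */

enum toy_list { TOY_OFF_LIST, TOY_FREE_LIST, TOY_FRAG_LIST, TOY_FULL_LIST };

struct toy_cluster {
	unsigned int count;	/* slots currently in use */
	enum toy_list list;	/* list the cluster currently sits on */
};

/* Put the cluster on the list that matches its usage count. */
static void toy_relocate(struct toy_cluster *ci)
{
	if (ci->count == 0)
		ci->list = TOY_FREE_LIST;
	else if (ci->count != TOY_CLUSTER_SLOTS)
		ci->list = TOY_FRAG_LIST;
	else
		ci->list = TOY_FULL_LIST;
}

/*
 * Model one iteration of the reclaim loop: the cluster is isolated from
 * the full list, `reclaimed` slots are freed, and with `with_fix` set the
 * cluster is relocated if it is still off-list afterwards.
 */
static void toy_reclaim_full_cluster(struct toy_cluster *ci,
				     unsigned int reclaimed, bool with_fix)
{
	assert(reclaimed <= ci->count);

	ci->list = TOY_OFF_LIST;	/* taken off the full list for scanning */
	if (reclaimed) {
		ci->count -= reclaimed;
		toy_relocate(ci);	/* freeing a slot relists the cluster */
	}
	/* the check added by the patch: nothing was reclaimed */
	if (with_fix && ci->list == TOY_OFF_LIST)
		toy_relocate(ci);
}
```

Without the fix, a call with `reclaimed == 0` leaves the model cluster off every list; with the fix it goes back on the full list.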
* [PATCH 2/6] mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
2025-02-22 16:08 ` [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters() Kemeng Shi
@ 2025-02-22 16:08 ` Kemeng Shi
2025-02-22 17:32 ` Kairui Song
2025-02-22 16:08 ` [PATCH 3/6] mm, swap: avoid BUG_ON in relocate_cluster() Kemeng Shi
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
Use the correct step in the loop so that wait_for_allocation() waits on
every cluster. If wait_for_allocation() misses a cluster, a use-after-free
may occur as follows:
shmem_writepage                                   swapoff
folio_alloc_swap
 get_swap_pages
  scan_swap_map_slots
   cluster_alloc_swap_entry
    alloc_swap_scan_cluster
     cluster_alloc_range
      /* SWP_WRITEOK is valid */
      if (!(si->flags & SWP_WRITEOK))

                                                  ...
                                                  del_from_avail_list(p, true);
                                                  ...
                                                  /* miss the cluster in shmem_writepage */
                                                  wait_for_allocation()
                                                  ...
                                                  try_to_unuse()

      memset(si->swap_map + start, usage, nr_pages);
      swap_range_alloc(si, nr_pages);
      ci->count += nr_pages;
      /* return a valid entry */

                                                  ...
                                                  exit_swap_address_space(p->type);
                                                  ...

...
add_to_swap_cache
 /* dereference swap_address_space(entry) which is NULL */
 xas_lock_irq(&xas);
Fixes: e47bd46eab97e ("mm, swap: hold a reference during scan and cleanup flag usage")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index e5f58ab86329..425126c0a07d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2627,7 +2627,6 @@ static void wait_for_allocation(struct swap_info_struct *si)
for (offset = 0; offset < end; offset += SWAPFILE_CLUSTER) {
ci = lock_cluster(si, offset);
unlock_cluster(ci);
- offset += SWAPFILE_CLUSTER;
}
}
--
2.30.0
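Why the stray increment matters can be seen with a small user-space count of loop iterations. This is a sketch, not kernel code: `MODEL_CLUSTER_SIZE` and `clusters_visited()` are hypothetical names, and the lock/unlock pair is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SWAPFILE_CLUSTER (slots per cluster). */
#define MODEL_CLUSTER_SIZE 512UL

/*
 * Count how many clusters a wait_for_allocation()-style loop visits.
 * extra_step != 0 models the loop body that advances offset a second
 * time, which is the line removed by the patch.
 */
static size_t clusters_visited(unsigned long end, int extra_step)
{
	size_t visited = 0;
	unsigned long offset;

	/* simplification: the model only handles a cluster-aligned end */
	assert(end % MODEL_CLUSTER_SIZE == 0);

	for (offset = 0; offset < end; offset += MODEL_CLUSTER_SIZE) {
		visited++;	/* stands in for lock_cluster() + unlock_cluster() */
		if (extra_step)
			offset += MODEL_CLUSTER_SIZE;
	}
	return visited;
}
```

For an area of eight clusters, the model visits all eight with the single step and only every other one (four) with the double step, so an allocation still in flight on a skipped cluster is never waited for.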
* [PATCH 3/6] mm, swap: avoid BUG_ON in relocate_cluster()
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
2025-02-22 16:08 ` [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters() Kemeng Shi
2025-02-22 16:08 ` [PATCH 2/6] mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() Kemeng Shi
@ 2025-02-22 16:08 ` Kemeng Shi
2025-02-22 18:43 ` Kairui Song
2025-02-22 16:08 ` [PATCH 4/6] mm, swap: remove setting SWAP_MAP_BAD for discard cluster Kemeng Shi
` (3 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
If allocation races with swapoff, we may call free_cluster() on a cluster
that is already on the free list and trigger the bug on as follows:
Allocation                                        Swapoff
cluster_alloc_swap_entry
 ...
 /* may get a free cluster with offset */
 offset = xxx;
 if (offset)
  ci = lock_cluster(si, offset);

                                                  ...
                                                  del_from_avail_list(p, true);
                                                  si->flags &= ~SWP_WRITEOK;

  alloc_swap_scan_cluster(si, ci, ...)
   ...
   /* failed to alloc entry from free entry */
   if (!cluster_alloc_range(...))
    break;
   ...
   /* add back a free cluster */
   relocate_cluster(si, ci);
    if (!ci->count)
     free_cluster(si, ci);
      VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE);
Although the bug_on could be triggered, calling free_cluster() on a free
cluster only moves the cluster to the tail of the list and should
otherwise be fine.
Check that the cluster is not free before calling free_cluster() in
relocate_cluster() to avoid the bug_on.
Fixes: 3b644773eefda ("mm, swap: reduce contention on device lock")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 425126c0a07d..fc45b9d56639 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -653,7 +653,8 @@ static void relocate_cluster(struct swap_info_struct *si,
return;
if (!ci->count) {
- free_cluster(si, ci);
+ if (ci->flags != CLUSTER_FLAG_FREE)
+ free_cluster(si, ci);
} else if (ci->count != SWAPFILE_CLUSTER) {
if (ci->flags != CLUSTER_FLAG_FRAG)
move_cluster(si, ci, &si->frag_clusters[ci->order],
--
2.30.0
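The guard can be modelled in user space as follows. This is a sketch under assumptions: the `model_*` names are hypothetical, `assert()` stands in for VM_BUG_ON(), and list handling is reduced to the flag value alone.

```c
#include <assert.h>

#define MODEL_CLUSTER_SIZE 512U	/* hypothetical stand-in for SWAPFILE_CLUSTER */

enum model_flag {
	MODEL_FLAG_NONE,
	MODEL_FLAG_FREE,
	MODEL_FLAG_FRAG,
	MODEL_FLAG_FULL,
};

struct model_cluster {
	unsigned int count;	/* slots currently in use */
	enum model_flag flags;	/* which list the cluster is on */
};

static int model_free_calls;	/* how often the free path actually ran */

/* Like the traced free_cluster(): must not see an already-free cluster. */
static void model_free_cluster(struct model_cluster *ci)
{
	assert(ci->flags != MODEL_FLAG_FREE);	/* mirrors the VM_BUG_ON */
	ci->flags = MODEL_FLAG_FREE;
	model_free_calls++;
}

/* relocate_cluster() shape with the guard added by the patch. */
static void model_relocate_cluster(struct model_cluster *ci)
{
	if (!ci->count) {
		if (ci->flags != MODEL_FLAG_FREE)
			model_free_cluster(ci);
	} else if (ci->count != MODEL_CLUSTER_SIZE) {
		ci->flags = MODEL_FLAG_FRAG;
	} else {
		ci->flags = MODEL_FLAG_FULL;
	}
}
```

Relocating an empty cluster that is already flagged free is a no-op in this model, while an empty off-list cluster still goes through the free path exactly once.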
* [PATCH 4/6] mm, swap: remove setting SWAP_MAP_BAD for discard cluster
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
` (2 preceding siblings ...)
2025-02-22 16:08 ` [PATCH 3/6] mm, swap: avoid BUG_ON in relocate_cluster() Kemeng Shi
@ 2025-02-22 16:08 ` Kemeng Shi
2025-02-24 1:24 ` Kairui Song
2025-02-22 16:08 ` [PATCH 5/6] mm, swap: correct comment in swap_usage_sub() Kemeng Shi
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
Before allocating from a cluster, we acquire the cluster's lock and make
sure the cluster is usable via cluster_is_usable(), so there is no need to
set SWAP_MAP_BAD for a cluster that is about to be discarded.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index fc45b9d56639..c640f77a464a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -479,15 +479,6 @@ static void move_cluster(struct swap_info_struct *si,
static void swap_cluster_schedule_discard(struct swap_info_struct *si,
struct swap_cluster_info *ci)
{
- unsigned int idx = cluster_index(si, ci);
- /*
- * If scan_swap_map_slots() can't find a free cluster, it will check
- * si->swap_map directly. To make sure the discarding cluster isn't
- * taken by scan_swap_map_slots(), mark the swap entries bad (occupied).
- * It will be cleared after discard
- */
- memset(si->swap_map + idx * SWAPFILE_CLUSTER,
- SWAP_MAP_BAD, SWAPFILE_CLUSTER);
VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE);
move_cluster(si, ci, &si->discard_clusters, CLUSTER_FLAG_DISCARD);
schedule_work(&si->discard_work);
@@ -571,8 +562,6 @@ static bool swap_do_scheduled_discard(struct swap_info_struct *si)
* return the cluster to allocation list.
*/
ci->flags = CLUSTER_FLAG_NONE;
- memset(si->swap_map + idx * SWAPFILE_CLUSTER,
- 0, SWAPFILE_CLUSTER);
__free_cluster(si, ci);
spin_unlock(&ci->lock);
ret = true;
--
2.30.0
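The argument can be sketched with a toy allocator that checks cluster state under the lock before touching any slot. Everything here is an assumption made for illustration: the `demo_*` names are hypothetical, and `demo_cluster_is_usable()` only models the one property the commit message relies on (a cluster queued for discard is rejected), not the real cluster_is_usable().

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CLUSTER_SLOTS 4	/* hypothetical, tiny cluster for the demo */

enum demo_state { DEMO_ON_FREE_LIST, DEMO_ON_FRAG_LIST, DEMO_ON_DISCARD_LIST };

struct demo_cluster {
	bool locked;				/* models ci->lock */
	enum demo_state state;			/* which list the cluster is on */
	unsigned char slots[DEMO_CLUSTER_SLOTS];	/* 0 = free, nonzero = in use */
};

/* Assumed analogue of the usability check: discard-pending is unusable. */
static bool demo_cluster_is_usable(const struct demo_cluster *ci)
{
	return ci->state != DEMO_ON_DISCARD_LIST;
}

/* Returns the allocated slot index, or -1 if nothing was allocated. */
static int demo_alloc_slot(struct demo_cluster *ci)
{
	int i, found = -1;

	assert(!ci->locked);
	ci->locked = true;		/* stands in for lock_cluster() */
	if (demo_cluster_is_usable(ci)) {
		for (i = 0; i < DEMO_CLUSTER_SLOTS; i++) {
			if (!ci->slots[i]) {
				ci->slots[i] = 1;
				found = i;
				break;
			}
		}
	}
	ci->locked = false;		/* stands in for unlock_cluster() */
	return found;
}
```

In this model the slot map of a discard-pending cluster is never read by the allocator, so leaving its entries at zero (rather than marking them bad) cannot lead to an allocation from it.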
* [PATCH 5/6] mm, swap: correct comment in swap_usage_sub()
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
` (3 preceding siblings ...)
2025-02-22 16:08 ` [PATCH 4/6] mm, swap: remove setting SWAP_MAP_BAD for discard cluster Kemeng Shi
@ 2025-02-22 16:08 ` Kemeng Shi
2025-02-22 16:08 ` [PATCH 6/6] mm: swap: remove stale comment of swap_reclaim_full_clusters() Kemeng Shi
2025-02-23 1:44 ` [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Andrew Morton
6 siblings, 0 replies; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
swap_usage_sub() adds si back to the plist, so correct the comment that
wrongly says swap_usage_sub() removes si from the plist.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c640f77a464a..6ff57ed23e27 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1098,7 +1098,7 @@ static void swap_usage_sub(struct swap_info_struct *si, unsigned int nr_entries)
/*
* If device is not full, and SWAP_USAGE_OFFLIST_BIT is set,
- * remove it from the plist.
+ * add it to the plist.
*/
if (unlikely(val & SWAP_USAGE_OFFLIST_BIT))
add_to_avail_list(si, false);
--
2.30.0
* [PATCH 6/6] mm: swap: remove stale comment of swap_reclaim_full_clusters()
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
` (4 preceding siblings ...)
2025-02-22 16:08 ` [PATCH 5/6] mm, swap: correct comment in swap_usage_sub() Kemeng Shi
@ 2025-02-22 16:08 ` Kemeng Shi
2025-02-23 1:44 ` [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Andrew Morton
6 siblings, 0 replies; 14+ messages in thread
From: Kemeng Shi @ 2025-02-22 16:08 UTC (permalink / raw)
To: akpm, kasong; +Cc: linux-mm, linux-kernel
swap_reclaim_full_clusters() no longer has a return value, so remove the
stale comment which says swap_reclaim_full_clusters() will return a bool
value.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/swapfile.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6ff57ed23e27..dc0dc5a26a88 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -820,7 +820,6 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
return found;
}
-/* Return true if reclaimed a whole cluster */
static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
{
long to_scan = 1;
--
2.30.0
* Re: [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters()
2025-02-22 16:08 ` [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters() Kemeng Shi
@ 2025-02-22 17:19 ` Kairui Song
2025-02-24 1:17 ` Kemeng Shi
0 siblings, 1 reply; 14+ messages in thread
From: Kairui Song @ 2025-02-22 17:19 UTC (permalink / raw)
To: Kemeng Shi, akpm; +Cc: linux-mm, linux-kernel
On Sat, Feb 22, 2025 at 3:12 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> If no swap cache is reclaimed, cluster taken off from full_clusters list
> will not be put in any list and may not be reused. Do relocate_cluster
> for such cluster to fix the issue.
>
> Fixes: 3b644773eefda ("mm, swap: reduce contention on device lock")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/swapfile.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 34baefb000b5..e5f58ab86329 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -861,6 +861,10 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
> offset++;
> }
>
> + /* in case no swap cache is reclaimed */
> + if (ci->flags == CLUSTER_FLAG_NONE)
> + relocate_cluster(si, ci);
> +
> unlock_cluster(ci);
> if (to_scan <= 0)
> break;
> --
> 2.30.0
Thanks. A little nitpick: "losting" is not a word, I think you mean "leaking".
And BTW maybe it's better to describe the result of this leak in a bit
more detail; "cluster leaking from lists" and "will not be reused"
looked a bit scary at a glance to me. But since these are full
clusters, they will be moved back to a list if any slots on them are
freed, so the worst result is inefficient reclaiming of HAS_CACHE
slots; we don't really lose these clusters.
We do need to fix it though. So other than the commit summary and
message nitpick:
Reviewed-by: Kairui Song <kasong@tencent.com>
* Re: [PATCH 2/6] mm: swap: use correct step in loop to wait all clusters in wait_for_allocation()
2025-02-22 16:08 ` [PATCH 2/6] mm: swap: use correct step in loop to wait all clusters in wait_for_allocation() Kemeng Shi
@ 2025-02-22 17:32 ` Kairui Song
0 siblings, 0 replies; 14+ messages in thread
From: Kairui Song @ 2025-02-22 17:32 UTC (permalink / raw)
To: Kemeng Shi; +Cc: akpm, linux-mm, linux-kernel
On Sat, Feb 22, 2025 at 3:12 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> Use correct step in loop to wait all clusters in wait_for_allocation().
> If we miss some cluster in wait_for_allocation(), use after free may
> occurs as following:
> shmem_writepage swapoff
> folio_alloc_swap
> get_swap_pages
> scan_swap_map_slots
> cluster_alloc_swap_entry
> alloc_swap_scan_cluster
> cluster_alloc_range
> /* SWP_WRITEOK is valid */
> if (!(si->flags & SWP_WRITEOK))
>
> ...
> del_from_avail_list(p, true);
> ...
> /* miss the cluster in shmem_writepage */
> wait_for_allocation()
> ...
> try_to_unuse()
>
> memset(si->swap_map + start, usage, nr_pages);
> swap_range_alloc(si, nr_pages);
> ci->count += nr_pages;
> /* return a valid entry */
>
> ...
> exit_swap_address_space(p->type);
> ...
>
> ...
> add_to_swap_cache
> /* dereference swap_address_space(entry) which is NULL */
> xas_lock_irq(&xas);
>
> Fixes: e47bd46eab97e ("mm, swap: hold a reference during scan and cleanup flag usage")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/swapfile.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index e5f58ab86329..425126c0a07d 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -2627,7 +2627,6 @@ static void wait_for_allocation(struct swap_info_struct *si)
> for (offset = 0; offset < end; offset += SWAPFILE_CLUSTER) {
> ci = lock_cluster(si, offset);
> unlock_cluster(ci);
> - offset += SWAPFILE_CLUSTER;
> }
> }
>
Thanks, good catch.
Reviewed-by: Kairui Song <kasong@tencent.com>
* Re: [PATCH 3/6] mm, swap: avoid BUG_ON in relocate_cluster()
2025-02-22 16:08 ` [PATCH 3/6] mm, swap: avoid BUG_ON in relocate_cluster() Kemeng Shi
@ 2025-02-22 18:43 ` Kairui Song
0 siblings, 0 replies; 14+ messages in thread
From: Kairui Song @ 2025-02-22 18:43 UTC (permalink / raw)
To: Kemeng Shi; +Cc: akpm, linux-mm, linux-kernel
On Sat, Feb 22, 2025 at 3:12 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> If allocation is racy with swapoff, we may call free_cluster for cluster
> already in free list and trigger bug on as following:
Maybe capitalize this "bug on" to BUG_ON to be consistent with the title.
> Allocation Swapoff
> cluster_alloc_swap_entry
> ...
> /* may get a free cluster with offset */
> offset = xxx;
> if (offset)
> ci = lock_cluster(si, offset);
>
> ...
> del_from_avail_list(p, true);
> si->flags &= ~SWP_WRITEOK;
>
> alloc_swap_scan_cluster(si, ci, ...)
> ...
> /* failed to alloc entry from free entry */
> if (!cluster_alloc_range(...))
> break;
> ...
> /* add back a free cluster */
> relocate_cluster(si, ci);
> if (!ci->count)
> free_cluster(si, ci);
> VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE);
>
> Despite bug_on could be triggered, call free_cluster() for free cluster
> only move cluster to tail of list and should be fine.
>
> Check cluster is not free before calling free_cluster() in
> relocate_cluster() to avoid bug_on.
Same here.
>
> Fixes: 3b644773eefda ("mm, swap: reduce contention on device lock")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/swapfile.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 425126c0a07d..fc45b9d56639 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -653,7 +653,8 @@ static void relocate_cluster(struct swap_info_struct *si,
> return;
>
> if (!ci->count) {
> - free_cluster(si, ci);
> + if (ci->flags != CLUSTER_FLAG_FREE)
> + free_cluster(si, ci);
> } else if (ci->count != SWAPFILE_CLUSTER) {
> if (ci->flags != CLUSTER_FLAG_FRAG)
> move_cluster(si, ci, &si->frag_clusters[ci->order],
> --
> 2.30.0
Thanks, other than the minor commit message issue:
Reviewed-by: Kairui Song <kasong@tencent.com>
* Re: [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c
2025-02-22 16:08 [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Kemeng Shi
` (5 preceding siblings ...)
2025-02-22 16:08 ` [PATCH 6/6] mm: swap: remove stale comment of swap_reclaim_full_clusters() Kemeng Shi
@ 2025-02-23 1:44 ` Andrew Morton
2025-02-24 1:27 ` Kemeng Shi
6 siblings, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2025-02-23 1:44 UTC (permalink / raw)
To: Kemeng Shi; +Cc: kasong, linux-mm, linux-kernel
On Sun, 23 Feb 2025 00:08:44 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> This series includes several random fixes and cleanups to the mm/swapfile.c
> code. The issues were discovered during code review and can only manifest
> under extremely rare and race-condition scenarios. Further details about
> each fix can be found in the respective patches. Thanks for your
> attention!
Thanks. I queued the first three patches in mm-hotfixes-stable, as
they address post-6.13 issues which should be addressed during this -rc
cycle.
I queued the other three patches for 6.15-rcX.
I corrected the Fixes: hash in [2/6].
I made some changelog alterations, along the lines that Kairui
mentioned. Please check the results and send along any alterations
which you feel are desirable.
* Re: [PATCH 1/6] mm: swap: avoid losting cluster in swap_reclaim_full_clusters()
2025-02-22 17:19 ` Kairui Song
@ 2025-02-24 1:17 ` Kemeng Shi
0 siblings, 0 replies; 14+ messages in thread
From: Kemeng Shi @ 2025-02-24 1:17 UTC (permalink / raw)
To: Kairui Song, akpm; +Cc: linux-mm, linux-kernel
on 2/23/2025 1:19 AM, Kairui Song wrote:
> On Sat, Feb 22, 2025 at 3:12 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>>
>> If no swap cache is reclaimed, cluster taken off from full_clusters list
>> will not be put in any list and may not be reused. Do relocate_cluster
>> for such cluster to fix the issue.
>>
>> Fixes: 3b644773eefda ("mm, swap: reduce contention on device lock")
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/swapfile.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 34baefb000b5..e5f58ab86329 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -861,6 +861,10 @@ static void swap_reclaim_full_clusters(struct swap_info_struct *si, bool force)
>> offset++;
>> }
>>
>> + /* in case no swap cache is reclaimed */
>> + if (ci->flags == CLUSTER_FLAG_NONE)
>> + relocate_cluster(si, ci);
>> +
>> unlock_cluster(ci);
>> if (to_scan <= 0)
>> break;
>> --
>> 2.30.0
>
> Thanks. A little nick pick, "losting" is not a word, I think you mean "leaking".
>
> And BTW maybe it's better to describe the result of this leak in a bit
> more details, "cluster leaking from lists" and "will not be reused"
> looked a bit scary at a glance to me. But realizing it's full
> clusters, they will be moved back to a list if any slots on them are
> freed, so the worst result is inefficiently reclaiming of HAS_CACHE
> slots, we didn't really lose these clusters.
>
> We do need to fix it though. So other than the commit summary and
> message nitpick:
Thanks for the feedback, I will improve the commit summary and message in
the next version.
>
> Reviewed-by: Kairui Song <kasong@tencent.com>
>
* Re: [PATCH 4/6] mm, swap: remove setting SWAP_MAP_BAD for discard cluster
2025-02-22 16:08 ` [PATCH 4/6] mm, swap: remove setting SWAP_MAP_BAD for discard cluster Kemeng Shi
@ 2025-02-24 1:24 ` Kairui Song
0 siblings, 0 replies; 14+ messages in thread
From: Kairui Song @ 2025-02-24 1:24 UTC (permalink / raw)
To: Kemeng Shi; +Cc: Andrew Morton, linux-mm, LKML
On Sat, Feb 22, 2025 at 3:12 PM Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
> Before alloc from a cluster, we will aqcuire cluster's lock and make
> sure it is usable by cluster_is_usable(), so there is no need to
> set SWAP_MAP_BAD for cluster to be discarded.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/swapfile.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index fc45b9d56639..c640f77a464a 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -479,15 +479,6 @@ static void move_cluster(struct swap_info_struct *si,
> static void swap_cluster_schedule_discard(struct swap_info_struct *si,
> struct swap_cluster_info *ci)
> {
> - unsigned int idx = cluster_index(si, ci);
> - /*
> - * If scan_swap_map_slots() can't find a free cluster, it will check
> - * si->swap_map directly. To make sure the discarding cluster isn't
> - * taken by scan_swap_map_slots(), mark the swap entries bad (occupied).
> - * It will be cleared after discard
> - */
> - memset(si->swap_map + idx * SWAPFILE_CLUSTER,
> - SWAP_MAP_BAD, SWAPFILE_CLUSTER);
> VM_BUG_ON(ci->flags == CLUSTER_FLAG_FREE);
> move_cluster(si, ci, &si->discard_clusters, CLUSTER_FLAG_DISCARD);
> schedule_work(&si->discard_work);
> @@ -571,8 +562,6 @@ static bool swap_do_scheduled_discard(struct swap_info_struct *si)
> * return the cluster to allocation list.
> */
> ci->flags = CLUSTER_FLAG_NONE;
> - memset(si->swap_map + idx * SWAPFILE_CLUSTER,
> - 0, SWAPFILE_CLUSTER);
> __free_cluster(si, ci);
> spin_unlock(&ci->lock);
> ret = true;
Good, I wanted to do this some time ago, but forgot about it while busy
with other things.
Reviewed-by: Kairui Song <kasong@tencent.com>
* Re: [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c
2025-02-23 1:44 ` [PATCH 0/6] Some random fixes and cleanups to mm/swapfile.c Andrew Morton
@ 2025-02-24 1:27 ` Kemeng Shi
0 siblings, 0 replies; 14+ messages in thread
From: Kemeng Shi @ 2025-02-24 1:27 UTC (permalink / raw)
To: Andrew Morton; +Cc: kasong, linux-mm, linux-kernel
on 2/23/2025 9:44 AM, Andrew Morton wrote:
> On Sun, 23 Feb 2025 00:08:44 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
>> This series includes several random fixes and cleanups to the mm/swapfile.c
>> code. The issues were discovered during code review and can only manifest
>> under extremely rare and race-condition scenarios. Further details about
>> each fix can be found in the respective patches. Thanks for your
>> attention!
>
> Thanks. I queued the first three patches in mm-hotfixes-stable, as
> they address post-6.13 issues which should be addressed during this -rc
> cycle.
>
> I queued the other three patches for 6.15-rcX.
>
> I corrected the Fixes: hash in [2/6].
>
> I made some changelog alterations, along the lines that Kairui
> mentioned. Please check the results and send along any alterations
> which you feel are desirable.
>
Thanks for correcting the changelog. The results look good to me,
except that patch 1 may still need some changelog improvement. I will
send a v2 of patch 1 soon.