From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Baoquan He <bhe@redhat.com>, Barry Song <baohua@kernel.org>,
Chris Li <chrisl@kernel.org>, Nhat Pham <nphamcs@gmail.com>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
David Hildenbrand <david@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Youngjun Park <youngjun.park@lge.com>,
Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Ying Huang <ying.huang@linux.alibaba.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: [PATCH v3 10/19] mm, swap: consolidate cluster reclaim and usability check
Date: Tue, 25 Nov 2025 03:13:53 +0800
Message-ID: <20251125-swap-table-p2-v3-10-33f54f707a5c@tencent.com>
In-Reply-To: <20251125-swap-table-p2-v3-0-33f54f707a5c@tencent.com>
From: Kairui Song <kasong@tencent.com>
Swap cluster cache reclaim requires releasing the lock, so the cluster
may become unusable after the reclaim. To prepare for checking swap
cache using the swap table directly, consolidate the swap cluster
reclaim and the check logic.
Once the swap cache check uses the swap table, we will want to avoid
touching the cluster's data at all when the cluster is no longer usable,
to avoid RCU overhead here. Moving the cluster usability check into the
reclaim helper also avoids a redundant scan of the slots in that case.
While at it, adjust the reclaim slightly: always scan the whole region
and don't skip the slots covered by a reclaimed folio. The reclaim is
lockless, so new cache can land at any time, and for allocation we want
all cache to be reclaimed to avoid fragmentation. Besides, if the scan
offset is not aligned with the size of the reclaimed folio, we might skip
some existing cache and fail the reclaim unexpectedly.
There should be no observable behavior change. It might slightly reduce
fragmentation or improve performance.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
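Editorial note, not part of the patch: the contract of the consolidated helper after it retakes the lock can be sketched in userspace as below. This is only a toy model. `struct toy_cluster`, `toy_cluster_is_usable()` and `toy_check_after_reclaim()` are hypothetical names, the usability rule is a simplified assumption (empty, or already serving the same order), and the final range recheck is assumed to mirror the existing kernel loop that this hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTER_SLOTS 8

/* Hypothetical stand-in for struct swap_cluster_info plus its slot map. */
struct toy_cluster {
	unsigned int count;	/* number of slots currently in use */
	unsigned int order;	/* order the cluster currently serves */
	unsigned char map[TOY_CLUSTER_SLOTS];	/* 0 means the slot is free */
};

/* Simplified assumption: usable if empty, or already serving this order. */
static bool toy_cluster_is_usable(const struct toy_cluster *ci,
				  unsigned int order)
{
	return ci->count == 0 || ci->order == order;
}

/*
 * Models only the part of the helper that runs after the lock is retaken:
 * report usability through @usable first, take the empty-cluster fast
 * path, and only then recheck the requested range slot by slot.
 */
bool toy_check_after_reclaim(const struct toy_cluster *ci, size_t start,
			     unsigned int order, bool *usable)
{
	size_t nr_pages = (size_t)1 << order;
	size_t offset;

	assert(start + nr_pages <= TOY_CLUSTER_SLOTS);

	if (!toy_cluster_is_usable(ci, order)) {
		*usable = false;
		return false;
	}
	*usable = true;

	/* Fast path: nothing to scan if the whole cluster is empty. */
	if (ci->count == 0)
		return true;

	for (offset = start; offset < start + nr_pages; offset++)
		if (ci->map[offset])
			return false;
	return true;
}
```

A caller in this model checks `usable` before looking at the return value, the same way alloc_swap_scan_cluster() does in the hunk below: an unusable cluster ends the scan, while a usable cluster with a busy range just moves on to the next offset.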
mm/swapfile.c | 45 +++++++++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 16 deletions(-)
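Editorial note, not part of the patch: the unaligned-skip problem described in the commit message can be reproduced with a small userspace toy. Everything here is hypothetical (`struct toy_map`, `toy_add_cache()`, `toy_reclaim()`, `toy_scan_leftover()`), and the lockless race is simulated deterministically: a 4-slot folio lands on slots 0..3 right after the scanner has already seen slots 0..2 as free. Reclaim in this model always succeeds, which the real __try_to_reclaim_swap() does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_SLOTS	8
#define TOY_FREE	0
#define TOY_HAS_CACHE	1

/* Hypothetical model of a slot map: state plus the caching folio's order. */
struct toy_map {
	unsigned char state[TOY_NR_SLOTS];
	unsigned char order[TOY_NR_SLOTS];
};

/* Cache a naturally aligned folio of @order over the slots containing @offset. */
static void toy_add_cache(struct toy_map *m, size_t offset, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t first = offset & ~(nr - 1);
	size_t i;

	assert(first + nr <= TOY_NR_SLOTS);
	for (i = first; i < first + nr; i++) {
		m->state[i] = TOY_HAS_CACHE;
		m->order[i] = (unsigned char)order;
	}
}

/* Free the whole folio covering @offset and return its size in slots. */
static size_t toy_reclaim(struct toy_map *m, size_t offset)
{
	size_t nr = (size_t)1 << m->order[offset];
	size_t first = offset & ~(nr - 1);
	size_t i;

	for (i = first; i < first + nr; i++)
		m->state[i] = TOY_FREE;
	return nr;
}

/* Scan slots 0..7 and return how many are still cached afterwards. */
size_t toy_scan_leftover(bool skip_by_folio_size)
{
	struct toy_map m;
	size_t offset = 0, leftover = 0, i;

	memset(&m, 0, sizeof(m));
	toy_add_cache(&m, 4, 0);	/* three single-slot folios at 4, 5, 6 */
	toy_add_cache(&m, 5, 0);
	toy_add_cache(&m, 6, 0);

	while (offset < TOY_NR_SLOTS) {
		/* Simulated race: slots 0..2 were already seen as free. */
		if (offset == 3)
			toy_add_cache(&m, 0, 2);

		if (m.state[offset] == TOY_HAS_CACHE) {
			size_t nr = toy_reclaim(&m, offset);

			if (skip_by_folio_size) {
				/* Old behavior: jump ahead by the folio size. */
				offset += nr;
				continue;
			}
		}
		offset++;	/* New behavior: always advance one slot. */
	}

	for (i = 0; i < TOY_NR_SLOTS; i++)
		if (m.state[i] != TOY_FREE)
			leftover++;
	return leftover;
}
```

In this model, toy_scan_leftover(true) jumps from offset 3 to offset 7 and leaves slots 4..6 cached, so a recheck of the range would fail, while toy_scan_leftover(false) visits every slot and leaves nothing behind.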
diff --git a/mm/swapfile.c b/mm/swapfile.c
index cb59930b6415..bdbdb4a4c452 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -777,33 +777,51 @@ static int swap_cluster_setup_bad_slot(struct swap_cluster_info *cluster_info,
return 0;
}
+/*
+ * Reclaim drops the ci lock, so the cluster may become unusable (freed or
+ * stolen by a lower order). @usable will be set to false if that happens.
+ */
static bool cluster_reclaim_range(struct swap_info_struct *si,
struct swap_cluster_info *ci,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned int order,
+ bool *usable)
{
+ unsigned int nr_pages = 1 << order;
+ unsigned long offset = start, end = start + nr_pages;
unsigned char *map = si->swap_map;
- unsigned long offset = start;
int nr_reclaim;
spin_unlock(&ci->lock);
do {
switch (READ_ONCE(map[offset])) {
case 0:
- offset++;
break;
case SWAP_HAS_CACHE:
nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
- if (nr_reclaim > 0)
- offset += nr_reclaim;
- else
+ if (nr_reclaim < 0)
goto out;
break;
default:
goto out;
}
- } while (offset < end);
+ } while (++offset < end);
out:
spin_lock(&ci->lock);
+
+ /*
+ * We just dropped ci->lock, so the cluster could have been used by
+ * another order or freed. Check if it's still usable or empty.
+ */
+ if (!cluster_is_usable(ci, order)) {
+ *usable = false;
+ return false;
+ }
+ *usable = true;
+
+ /* Fast path, no need to scan if the whole cluster is empty */
+ if (cluster_is_empty(ci))
+ return true;
+
/*
* Recheck the range no matter reclaim succeeded or not, the slot
* could have been be freed while we are not holding the lock.
@@ -900,9 +918,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
unsigned int nr_pages = 1 << order;
- bool need_reclaim, ret;
+ bool need_reclaim, ret, usable;
lockdep_assert_held(&ci->lock);
+ VM_WARN_ON(!cluster_is_usable(ci, order));
if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
goto out;
@@ -912,14 +931,8 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim))
continue;
if (need_reclaim) {
- ret = cluster_reclaim_range(si, ci, offset, offset + nr_pages);
- /*
- * Reclaim drops ci->lock and cluster could be used
- * by another order. Not checking flag as off-list
- * cluster has no flag set, and change of list
- * won't cause fragmentation.
- */
- if (!cluster_is_usable(ci, order))
+ ret = cluster_reclaim_range(si, ci, offset, order, &usable);
+ if (!usable)
goto out;
if (cluster_is_empty(ci))
offset = start;
--
2.52.0
Thread overview: 28+ messages
2025-11-24 19:13 [PATCH v3 00/19] mm, swap: swap table phase II: unify swapin use swap cache and cleanup flags Kairui Song
2025-11-24 19:13 ` [PATCH v3 01/19] mm, swap: rename __read_swap_cache_async to swap_cache_alloc_folio Kairui Song
2025-11-24 19:13 ` [PATCH v3 02/19] mm, swap: split swap cache preparation loop into a standalone helper Kairui Song
2025-11-24 19:13 ` [PATCH v3 03/19] mm, swap: never bypass the swap cache even for SWP_SYNCHRONOUS_IO Kairui Song
2025-11-24 19:13 ` [PATCH v3 04/19] mm, swap: always try to free swap cache for SWP_SYNCHRONOUS_IO devices Kairui Song
2025-11-24 19:13 ` [PATCH v3 05/19] mm, swap: simplify the code and reduce indention Kairui Song
2025-11-24 19:13 ` [PATCH v3 06/19] mm, swap: free the swap cache after folio is mapped Kairui Song
2025-11-24 19:13 ` [PATCH v3 07/19] mm/shmem: never bypass the swap cache for SWP_SYNCHRONOUS_IO Kairui Song
2025-12-02 7:34 ` Baolin Wang
2025-12-03 5:33 ` Kairui Song
2025-12-04 12:30 ` Baolin Wang
2025-11-24 19:13 ` [PATCH v3 08/19] mm/shmem, swap: remove SWAP_MAP_SHMEM Kairui Song
2025-12-02 7:04 ` Baolin Wang
2025-11-24 19:13 ` [PATCH v3 09/19] mm, swap: swap entry of a bad slot should not be considered as swapped out Kairui Song
2025-11-24 19:13 ` Kairui Song [this message]
2025-11-24 19:13 ` [PATCH v3 11/19] mm, swap: split locked entry duplicating into a standalone helper Kairui Song
2025-11-24 19:13 ` [PATCH v3 12/19] mm, swap: use swap cache as the swap in synchronize layer Kairui Song
2025-11-24 19:13 ` [PATCH v3 13/19] mm, swap: remove workaround for unsynchronized swap map cache state Kairui Song
2025-11-24 19:13 ` [PATCH v3 14/19] mm, swap: cleanup swap entry management workflow Kairui Song
2025-11-25 18:11 ` Rafael J. Wysocki
2025-11-24 19:13 ` [PATCH v3 15/19] mm, swap: add folio to swap cache directly on allocation Kairui Song
2025-11-24 19:13 ` [PATCH v3 16/19] mm, swap: check swap table directly for checking cache Kairui Song
2025-11-24 19:14 ` [PATCH v3 17/19] mm, swap: clean up and improve swap entries freeing Kairui Song
2025-11-24 19:14 ` [PATCH v3 18/19] mm, swap: drop the SWAP_HAS_CACHE flag Kairui Song
2025-11-24 19:14 ` [PATCH v3 19/19] mm, swap: remove no longer needed _swap_info_get Kairui Song
2025-11-29 17:07 ` [PATCH v3 00/19] mm, swap: swap table phase II: unify swapin use swap cache and cleanup flags Chris Li
2025-11-29 18:18 ` Andrew Morton
2025-11-30 20:44 ` Chris Li