From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
linux-kernel@vger.kernel.org, Chris Li <chrisl@kernel.org>,
Kairui Song <kasong@tencent.com>
Subject: [PATCH 11/12] mm, swap: simplify checking if a folio is swapped
Date: Mon, 26 Jan 2026 01:57:34 +0800 [thread overview]
Message-ID: <20260126-swap-table-p3-v1-11-a74155fab9b0@tencent.com> (raw)
In-Reply-To: <20260126-swap-table-p3-v1-0-a74155fab9b0@tencent.com>
From: Kairui Song <kasong@tencent.com>
Clean up and simplify how we check if a folio is swapped. The helper
already requires the folio to be in swap cache and locked. That's enough
to pin the swap cluster from being freed, so there is no need to lock
anything else to avoid UAF.
And besides, we have cleaned up and defined the swap operation to be
mostly folio based, and now the only place a folio will have any of its
swap slots' count increased from 0 to 1 is folio_dup_swap, which also
requires the folio lock. So as we are holding the folio lock here, a
folio can't change its swap status from not swapped (all swap slots have
a count of 0) to swapped (any slot has a swap count larger than 0).
So there won't be any false negatives of this helper if we simply depend
on the folio lock to stabilize the cluster.
We are only using this helper to determine if we can and should release
the swap cache. So false positives are completely harmless, and also
already exist before. Depending on the timing, previously, it's also
possible that a racing thread releases the swap count right after
releasing the ci lock and before this helper returns. In any case, the
worst that could happen is we leave a clean swap cache. It will still be
reclaimed when under pressure just fine.
So, in conclusion, we can simplify and make the check much simpler and
lockless. Also, rename it to folio_maybe_swapped to reflect the design.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/swap.h | 5 ++--
mm/swapfile.c | 82 ++++++++++++++++++++++++++++++++---------------------------
2 files changed, 48 insertions(+), 39 deletions(-)
diff --git a/mm/swap.h b/mm/swap.h
index 751430e2d2a5..3395c2aa1956 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -195,12 +195,13 @@ extern int swap_retry_table_alloc(swp_entry_t entry, gfp_t gfp);
*
* folio_alloc_swap(): the entry point for a folio to be swapped
* out. It allocates swap slots and pins the slots with swap cache.
- * The slots start with a swap count of zero.
+ * The slots start with a swap count of zero. The slots are pinned
+ * by swap cache reference which doesn't contribute to swap count.
*
* folio_dup_swap(): increases the swap count of a folio, usually
* during it gets unmapped and a swap entry is installed to replace
* it (e.g., swap entry in page table). A swap slot with swap
- * count == 0 should only be increasd by this helper.
+ * count == 0 can only be increased by this helper.
*
* folio_put_swap(): does the opposite thing of folio_dup_swap().
*/
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1cf18761c0fd..f2b5013c7692 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1740,7 +1740,11 @@ int folio_alloc_swap(struct folio *folio)
* @subpage: if not NULL, only increase the swap count of this subpage.
*
* Typically called when the folio is unmapped and have its swap entry to
- * take its palce.
+ * take its place: Swap entries allocated to a folio has count == 0 and pinned
+ * by swap cache. The swap cache pin doesn't increase the swap count. This
+ * helper sets the initial count == 1 and increases the count as the folio is
+ * unmapped and swap entries referencing the slots are generated to replace
+ * the folio.
*
* Context: Caller must ensure the folio is locked and in the swap cache.
* NOTE: The caller also has to ensure there is no raced call to
@@ -1941,49 +1945,44 @@ int swp_swapcount(swp_entry_t entry)
return count < 0 ? 0 : count;
}
-static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
- swp_entry_t entry, int order)
+/*
+ * folio_maybe_swapped - Test if a folio covers any swap slot with count > 0.
+ *
+ * Check if a folio is swapped. Holding the folio lock ensures the folio won't
+ * go from not-swapped to swapped because the initial swap count increment can
+ * only be done by folio_dup_swap, which also locks the folio. But a concurrent
+ * decrease of swap count is possible through swap_put_entries_direct, so this
+ * may return a false positive.
+ *
+ * Context: Caller must ensure the folio is locked and in the swap cache.
+ */
+static bool folio_maybe_swapped(struct folio *folio)
{
+ swp_entry_t entry = folio->swap;
struct swap_cluster_info *ci;
- unsigned int nr_pages = 1 << order;
- unsigned long roffset = swp_offset(entry);
- unsigned long offset = round_down(roffset, nr_pages);
- unsigned int ci_off;
- int i;
+ unsigned int ci_off, ci_end;
bool ret = false;
- ci = swap_cluster_lock(si, offset);
- if (nr_pages == 1) {
- ci_off = roffset % SWAPFILE_CLUSTER;
- if (swp_tb_get_count(__swap_table_get(ci, ci_off)))
- ret = true;
- goto unlock_out;
- }
- for (i = 0; i < nr_pages; i++) {
- ci_off = (offset + i) % SWAPFILE_CLUSTER;
- if (swp_tb_get_count(__swap_table_get(ci, ci_off))) {
- ret = true;
- break;
- }
- }
-unlock_out:
- swap_cluster_unlock(ci);
- return ret;
-}
-
-static bool folio_swapped(struct folio *folio)
-{
- swp_entry_t entry = folio->swap;
- struct swap_info_struct *si;
-
VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
- si = __swap_entry_to_info(entry);
- if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio)))
- return swap_entry_swapped(si, entry);
+ ci = __swap_entry_to_cluster(entry);
+ ci_off = swp_cluster_offset(entry);
+ ci_end = ci_off + folio_nr_pages(folio);
+ /*
+ * Extra locking not needed, folio lock ensures its swap entries
+ * won't be released, the backing data won't be gone either.
+ */
+ rcu_read_lock();
+ do {
+ if (__swp_tb_get_count(__swap_table_get(ci, ci_off))) {
+ ret = true;
+ break;
+ }
+ } while (++ci_off < ci_end);
+ rcu_read_unlock();
- return swap_page_trans_huge_swapped(si, entry, folio_order(folio));
+ return ret;
}
static bool folio_swapcache_freeable(struct folio *folio)
@@ -2029,7 +2028,7 @@ bool folio_free_swap(struct folio *folio)
{
if (!folio_swapcache_freeable(folio))
return false;
- if (folio_swapped(folio))
+ if (folio_maybe_swapped(folio))
return false;
swap_cache_del_folio(folio);
@@ -3713,6 +3712,8 @@ void si_swapinfo(struct sysinfo *val)
*
* Context: Caller must ensure there is no race condition on the reference
* owner. e.g., locking the PTL of a PTE containing the entry being increased.
+ * Also the swap entry must have a count >= 1. Otherwise folio_dup_swap should
+ * be used.
*/
int swap_dup_entry_direct(swp_entry_t entry)
{
@@ -3724,6 +3725,13 @@ int swap_dup_entry_direct(swp_entry_t entry)
return -EINVAL;
}
+ /*
+ * The caller must be increasing the swap count from a direct
+ * reference of the swap slot (e.g. a swap entry in page table).
+ * So the swap count must be >= 1.
+ */
+ VM_WARN_ON_ONCE(!swap_entry_swapped(si, entry));
+
return swap_dup_entries_cluster(si, swp_offset(entry), 1);
}
--
2.52.0
next prev parent reply other threads:[~2026-01-25 17:58 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-25 17:57 [PATCH 00/12] mm, swap: swap table phase III: remove swap_map Kairui Song
2026-01-25 17:57 ` [PATCH 01/12] mm, swap: protect si->swap_file properly and use as a mount indicator Kairui Song
2026-01-25 17:57 ` [PATCH 02/12] mm, swap: clean up swapon process and locking Kairui Song
2026-01-25 17:57 ` [PATCH 03/12] mm, swap: remove redundant arguments and locking for enabling a device Kairui Song
2026-01-25 17:57 ` [PATCH 04/12] mm, swap: consolidate bad slots setup and make it more robust Kairui Song
2026-01-25 17:57 ` [PATCH 05/12] mm/workingset: leave highest bits empty for anon shadow Kairui Song
2026-01-25 17:57 ` [PATCH 06/12] mm, swap: implement helpers for reserving data in the swap table Kairui Song
2026-01-26 7:15 ` kernel test robot
2026-01-26 8:28 ` Kairui Song
2026-01-25 17:57 ` [PATCH 07/12] mm, swap: mark bad slots in swap table directly Kairui Song
2026-01-25 17:57 ` [PATCH 08/12] mm, swap: simplify swap table sanity range check Kairui Song
2026-01-25 17:57 ` [PATCH 09/12] mm, swap: use the swap table to track the swap count Kairui Song
2026-01-26 1:03 ` kernel test robot
2026-01-26 4:04 ` kernel test robot
2026-01-25 17:57 ` [PATCH 10/12] mm, swap: no need to truncate the scan border Kairui Song
2026-01-25 17:57 ` Kairui Song [this message]
2026-01-25 17:57 ` [PATCH 12/12] mm, swap: no need to clear the shadow explicitly Kairui Song
2026-01-25 22:13 ` [syzbot ci] Re: mm, swap: swap table phase III: remove swap_map syzbot ci
2026-01-26 3:11 ` Kairui Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260126-swap-table-p3-v1-11-a74155fab9b0@tencent.com \
--to=ryncsn@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=chrisl@kernel.org \
--cc=david@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=nphamcs@gmail.com \
--cc=shikemeng@huaweicloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox