From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Chris Li <chrisl@kernel.org>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Carsten Grohmann <carstengrohmann@gmx.de>,
linux-kernel@vger.kernel.org,
"open list:SUSPEND TO RAM" <linux-pm@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm, swap: simplify checking if a folio is swapped
Date: Sun, 15 Feb 2026 18:41:05 +0800
Message-ID: <CAMgjq7C2K5mAVcieX5rUUk9Qu3o7OgHkFqBU86AsvxNQG0uoUw@mail.gmail.com>
In-Reply-To: <20260215103815.87329-1-ryncsn@gmail.com>
On Sun, Feb 15, 2026 at 6:38 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Clean up and simplify how we check if a folio is swapped. The helper
> already requires the folio to be locked and in the swap cache, which
> is enough to keep the swap cluster from being freed, so there is no
> need to take any other lock to avoid a UAF.
>
> Besides, swap operations are now mostly folio based after the recent
> cleanups, and the only place a folio can have any of its swap slots'
> count increased from 0 to 1 is folio_dup_swap, which also requires the
> folio lock. So while we are holding the folio lock here, a folio can't
> change its swap status from not swapped (all swap slots have a count
> of 0) to swapped (any slot has a count larger than 0).
>
> So this helper cannot return a false negative if we simply depend on
> the folio lock to stabilize the cluster.
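>
> For illustration only, the invariant boils down to something like the
> following simplified sketch (not the exact code in this patch; the
> signature and body shown here are just for demonstration):
>
>     /*
>      * folio_dup_swap() is the only path that raises a slot's swap
>      * count from 0 to 1, and it runs with the folio locked:
>      */
>     int folio_dup_swap(struct folio *folio, struct page *subpage)
>     {
>             VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
>             /* ... raise the swap count of the covered slot(s) ... */
>     }
>
>     /* So any other path holding the folio lock knows no slot covered
>      * by this folio can go from count == 0 to count > 0 under it. */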
>
> We only use this helper to determine if we can and should release the
> swap cache, so false positives are completely harmless, and they could
> already happen before this change: depending on the timing, a racing
> thread could release the swap count right after the ci lock was dropped
> and before this helper returned. In any case, the worst that can happen
> is that we leave behind a clean swap cache, which will still be
> reclaimed just fine under memory pressure.
>
> So, in conclusion, we can make the check much simpler and lockless.
> Also, rename it to folio_maybe_swapped to reflect the new design.
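>
> The caller side stays the same as before (taken from folio_free_swap()
> in the diff below), only the locking inside the check changes:
>
>     if (!folio_swapcache_freeable(folio))
>             return false;
>     /* folio_maybe_swapped() may return a false positive, which is
>      * harmless: a clean swap cache is still reclaimed under pressure. */
>     if (folio_maybe_swapped(folio))
>             return false;
>
>     swap_cache_del_folio(folio);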
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
> mm/swap.h | 5 ++--
> mm/swapfile.c | 82 ++++++++++++++++++++++++++++-----------------------
> 2 files changed, 48 insertions(+), 39 deletions(-)
>
> diff --git a/mm/swap.h b/mm/swap.h
> index 9fc5fecdcfdf..3ee761ee8348 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -195,12 +195,13 @@ extern int swap_retry_table_alloc(swp_entry_t entry, gfp_t gfp);
> *
> * folio_alloc_swap(): the entry point for a folio to be swapped
> * out. It allocates swap slots and pins the slots with swap cache.
> - * The slots start with a swap count of zero.
> + * The slots start with a swap count of zero. The slots are pinned
> + * by the swap cache reference, which doesn't contribute to the swap count.
> *
> * folio_dup_swap(): increases the swap count of a folio, usually
> * during it gets unmapped and a swap entry is installed to replace
> * it (e.g., swap entry in page table). A swap slot with swap
> - * count == 0 should only be increasd by this helper.
> + * count == 0 can only be increased by this helper.
> *
> * folio_put_swap(): does the opposite thing of folio_dup_swap().
> */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9628015fd8cf..cb18960a6089 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1743,7 +1743,11 @@ int folio_alloc_swap(struct folio *folio)
> * @subpage: if not NULL, only increase the swap count of this subpage.
> *
> * Typically called when the folio is unmapped and have its swap entry to
> - * take its palce.
> + * take its place: Swap entries allocated to a folio have count == 0 and
> + * are pinned by swap cache. The swap cache pin doesn't increase the swap
> + * count. This helper sets the initial count to 1 and increases the count
> + * as the folio is unmapped and swap entries referencing the slots are
> + * generated to replace the folio.
> *
> * Context: Caller must ensure the folio is locked and in the swap cache.
> * NOTE: The caller also has to ensure there is no raced call to
> @@ -1944,49 +1948,44 @@ int swp_swapcount(swp_entry_t entry)
> return count < 0 ? 0 : count;
> }
>
> -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
> - swp_entry_t entry, int order)
> +/*
> + * folio_maybe_swapped - Test if a folio covers any swap slot with count > 0.
> + *
> + * Check if a folio is swapped. Holding the folio lock ensures the folio won't
> + * go from not-swapped to swapped because the initial swap count increment can
> + * only be done by folio_dup_swap, which also locks the folio. But a concurrent
> + * decrease of swap count is possible through swap_put_entries_direct, so this
> + * may return a false positive.
> + *
> + * Context: Caller must ensure the folio is locked and in the swap cache.
> + */
> +static bool folio_maybe_swapped(struct folio *folio)
> {
> + swp_entry_t entry = folio->swap;
> struct swap_cluster_info *ci;
> - unsigned int nr_pages = 1 << order;
> - unsigned long roffset = swp_offset(entry);
> - unsigned long offset = round_down(roffset, nr_pages);
> - unsigned int ci_off;
> - int i;
> + unsigned int ci_off, ci_end;
> bool ret = false;
>
> - ci = swap_cluster_lock(si, offset);
> - if (nr_pages == 1) {
> - ci_off = roffset % SWAPFILE_CLUSTER;
> - if (swp_tb_get_count(__swap_table_get(ci, ci_off)))
> - ret = true;
> - goto unlock_out;
> - }
> - for (i = 0; i < nr_pages; i++) {
> - ci_off = (offset + i) % SWAPFILE_CLUSTER;
> - if (swp_tb_get_count(__swap_table_get(ci, ci_off))) {
> - ret = true;
> - break;
> - }
> - }
> -unlock_out:
> - swap_cluster_unlock(ci);
> - return ret;
> -}
> -
> -static bool folio_swapped(struct folio *folio)
> -{
> - swp_entry_t entry = folio->swap;
> - struct swap_info_struct *si;
> -
> VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
> VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
>
> - si = __swap_entry_to_info(entry);
> - if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio)))
> - return swap_entry_swapped(si, entry);
> + ci = __swap_entry_to_cluster(entry);
> + ci_off = swp_cluster_offset(entry);
> + ci_end = ci_off + folio_nr_pages(folio);
> + /*
> + * Extra locking is not needed: the folio lock ensures its swap
> + * entries won't be released and the backing data won't go away.
> + */
> + rcu_read_lock();
> + do {
> + if (__swp_tb_get_count(__swap_table_get(ci, ci_off))) {
> + ret = true;
> + break;
> + }
> + } while (++ci_off < ci_end);
> + rcu_read_unlock();
>
> - return swap_page_trans_huge_swapped(si, entry, folio_order(folio));
> + return ret;
> }
>
> static bool folio_swapcache_freeable(struct folio *folio)
> @@ -2032,7 +2031,7 @@ bool folio_free_swap(struct folio *folio)
> {
> if (!folio_swapcache_freeable(folio))
> return false;
> - if (folio_swapped(folio))
> + if (folio_maybe_swapped(folio))
> return false;
>
> swap_cache_del_folio(folio);
> @@ -3710,6 +3709,8 @@ void si_swapinfo(struct sysinfo *val)
> *
> * Context: Caller must ensure there is no race condition on the reference
> * owner. e.g., locking the PTL of a PTE containing the entry being increased.
> + * Also, the swap entry must already have a count >= 1; otherwise,
> + * folio_dup_swap should be used.
> */
> int swap_dup_entry_direct(swp_entry_t entry)
> {
> @@ -3721,6 +3722,13 @@ int swap_dup_entry_direct(swp_entry_t entry)
> return -EINVAL;
> }
>
> + /*
> + * The caller must be increasing the swap count from a direct
> + * reference of the swap slot (e.g. a swap entry in page table).
> + * So the swap count must be >= 1.
> + */
> + VM_WARN_ON_ONCE(!swap_entry_swapped(si, entry));
> +
> return swap_dup_entries_cluster(si, swp_offset(entry), 1);
> }
>
> --
> 2.52.0
>
Very sorry about this :/, something went wrong with my local branch,
so this is the wrong patch.
Please ignore this series, thanks!