From: Chris Li <chrisl@kernel.org>
To: kasong@tencent.com
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>,  Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 Youngjun Park <youngjun.park@lge.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 11/12] mm, swap: simplify checking if a folio is swapped
Date: Wed, 18 Feb 2026 23:18:25 -0800	[thread overview]
Message-ID: <CACePvbUWVvy68XGfRTKxG6cTSKSt5OGRy=FhO5_TYY=oCO=2FQ@mail.gmail.com> (raw)
In-Reply-To: <20260218-swap-table-p3-v3-11-f4e34be021a7@tencent.com>

On Tue, Feb 17, 2026 at 12:06 PM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Clean up and simplify how we check if a folio is swapped. The helper
> already requires the folio to be in swap cache and locked. That's enough
> to pin the swap cluster from being freed, so there is no need to lock
> anything else to avoid UAF.
>
> Besides, swap operations have been cleaned up to be mostly folio
> based, and now the only place a folio can have any of its swap slots'
> count increased from 0 to 1 is folio_dup_swap, which also requires the
> folio lock. So while holding the folio lock here, a folio can't change
> its swap status from not swapped (all swap slots have a count of 0) to
> swapped (any slot has a swap count larger than 0).
>
> So there won't be any false negatives of this helper if we simply depend
> on the folio lock to stabilize the cluster.
>
> We are only using this helper to determine if we can and should release
> the swap cache. So false positives are completely harmless, and they
> could already occur before this change: depending on the timing, a
> racing thread could release the swap count right after the ci lock was
> dropped and before this helper returned. In any case, the worst that
> can happen is that we leave a clean swap cache behind, and it will
> still be reclaimed just fine under pressure.
>
> So, in conclusion, we can make the check much simpler and lockless.
> Also, rename it to folio_maybe_swapped to reflect the design.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>

Acked-by: Chris Li <chrisl@kernel.org>

Chris

> ---
>  mm/swap.h     |  5 ++--
>  mm/swapfile.c | 82 ++++++++++++++++++++++++++++++++---------------------------
>  2 files changed, 48 insertions(+), 39 deletions(-)
>
> diff --git a/mm/swap.h b/mm/swap.h
> index cc410b94e91a..9728e6a944b2 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -195,12 +195,13 @@ extern int swap_retry_table_alloc(swp_entry_t entry, gfp_t gfp);
>   *
>   * folio_alloc_swap(): the entry point for a folio to be swapped
>   * out. It allocates swap slots and pins the slots with swap cache.
> - * The slots start with a swap count of zero.
> + * The slots start with a swap count of zero. The slots are pinned
> + * by swap cache reference which doesn't contribute to swap count.
>   *
>   * folio_dup_swap(): increases the swap count of a folio, usually
>   * during it gets unmapped and a swap entry is installed to replace
>   * it (e.g., swap entry in page table). A swap slot with swap
> - * count == 0 should only be increasd by this helper.
> + * count == 0 can only be increased by this helper.
>   *
>   * folio_put_swap(): does the opposite thing of folio_dup_swap().
>   */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index df2b88c6c67b..dab5e726855b 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1743,7 +1743,11 @@ int folio_alloc_swap(struct folio *folio)
>   * @subpage: if not NULL, only increase the swap count of this subpage.
>   *
>   * Typically called when the folio is unmapped and have its swap entry to
> - * take its palce.
> + * take its place: Swap entries allocated to a folio has count == 0 and pinned
> + * by swap cache. The swap cache pin doesn't increase the swap count. This
> + * helper sets the initial count == 1 and increases the count as the folio is
> + * unmapped and swap entries referencing the slots are generated to replace
> + * the folio.
>   *
>   * Context: Caller must ensure the folio is locked and in the swap cache.
>   * NOTE: The caller also has to ensure there is no raced call to
> @@ -1942,49 +1946,44 @@ int swp_swapcount(swp_entry_t entry)
>         return count < 0 ? 0 : count;
>  }
>
> -static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
> -                                        swp_entry_t entry, int order)
> +/*
> + * folio_maybe_swapped - Test if a folio covers any swap slot with count > 0.
> + *
> + * Check if a folio is swapped. Holding the folio lock ensures the folio won't
> + * go from not-swapped to swapped because the initial swap count increment can
> + * only be done by folio_dup_swap, which also locks the folio. But a concurrent
> + * decrease of swap count is possible through swap_put_entries_direct, so this
> + * may return a false positive.
> + *
> + * Context: Caller must ensure the folio is locked and in the swap cache.
> + */
> +static bool folio_maybe_swapped(struct folio *folio)
>  {
> +       swp_entry_t entry = folio->swap;
>         struct swap_cluster_info *ci;
> -       unsigned int nr_pages = 1 << order;
> -       unsigned long roffset = swp_offset(entry);
> -       unsigned long offset = round_down(roffset, nr_pages);
> -       unsigned int ci_off;
> -       int i;
> +       unsigned int ci_off, ci_end;
>         bool ret = false;
>
> -       ci = swap_cluster_lock(si, offset);
> -       if (nr_pages == 1) {
> -               ci_off = roffset % SWAPFILE_CLUSTER;
> -               if (swp_tb_get_count(__swap_table_get(ci, ci_off)))
> -                       ret = true;
> -               goto unlock_out;
> -       }
> -       for (i = 0; i < nr_pages; i++) {
> -               ci_off = (offset + i) % SWAPFILE_CLUSTER;
> -               if (swp_tb_get_count(__swap_table_get(ci, ci_off))) {
> -                       ret = true;
> -                       break;
> -               }
> -       }
> -unlock_out:
> -       swap_cluster_unlock(ci);
> -       return ret;
> -}
> -
> -static bool folio_swapped(struct folio *folio)
> -{
> -       swp_entry_t entry = folio->swap;
> -       struct swap_info_struct *si;
> -
>         VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);
>         VM_WARN_ON_ONCE_FOLIO(!folio_test_swapcache(folio), folio);
>
> -       si = __swap_entry_to_info(entry);
> -       if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio)))
> -               return swap_entry_swapped(si, entry);
> +       ci = __swap_entry_to_cluster(entry);
> +       ci_off = swp_cluster_offset(entry);
> +       ci_end = ci_off + folio_nr_pages(folio);
> +       /*
> +        * Extra locking not needed, folio lock ensures its swap entries
> +        * won't be released, the backing data won't be gone either.
> +        */
> +       rcu_read_lock();
> +       do {
> +               if (__swp_tb_get_count(__swap_table_get(ci, ci_off))) {
> +                       ret = true;
> +                       break;
> +               }
> +       } while (++ci_off < ci_end);
> +       rcu_read_unlock();
>
> -       return swap_page_trans_huge_swapped(si, entry, folio_order(folio));
> +       return ret;
>  }
>
>  static bool folio_swapcache_freeable(struct folio *folio)
> @@ -2030,7 +2029,7 @@ bool folio_free_swap(struct folio *folio)
>  {
>         if (!folio_swapcache_freeable(folio))
>                 return false;
> -       if (folio_swapped(folio))
> +       if (folio_maybe_swapped(folio))
>                 return false;
>
>         swap_cache_del_folio(folio);
> @@ -3719,6 +3718,8 @@ void si_swapinfo(struct sysinfo *val)
>   *
>   * Context: Caller must ensure there is no race condition on the reference
>   * owner. e.g., locking the PTL of a PTE containing the entry being increased.
> + * Also the swap entry must have a count >= 1. Otherwise folio_dup_swap should
> + * be used.
>   */
>  int swap_dup_entry_direct(swp_entry_t entry)
>  {
> @@ -3730,6 +3731,13 @@ int swap_dup_entry_direct(swp_entry_t entry)
>                 return -EINVAL;
>         }
>
> +       /*
> +        * The caller must be increasing the swap count from a direct
> +        * reference of the swap slot (e.g. a swap entry in page table).
> +        * So the swap count must be >= 1.
> +        */
> +       VM_WARN_ON_ONCE(!swap_entry_swapped(si, entry));
> +
>         return swap_dup_entries_cluster(si, swp_offset(entry), 1);
>  }
>
>
> --
> 2.52.0
>
>

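The count-increment discipline that the new VM_WARN_ON_ONCE in
swap_dup_entry_direct enforces can also be sketched as a tiny userspace
model. Again, this is an illustration only; model_folio_dup and
model_dup_direct are hypothetical names mirroring folio_dup_swap and
swap_dup_entry_direct, with the folio lock reduced to a flag:

```c
/* Sketch of the two count-increment paths for one swap slot:
 * - the 0 -> 1 transition is only legal under the folio lock
 *   (mirrors folio_dup_swap);
 * - a direct duplication requires an existing reference, i.e.
 *   count >= 1 (mirrors the VM_WARN_ON_ONCE in swap_dup_entry_direct). */
#include <assert.h>

static unsigned int count;   /* swap count of one slot */
static int folio_locked;     /* stands in for the folio lock */

static void model_folio_dup(void)
{
	assert(folio_locked);  /* only path allowed to go 0 -> 1 */
	count++;
}

static void model_dup_direct(void)
{
	assert(count >= 1);    /* caller already holds a reference */
	count++;
}
```

With this discipline, a locked folio can never observe its slots going from
all-zero to nonzero behind its back, which is exactly what lets
folio_maybe_swapped skip the cluster lock.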
