From: Chris Li <chrisl@kernel.org>
To: kasong@tencent.com
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Youngjun Park <youngjun.park@lge.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 06/12] mm, swap: implement helpers for reserving data in the swap table
Date: Wed, 18 Feb 2026 23:00:03 -0800
Message-ID: <CACePvbUpCXTzNUFw1sHRao+pDZDhN9N7YUowwFQ_gOLihZx8+g@mail.gmail.com>
In-Reply-To: <20260218-swap-table-p3-v3-6-f4e34be021a7@tencent.com>

On Tue, Feb 17, 2026 at 12:06 PM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> To prepare for using the swap table as the unified swap layer, introduce
> macros and helpers for storing multiple kinds of data in a swap table
> entry.
>
> From now on, the PFN is stored in the swap table to make space for
> extra counting bits (SWAP_COUNT). Shadows are still stored as they are,
> since SWAP_COUNT is not used yet.
>
> Also, rename shadow_swp_to_tb to shadow_to_swp_tb; the old name was a
> spelling error, not really worth a separate fix.
>
> No behaviour change yet; this just prepares the API.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>

Acked-by: Chris Li <chrisl@kernel.org>

Chris
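
A few illustrative side notes inline below for readers following the
series; nothing blocking.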

> ---
>  mm/swap_state.c |   6 +--
>  mm/swap_table.h | 131 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 124 insertions(+), 13 deletions(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 6d0eef7470be..e213ee35c1d2 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -148,7 +148,7 @@ void __swap_cache_add_folio(struct swap_cluster_info *ci,
>         VM_WARN_ON_ONCE_FOLIO(folio_test_swapcache(folio), folio);
>         VM_WARN_ON_ONCE_FOLIO(!folio_test_swapbacked(folio), folio);
>
> -       new_tb = folio_to_swp_tb(folio);
> +       new_tb = folio_to_swp_tb(folio, 0);
>         ci_start = swp_cluster_offset(entry);
>         ci_off = ci_start;
>         ci_end = ci_start + nr_pages;
> @@ -249,7 +249,7 @@ void __swap_cache_del_folio(struct swap_cluster_info *ci, struct folio *folio,
>         VM_WARN_ON_ONCE_FOLIO(folio_test_writeback(folio), folio);
>
>         si = __swap_entry_to_info(entry);
> -       new_tb = shadow_swp_to_tb(shadow);
> +       new_tb = shadow_to_swp_tb(shadow, 0);
>         ci_start = swp_cluster_offset(entry);
>         ci_end = ci_start + nr_pages;
>         ci_off = ci_start;
> @@ -331,7 +331,7 @@ void __swap_cache_replace_folio(struct swap_cluster_info *ci,
>         VM_WARN_ON_ONCE(!entry.val);
>
>         /* Swap cache still stores N entries instead of a high-order entry */
> -       new_tb = folio_to_swp_tb(new);
> +       new_tb = folio_to_swp_tb(new, 0);
>         do {
>                 old_tb = __swap_table_xchg(ci, ci_off, new_tb);
>                 WARN_ON_ONCE(!swp_tb_is_folio(old_tb) || swp_tb_to_folio(old_tb) != old);
> diff --git a/mm/swap_table.h b/mm/swap_table.h
> index 10e11d1f3b04..10762ac5f4f5 100644
> --- a/mm/swap_table.h
> +++ b/mm/swap_table.h
> @@ -12,17 +12,72 @@ struct swap_table {
>  };
>
>  #define SWP_TABLE_USE_PAGE (sizeof(struct swap_table) == PAGE_SIZE)
> -#define SWP_TB_COUNT_BITS              4
>
>  /*
>   * A swap table entry represents the status of a swap slot on a swap
>   * (physical or virtual) device. The swap table in each cluster is a
>   * 1:1 map of the swap slots in this cluster.
>   *
> - * Each swap table entry could be a pointer (folio), a XA_VALUE
> - * (shadow), or NULL.
> + * Swap table entry type and bits layouts:
> + *
> + * NULL:     |---------------- 0 ---------------| - Free slot
> + * Shadow:   | SWAP_COUNT |---- SHADOW_VAL ---|1| - Swapped out slot
> + * PFN:      | SWAP_COUNT |------ PFN -------|10| - Cached slot
> + * Pointer:  |----------- Pointer ----------|100| - (Unused)
> + * Bad:      |------------- 1 -------------|1000| - Bad slot
> + *
> + * SWAP_COUNT is `SWP_TB_COUNT_BITS` wide; each entry is an atomic long.
> + *
> + * Usages:
> + *
> + * - NULL: Swap slot is unused, could be allocated.
> + *
> + * - Shadow: Swap slot is used and not cached (usually swapped out). It reuses
> + *   the XA_VALUE format to be compatible with workingset shadows. The
> + *   SHADOW_VAL part might be all 0 if the workingset shadow info is absent;
> + *   in that case, the shadow format is still kept as a placeholder.
> + *
> + *   Memcg ID is embedded in SHADOW_VAL.
> + *
> + * - PFN: Swap slot is in use, and cached. Memcg info is recorded on the page
> + *   struct.
> + *
> + * - Pointer: Not used yet. `0b100` is reserved for potential pointer usage,
> + *   because only the lower three bits of an 8-byte-aligned pointer are free
> + *   to serve as a marker.
> + *
> + * - Bad: Swap slot is reserved; protects the swap header or holes on swap devices.
>   */
>
> +#if defined(MAX_POSSIBLE_PHYSMEM_BITS)
> +#define SWAP_CACHE_PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)
> +#elif defined(MAX_PHYSMEM_BITS)
> +#define SWAP_CACHE_PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT)
> +#else
> +#define SWAP_CACHE_PFN_BITS (BITS_PER_LONG - PAGE_SHIFT)
> +#endif
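
A quick worked example of the width math: with MAX_PHYSMEM_BITS = 46 and
PAGE_SHIFT = 12 (a common 64-bit configuration), SWAP_CACHE_PFN_BITS is
34, SWP_TB_PFN_BITS below becomes 36, and 64 - 36 = 28 bits remain above
the PFN, so SWP_TB_COUNT_BITS keeps its full width of 4.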
> +
> +/* NULL Entry, all 0 */
> +#define SWP_TB_NULL            0UL
> +
> +/* Swapped out: shadow */
> +#define SWP_TB_SHADOW_MARK     0b1UL
> +
> +/* Cached: PFN */
> +#define SWP_TB_PFN_BITS                (SWAP_CACHE_PFN_BITS + SWP_TB_PFN_MARK_BITS)
> +#define SWP_TB_PFN_MARK                0b10UL
> +#define SWP_TB_PFN_MARK_BITS   2
> +#define SWP_TB_PFN_MARK_MASK   (BIT(SWP_TB_PFN_MARK_BITS) - 1)
> +
> +/* SWAP_COUNT part for PFN or shadow; the width can be shrunk or extended */
> +#define SWP_TB_COUNT_BITS      min(4, BITS_PER_LONG - SWP_TB_PFN_BITS)
> +#define SWP_TB_COUNT_MASK      (~((~0UL) >> SWP_TB_COUNT_BITS))
> +#define SWP_TB_COUNT_SHIFT     (BITS_PER_LONG - SWP_TB_COUNT_BITS)
> +#define SWP_TB_COUNT_MAX       ((1 << SWP_TB_COUNT_BITS) - 1)
> +
> +/* Bad slot: low bits are 0b1000 and the rest of the bits are all 1 */
> +#define SWP_TB_BAD             ((~0UL) << 3)
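
Plugging in the numbers for a 64-bit build with SWP_TB_COUNT_BITS = 4:
SWP_TB_COUNT_MASK = 0xf000000000000000, SWP_TB_COUNT_SHIFT = 60, and
SWP_TB_COUNT_MAX = 15; SWP_TB_BAD = 0xfffffffffffffff8, i.e. all ones
ending in 0b1000, matching the layout diagram above.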
> +
>  /* Macro for shadow offset calculation */
>  #define SWAP_COUNT_SHIFT       SWP_TB_COUNT_BITS
>
> @@ -35,18 +90,47 @@ static inline unsigned long null_to_swp_tb(void)
>         return 0;
>  }
>
> -static inline unsigned long folio_to_swp_tb(struct folio *folio)
> +static inline unsigned long __count_to_swp_tb(unsigned char count)
>  {
> +       /*
> +        * At least three values are needed to distinguish free (0),
> +        * used (count > 0 && count < SWP_TB_COUNT_MAX), and
> +        * overflow (count == SWP_TB_COUNT_MAX).
> +        */
> +       BUILD_BUG_ON(SWP_TB_COUNT_MAX < 2 || SWP_TB_COUNT_BITS < 2);
> +       VM_WARN_ON(count > SWP_TB_COUNT_MAX);
> +       return ((unsigned long)count) << SWP_TB_COUNT_SHIFT;
> +}
> +
> +static inline unsigned long pfn_to_swp_tb(unsigned long pfn, unsigned int count)
> +{
> +       unsigned long swp_tb;
> +
>         BUILD_BUG_ON(sizeof(unsigned long) != sizeof(void *));
> -       return (unsigned long)folio;
> +       BUILD_BUG_ON(SWAP_CACHE_PFN_BITS >
> +                    (BITS_PER_LONG - SWP_TB_PFN_MARK_BITS - SWP_TB_COUNT_BITS));
> +
> +       swp_tb = (pfn << SWP_TB_PFN_MARK_BITS) | SWP_TB_PFN_MARK;
> +       VM_WARN_ON_ONCE(swp_tb & SWP_TB_COUNT_MASK);
> +
> +       return swp_tb | __count_to_swp_tb(count);
> +}
> +
> +static inline unsigned long folio_to_swp_tb(struct folio *folio, unsigned int count)
> +{
> +       return pfn_to_swp_tb(folio_pfn(folio), count);
>  }
>
> -static inline unsigned long shadow_swp_to_tb(void *shadow)
> +static inline unsigned long shadow_to_swp_tb(void *shadow, unsigned int count)
>  {
>         BUILD_BUG_ON((BITS_PER_XA_VALUE + 1) !=
>                      BITS_PER_BYTE * sizeof(unsigned long));
> +       BUILD_BUG_ON((unsigned long)xa_mk_value(0) != SWP_TB_SHADOW_MARK);
> +
>         VM_WARN_ON_ONCE(shadow && !xa_is_value(shadow));
> -       return (unsigned long)shadow;
> +       VM_WARN_ON_ONCE(shadow && ((unsigned long)shadow & SWP_TB_COUNT_MASK));
> +
> +       return (unsigned long)shadow | __count_to_swp_tb(count) | SWP_TB_SHADOW_MARK;
>  }
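
For readers less familiar with the xarray side: xa_mk_value(v) encodes a
value as ((v << 1) | 1), so the XA_VALUE tag bit is exactly
SWP_TB_SHADOW_MARK and a shadow can be stored verbatim; that is what the
BUILD_BUG_ON above pins down. Note that a NULL shadow still gets the mark
bit, so it reads back as xa_mk_value(0) rather than NULL, which is the
placeholder behaviour described in the layout comment.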
>
>  /*
> @@ -59,7 +143,7 @@ static inline bool swp_tb_is_null(unsigned long swp_tb)
>
>  static inline bool swp_tb_is_folio(unsigned long swp_tb)
>  {
> -       return !xa_is_value((void *)swp_tb) && !swp_tb_is_null(swp_tb);
> +       return ((swp_tb & SWP_TB_PFN_MARK_MASK) == SWP_TB_PFN_MARK);
>  }
>
>  static inline bool swp_tb_is_shadow(unsigned long swp_tb)
> @@ -67,19 +151,44 @@ static inline bool swp_tb_is_shadow(unsigned long swp_tb)
>         return xa_is_value((void *)swp_tb);
>  }
>
> +static inline bool swp_tb_is_bad(unsigned long swp_tb)
> +{
> +       return swp_tb == SWP_TB_BAD;
> +}
> +
> +static inline bool swp_tb_is_countable(unsigned long swp_tb)
> +{
> +       return (swp_tb_is_shadow(swp_tb) || swp_tb_is_folio(swp_tb) ||
> +               swp_tb_is_null(swp_tb));
> +}
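
Worth noting: NULL is treated as countable, and since a NULL entry is all
zeros its count reads back as 0, so a free slot naturally reports a swap
count of zero without any special-casing.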
> +
>  /*
>   * Helpers for retrieving info from swap table.
>   */
>  static inline struct folio *swp_tb_to_folio(unsigned long swp_tb)
>  {
>         VM_WARN_ON(!swp_tb_is_folio(swp_tb));
> -       return (void *)swp_tb;
> +       return pfn_folio((swp_tb & ~SWP_TB_COUNT_MASK) >> SWP_TB_PFN_MARK_BITS);
>  }
>
>  static inline void *swp_tb_to_shadow(unsigned long swp_tb)
>  {
>         VM_WARN_ON(!swp_tb_is_shadow(swp_tb));
> -       return (void *)swp_tb;
> +       /* No shift needed, the xa_value is stored as-is in the lower bits. */
> +       return (void *)(swp_tb & ~SWP_TB_COUNT_MASK);
> +}
> +
> +static inline unsigned char __swp_tb_get_count(unsigned long swp_tb)
> +{
> +       VM_WARN_ON(!swp_tb_is_countable(swp_tb));
> +       return ((swp_tb & SWP_TB_COUNT_MASK) >> SWP_TB_COUNT_SHIFT);
> +}
> +
> +static inline int swp_tb_get_count(unsigned long swp_tb)
> +{
> +       if (swp_tb_is_countable(swp_tb))
> +               return __swp_tb_get_count(swp_tb);
> +       return -EINVAL;
>  }
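
To make the round-trip invariants concrete, here is a pseudo-selftest I
sketched while reviewing (illustrative only; swap_table_selftest is a
made-up name, and this would only build in-tree against the helpers
above):

/* Not part of the patch: exercises encode/decode round trips. */
static void __maybe_unused swap_table_selftest(struct folio *folio)
{
	void *shadow = xa_mk_value(0x42);
	unsigned long tb;

	/* Cached slot: PFN in the middle, 0b10 mark low, count on top. */
	tb = folio_to_swp_tb(folio, 3);
	WARN_ON(!swp_tb_is_folio(tb));
	WARN_ON(swp_tb_get_count(tb) != 3);
	WARN_ON(swp_tb_to_folio(tb) != folio);

	/* Swapped-out slot: xa_value stored as-is, count on top. */
	tb = shadow_to_swp_tb(shadow, 1);
	WARN_ON(!swp_tb_is_shadow(tb));
	WARN_ON(swp_tb_get_count(tb) != 1);
	WARN_ON(swp_tb_to_shadow(tb) != shadow);
}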
>
>  /*
> @@ -124,6 +233,8 @@ static inline unsigned long swap_table_get(struct swap_cluster_info *ci,
>         atomic_long_t *table;
>         unsigned long swp_tb;
>
> +       VM_WARN_ON_ONCE(off >= SWAPFILE_CLUSTER);
> +
>         rcu_read_lock();
>         table = rcu_dereference(ci->table);
>         swp_tb = table ? atomic_long_read(&table[off]) : null_to_swp_tb();
>
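
On this last hunk: the lockless read is already guarded by
rcu_read_lock() plus the NULL check on ci->table, so the new
VM_WARN_ON_ONCE only adds an early catch for out-of-range offsets,
which looks like sensible hardening.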
> --
> 2.52.0
>
>

