linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hughd@google.com,
	 yosry.ahmed@linux.dev, mhocko@kernel.org,
	roman.gushchin@linux.dev,  shakeel.butt@linux.dev,
	muchun.song@linux.dev, len.brown@intel.com,
	 chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
	 huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	viro@zeniv.linux.org.uk,  baohua@kernel.org, osalvador@suse.de,
	lorenzo.stoakes@oracle.com,  christophe.leroy@csgroup.eu,
	pavel@kernel.org, kernel-team@meta.com,
	 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	 linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH 04/14] mm: swap: swap cache support for virtualized swap
Date: Tue, 8 Apr 2025 08:43:54 -0700	[thread overview]
Message-ID: <CAKEwX=MVaS1RdaEePy_QsByGVN36YnKvzKzHPUUQ0woj7aDrKQ@mail.gmail.com> (raw)
In-Reply-To: <CAKEwX=PzsdYupTp=0pyHa48PbtmAwUe_JBKjV9N-fLvLwB0SwA@mail.gmail.com>

On Tue, Apr 8, 2025 at 8:34 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Apr 8, 2025 at 8:00 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Mon, Apr 07, 2025 at 04:42:05PM -0700, Nhat Pham wrote:
> > > Currently, the swap cache code assumes that the swap space is of a fixed
> > > size. The virtual swap space is dynamically sized, so the existing
> > > partitioning code cannot be easily reused.  A dynamic partitioning is
> > > planned, but for now keep the design simple and just use a flat
> > > swapcache for vswap.
> > >
> > > Since the vswap's implementation has begun to diverge from the old
> > > implementation, we also introduce a new build config
> > > (CONFIG_VIRTUAL_SWAP). Users who do not select this config will get the
> > > old implementation, with no behavioral change.
> > >
> > > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> > > ---
> > >  mm/Kconfig      | 13 ++++++++++
> > >  mm/swap.h       | 22 ++++++++++------
> > >  mm/swap_state.c | 68 +++++++++++++++++++++++++++++++++++++++++--------
> > >  3 files changed, 85 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 1b501db06417..1a6acdb64333 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -22,6 +22,19 @@ menuconfig SWAP
> > >         used to provide more virtual memory than the actual RAM present
> > >         in your computer.  If unsure say Y.
> > >
> > > +config VIRTUAL_SWAP
> > > +     bool "Swap space virtualization"
> > > +     depends on SWAP
> > > +     default n
> > > +     help
> > > +             When this is selected, the kernel is built with the new swap
> > > +             design. This will allow us to decouple the swap backends
> > > +             (zswap, on-disk swapfile, etc.), and save disk space when we
> > > +             use zswap (or the zero-filled swap page optimization).
> > > +
> > > +             There might be more lock contentions with heavy swap use, since
> > > +             the swap cache is no longer range partitioned.
> > > +
> > >  config ZSWAP
> > >       bool "Compressed cache for swap pages"
> > >       depends on SWAP
> > > diff --git a/mm/swap.h b/mm/swap.h
> > > index d5f8effa8015..06e20b1d79c4 100644
> > > --- a/mm/swap.h
> > > +++ b/mm/swap.h
> > > @@ -22,22 +22,27 @@ void swap_write_unplug(struct swap_iocb *sio);
> > >  int swap_writepage(struct page *page, struct writeback_control *wbc);
> > >  void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
> > >
> > > -/* linux/mm/swap_state.c */
> > > -/* One swap address space for each 64M swap space */
> > > +/* Return the swap device position of the swap slot. */
> > > +static inline loff_t swap_slot_pos(swp_slot_t slot)
> > > +{
> > > +     return ((loff_t)swp_slot_offset(slot)) << PAGE_SHIFT;
> > > +}
> >
> > In the same vein as the previous email, please avoid mixing moves,
> > renames and new code as much as possible. This makes it quite hard to
> > follow what's going on.
> >
> > I think it would be better if you structure the series as follows:
> >
> > 1. Prep patches. Separate patches for moves, renames, new code.
> >
> > 3. mm: vswap
> >    - config VIRTUAL_SWAP
> >    - mm/vswap.c with skeleton data structures, init/exit, Makefile hookup
> >
> > 4. (temporarily) flatten existing address spaces
> >
> >    IMO you can do the swapcache and zswap in one patch
> >
> > 5+. conversion patches
> >
> >     Grow mm/vswap.c as you add discrete components like the descriptor
> >     allocator, swapoff locking, the swap_cgroup tracker etc.
> >
> >     You're mostly doing this part already. But try to order them by
> >     complexity and on a "core to periphery" gradient. I.e. swapoff
> >     locking should probably come before cgroup stuff.
> >
> > Insert move and rename patches at points where they make the most
> > sense. I.e. if they can be understood in the current upstream code
> > already, put them with step 1 prep patches. If you find a move or a
> > rename can only be understood in the context of one of the components,
> > put them in a prep patch right before that one.
>
> Makes sense, yeah! I'll try to avoid mixing moves/renames/new code as
> much as I can.
>
> >
> > > @@ -260,6 +269,28 @@ void delete_from_swap_cache(struct folio *folio)
> > >       folio_ref_sub(folio, folio_nr_pages(folio));
> > >  }
> > >
> > > +#ifdef CONFIG_VIRTUAL_SWAP
> > > +void clear_shadow_from_swap_cache(int type, unsigned long begin,
> > > +                             unsigned long end)
> > > +{
> > > +     swp_slot_t slot = swp_slot(type, begin);
> > > +     swp_entry_t entry = swp_slot_to_swp_entry(slot);
> > > +     unsigned long index = swap_cache_index(entry);
> > > +     struct address_space *address_space = swap_address_space(entry);
> > > +     void *old;
> > > +     XA_STATE(xas, &address_space->i_pages, index);
> > > +
> > > +     xas_set_update(&xas, workingset_update_node);
> > > +
> > > +     xa_lock_irq(&address_space->i_pages);
> > > +     xas_for_each(&xas, old, entry.val + end - begin) {
> > > +             if (!xa_is_value(old))
> > > +                     continue;
> > > +             xas_store(&xas, NULL);
> > > +     }
> > > +     xa_unlock_irq(&address_space->i_pages);
> >
> > I don't think you need separate functions for this, init, exit etc. if
> > you tweak the macros to resolve to one tree. The current code already
> > works if swapfiles are smaller than SWAP_ADDRESS_SPACE_PAGES and there
> > is only one tree, after all.
>
> For clear_shadow_from_swap_cache(), I think I understand what you want
> - keep clear_shadow_from_swap_cache() the same for two
> implementations, but at caller sites, have the callers themselves
> determine the range in swap cache (i.e (begin, end)).
>
> I'm a bit confused with init and exit, but I assume there is a way to
> do it for them as well.
>
> I will note though, that it might increase the number of ifdefs
> sections (or alternatively, IS_ENABLED() checks), because these
> functions are called in different contexts for the two
> implementations:
>
> 1. init and exit are called in swapon/swapoff in the old
> implementation. They are called in swap initialization in the virtual
> swap implementation.
>
> 2. Similarly, we clear swap cache shadows when we free physical swap
> slots in the old implementation, and when we free virtual swap slots
> in the new implementation,
>
> I think it is good actually, because it makes the difference explicit
> rather than implicit. Also, it helps us know exactly which code block
> to target when we unify the two implementations :) Just putting it out
> there.

Actually, I think I was confused.

At this stage, we have no real difference in the implementations yet -
it's purely single tree vs multiple trees. So you're right - we
shouldn't even need two implementations of the code...

I'll fix this.

>
> >
> > This would save a lot of duplication and keep ifdefs more confined.


  reply	other threads:[~2025-04-08 15:44 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-07 23:42 [RFC PATCH 00/14] Virtual Swap Space Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 01/14] swapfile: rearrange functions Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 02/14] mm: swap: add an abstract API for locking out swapoff Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 03/14] mm: swap: add a separate type for physical swap slots Nhat Pham
2025-04-08 14:15   ` Johannes Weiner
2025-04-08 15:11     ` Nhat Pham
2025-04-22 14:41     ` Yosry Ahmed
     [not found]     ` <6807ab09.670a0220.152ca3.502fSMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-22 15:50       ` Nhat Pham
2025-04-22 18:50         ` Kairui Song
2025-04-07 23:42 ` [RFC PATCH 04/14] mm: swap: swap cache support for virtualized swap Nhat Pham
2025-04-08 15:00   ` Johannes Weiner
2025-04-08 15:34     ` Nhat Pham
2025-04-08 15:43       ` Nhat Pham [this message]
2025-04-07 23:42 ` [RFC PATCH 05/14] zswap: unify zswap tree " Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 06/14] mm: swap: allocate a virtual swap slot for each swapped out page Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 07/14] swap: implement the swap_cgroup API using virtual swap Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 08/14] swap: manage swap entry lifetime at the virtual swap layer Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 09/14] swap: implement locking out swapoff using virtual swap slot Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 10/14] mm: swap: decouple virtual swap slot from backing store Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 11/14] memcg: swap: only charge physical swap slots Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 12/14] vswap: support THP swapin and batch free_swap_and_cache Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 13/14] swap: simplify swapoff using virtual swap Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 14/14] zswap: do not start zswap shrinker if there is no physical swap slots Nhat Pham
2025-04-08 13:04 ` [RFC PATCH 00/14] Virtual Swap Space Usama Arif
2025-04-08 15:20   ` Nhat Pham
2025-04-08 15:45   ` Johannes Weiner
2025-04-08 16:25     ` Nhat Pham
2025-04-08 16:27       ` Nhat Pham
2025-04-08 16:22 ` Kairui Song
2025-04-08 16:47   ` Nhat Pham
2025-04-08 16:59     ` Kairui Song
2025-04-22 14:43       ` Yosry Ahmed
2025-04-22 14:56 ` Yosry Ahmed
     [not found] ` <6807afd0.a70a0220.2ae8b9.e07cSMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-22 17:15   ` Nhat Pham
2025-04-22 19:29     ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=MVaS1RdaEePy_QsByGVN36YnKvzKzHPUUQ0woj7aDrKQ@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chengming.zhou@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=hannes@cmpxchg.org \
    --cc=huang.ying.caritas@gmail.com \
    --cc=hughd@google.com \
    --cc=kasong@tencent.com \
    --cc=kernel-team@meta.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=osalvador@suse.de \
    --cc=pavel@kernel.org \
    --cc=roman.gushchin@linux.dev \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox