From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, hughd@google.com,
yosry.ahmed@linux.dev, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, len.brown@intel.com,
chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu,
pavel@kernel.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-pm@vger.kernel.org
Subject: Re: [RFC PATCH 04/14] mm: swap: swap cache support for virtualized swap
Date: Tue, 8 Apr 2025 08:43:54 -0700 [thread overview]
Message-ID: <CAKEwX=MVaS1RdaEePy_QsByGVN36YnKvzKzHPUUQ0woj7aDrKQ@mail.gmail.com> (raw)
In-Reply-To: <CAKEwX=PzsdYupTp=0pyHa48PbtmAwUe_JBKjV9N-fLvLwB0SwA@mail.gmail.com>
On Tue, Apr 8, 2025 at 8:34 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Apr 8, 2025 at 8:00 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Mon, Apr 07, 2025 at 04:42:05PM -0700, Nhat Pham wrote:
> > > Currently, the swap cache code assumes that the swap space is of a fixed
> > > size. The virtual swap space is dynamically sized, so the existing
> > > partitioning code cannot be easily reused. A dynamic partitioning is
> > > planned, but for now keep the design simple and just use a flat
> > > swapcache for vswap.
> > >
> > > Since the vswap's implementation has begun to diverge from the old
> > > implementation, we also introduce a new build config
> > > (CONFIG_VIRTUAL_SWAP). Users who do not select this config will get the
> > > old implementation, with no behavioral change.
> > >
> > > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> > > ---
> > > mm/Kconfig | 13 ++++++++++
> > > mm/swap.h | 22 ++++++++++------
> > > mm/swap_state.c | 68 +++++++++++++++++++++++++++++++++++++++++--------
> > > 3 files changed, 85 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 1b501db06417..1a6acdb64333 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -22,6 +22,19 @@ menuconfig SWAP
> > > used to provide more virtual memory than the actual RAM present
> > > in your computer. If unsure say Y.
> > >
> > > +config VIRTUAL_SWAP
> > > + bool "Swap space virtualization"
> > > + depends on SWAP
> > > + default n
> > > + help
> > > + When this is selected, the kernel is built with the new swap
> > > + design. This will allow us to decouple the swap backends
> > > + (zswap, on-disk swapfile, etc.), and save disk space when we
> > > + use zswap (or the zero-filled swap page optimization).
> > > +
> > > + There might be more lock contentions with heavy swap use, since
> > > + the swap cache is no longer range partitioned.
> > > +
> > > config ZSWAP
> > > bool "Compressed cache for swap pages"
> > > depends on SWAP
> > > diff --git a/mm/swap.h b/mm/swap.h
> > > index d5f8effa8015..06e20b1d79c4 100644
> > > --- a/mm/swap.h
> > > +++ b/mm/swap.h
> > > @@ -22,22 +22,27 @@ void swap_write_unplug(struct swap_iocb *sio);
> > > int swap_writepage(struct page *page, struct writeback_control *wbc);
> > > void __swap_writepage(struct folio *folio, struct writeback_control *wbc);
> > >
> > > -/* linux/mm/swap_state.c */
> > > -/* One swap address space for each 64M swap space */
> > > +/* Return the swap device position of the swap slot. */
> > > +static inline loff_t swap_slot_pos(swp_slot_t slot)
> > > +{
> > > + return ((loff_t)swp_slot_offset(slot)) << PAGE_SHIFT;
> > > +}
> >
> > In the same vein as the previous email, please avoid mixing moves,
> > renames and new code as much as possible. This makes it quite hard to
> > follow what's going on.
> >
> > I think it would be better if you structure the series as follows:
> >
> > 1. Prep patches. Separate patches for moves, renames, new code.
> >
> > 3. mm: vswap
> > - config VIRTUAL_SWAP
> > - mm/vswap.c with skeleton data structures, init/exit, Makefile hookup
> >
> > 4. (temporarily) flatten existing address spaces
> >
> > IMO you can do the swapcache and zswap in one patch
> >
> > 5+. conversion patches
> >
> > Grow mm/vswap.c as you add discrete components like the descriptor
> > allocator, swapoff locking, the swap_cgroup tracker etc.
> >
> > You're mostly doing this part already. But try to order them by
> > complexity and on a "core to periphery" gradient. I.e. swapoff
> > locking should probably come before cgroup stuff.
> >
> > Insert move and rename patches at points where they make the most
> > sense. I.e. if they can be understood in the current upstream code
> > already, put them with step 1 prep patches. If you find a move or a
> > rename can only be understood in the context of one of the components,
> > put them in a prep patch right before that one.
>
> Makes sense, yeah! I'll try to avoid mixing moves/renames/new code as
> much as I can.
>
> >
> > > @@ -260,6 +269,28 @@ void delete_from_swap_cache(struct folio *folio)
> > > folio_ref_sub(folio, folio_nr_pages(folio));
> > > }
> > >
> > > +#ifdef CONFIG_VIRTUAL_SWAP
> > > +void clear_shadow_from_swap_cache(int type, unsigned long begin,
> > > + unsigned long end)
> > > +{
> > > + swp_slot_t slot = swp_slot(type, begin);
> > > + swp_entry_t entry = swp_slot_to_swp_entry(slot);
> > > + unsigned long index = swap_cache_index(entry);
> > > + struct address_space *address_space = swap_address_space(entry);
> > > + void *old;
> > > + XA_STATE(xas, &address_space->i_pages, index);
> > > +
> > > + xas_set_update(&xas, workingset_update_node);
> > > +
> > > + xa_lock_irq(&address_space->i_pages);
> > > + xas_for_each(&xas, old, entry.val + end - begin) {
> > > + if (!xa_is_value(old))
> > > + continue;
> > > + xas_store(&xas, NULL);
> > > + }
> > > + xa_unlock_irq(&address_space->i_pages);
> >
> > I don't think you need separate functions for this, init, exit etc. if
> > you tweak the macros to resolve to one tree. The current code already
> > works if swapfiles are smaller than SWAP_ADDRESS_SPACE_PAGES and there
> > is only one tree, after all.
>
> For clear_shadow_from_swap_cache(), I think I understand what you want
> - keep clear_shadow_from_swap_cache() the same for two
> implementations, but at caller sites, have the callers themselves
> determine the range in swap cache (i.e (begin, end)).
>
> I'm a bit confused with init and exit, but I assume there is a way to
> do it for them as well.
>
> I will note though, that it might increase the number of ifdefs
> sections (or alternatively, IS_ENABLED() checks), because these
> functions are called in different contexts for the two
> implementations:
>
> 1. init and exit are called in swapon/swapoff in the old
> implementation. They are called in swap initialization in the virtual
> swap implementation.
>
> 2. Similarly, we clear swap cache shadows when we free physical swap
> slots in the old implementation, and when we free virtual swap slots
> in the new implementation,
>
> I think it is good actually, because it makes the difference explicit
> rather than implicit. Also, it helps us know exactly which code block
> to target when we unify the two implementations :) Just putting it out
> there.
Actually, I think I was confused.
At this stage, we have no real difference in the implementations yet -
it's purely single tree vs multiple trees. So you're right - we
shouldn't even need two implementations of the code...
I'll fix this.
>
> >
> > This would save a lot of duplication and keep ifdefs more confined.
next prev parent reply other threads:[~2025-04-08 15:44 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-07 23:42 [RFC PATCH 00/14] Virtual Swap Space Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 01/14] swapfile: rearrange functions Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 02/14] mm: swap: add an abstract API for locking out swapoff Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 03/14] mm: swap: add a separate type for physical swap slots Nhat Pham
2025-04-08 14:15 ` Johannes Weiner
2025-04-08 15:11 ` Nhat Pham
2025-04-22 14:41 ` Yosry Ahmed
[not found] ` <6807ab09.670a0220.152ca3.502fSMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-22 15:50 ` Nhat Pham
2025-04-22 18:50 ` Kairui Song
2025-04-07 23:42 ` [RFC PATCH 04/14] mm: swap: swap cache support for virtualized swap Nhat Pham
2025-04-08 15:00 ` Johannes Weiner
2025-04-08 15:34 ` Nhat Pham
2025-04-08 15:43 ` Nhat Pham [this message]
2025-04-07 23:42 ` [RFC PATCH 05/14] zswap: unify zswap tree " Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 06/14] mm: swap: allocate a virtual swap slot for each swapped out page Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 07/14] swap: implement the swap_cgroup API using virtual swap Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 08/14] swap: manage swap entry lifetime at the virtual swap layer Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 09/14] swap: implement locking out swapoff using virtual swap slot Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 10/14] mm: swap: decouple virtual swap slot from backing store Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 11/14] memcg: swap: only charge physical swap slots Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 12/14] vswap: support THP swapin and batch free_swap_and_cache Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 13/14] swap: simplify swapoff using virtual swap Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 14/14] zswap: do not start zswap shrinker if there is no physical swap slots Nhat Pham
2025-04-08 13:04 ` [RFC PATCH 00/14] Virtual Swap Space Usama Arif
2025-04-08 15:20 ` Nhat Pham
2025-04-08 15:45 ` Johannes Weiner
2025-04-08 16:25 ` Nhat Pham
2025-04-08 16:27 ` Nhat Pham
2025-04-08 16:22 ` Kairui Song
2025-04-08 16:47 ` Nhat Pham
2025-04-08 16:59 ` Kairui Song
2025-04-22 14:43 ` Yosry Ahmed
2025-04-22 14:56 ` Yosry Ahmed
[not found] ` <6807afd0.a70a0220.2ae8b9.e07cSMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-22 17:15 ` Nhat Pham
2025-04-22 19:29 ` Nhat Pham
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKEwX=MVaS1RdaEePy_QsByGVN36YnKvzKzHPUUQ0woj7aDrKQ@mail.gmail.com' \
--to=nphamcs@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=chengming.zhou@linux.dev \
--cc=chrisl@kernel.org \
--cc=christophe.leroy@csgroup.eu \
--cc=hannes@cmpxchg.org \
--cc=huang.ying.caritas@gmail.com \
--cc=hughd@google.com \
--cc=kasong@tencent.com \
--cc=kernel-team@meta.com \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-pm@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
--cc=pavel@kernel.org \
--cc=roman.gushchin@linux.dev \
--cc=ryan.roberts@arm.com \
--cc=shakeel.butt@linux.dev \
--cc=viro@zeniv.linux.org.uk \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox