linux-mm.kvack.org archive mirror
From: YoungJun Park <youngjun.park@lge.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: kasong@tencent.com, Liam.Howlett@oracle.com,
	akpm@linux-foundation.org, apopple@nvidia.com,
	axelrasmussen@google.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, bhe@redhat.com, byungchul@sk.com,
	cgroups@vger.kernel.org, chengming.zhou@linux.dev,
	chrisl@kernel.org, corbet@lwn.net, david@kernel.org,
	dev.jain@arm.com, gourry@gourry.net, hannes@cmpxchg.org,
	hughd@google.com, jannh@google.com, joshua.hahnjy@gmail.com,
	lance.yang@linux.dev, lenb@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-pm@vger.kernel.org, lorenzo.stoakes@oracle.com,
	matthew.brost@intel.com, mhocko@suse.com, muchun.song@linux.dev,
	npache@redhat.com, pavel@kernel.org, peterx@redhat.com,
	peterz@infradead.org, pfalcato@suse.de, rafael@kernel.org,
	rakie.kim@sk.com, roman.gushchin@linux.dev, rppt@kernel.org,
	ryan.roberts@arm.com, shakeel.butt@linux.dev,
	shikemeng@huaweicloud.com, surenb@google.com, tglx@kernel.org,
	vbabka@suse.cz, weixugc@google.com, ying.huang@linux.alibaba.com,
	yosry.ahmed@linux.dev, yuanchu@google.com,
	zhengqi.arch@bytedance.com, ziy@nvidia.com, kernel-team@meta.com,
	riel@surriel.com
Subject: Re: [PATCH v5 00/21] Virtual Swap Space
Date: Tue, 14 Apr 2026 11:50:08 +0900	[thread overview]
Message-ID: <ad2rYH9tUPthHFoj@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CAKEwX=NnHxpQKp9qBg2=r_euyjgxw2nHXjbgof3MymHTgJmRAQ@mail.gmail.com>

On Sat, Apr 11, 2026 at 06:40:44PM -0700, Nhat Pham wrote:

Hello Nhat!

> > 1. Modularization
> >
> > You removed CONFIG_* and went with a unified approach. I recall
> > you were also considering a module-based structure at some point.
> > What are your thoughts on that direction?
> >
>
> The CONFIG-based approach was a huge mess. It makes me not want to
> look at the code, and I'm the author :)
>
> > If we take that approach, we could extend the recent swap ops
> > patchset (https://lore.kernel.org/linux-mm/20260302104016.163542-1-bhe@redhat.com/)
> > as follows:
> > - Make vswap a swap module
> > - Have cluster allocation functions reside in swapops
> > - Enable vswap through swapon
>
> Hmmmmm.

I think this would be a happy world, but I wonder what others think.
Anyway, I'm looking forward to the future direction.

> > 2. Flash-friendly swap integration (for my use case)
> >
> > I've been thinking about the flash-friendly swap concept that
> > I mentioned before and recently proposed:
> > (https://lore.kernel.org/linux-mm/aZW0voL4MmnMQlaR@yjaykim-PowerEdge-T330/)
> >
> > One of its core functions requires buffering RAM-swapped pages
> > and writing them sequentially at an appropriate time -- not
> > immediately, but in proper block-sized units, sequentially.
> >
> > This means allocated offsets must essentially be virtual, and
> > physical offsets need to be managed separately at the actual
> > write time.
> >
> > If we integrate this into the current vswap, we would either
> > need vswap itself to handle the sequential writes (bypassing
> > the physical device and receiving pages directly), or swapon
> > a swap device and have vswap obtain physical offsets from it.
> > But since those offsets cannot be used directly (due to
> > buffering and sequential write requirements), they become
> > virtual too, resulting in:
> >
> >   virtual -> virtual -> physical
> >
> > This triple indirection is not ideal.
> >
> > However, if the modularization from point 1 is achieved and
> > vswap acts as a swap device itself, then we can cleanly
> > establish a:
> >
> >   virtual -> physical
>
> I read that thread some time ago. Some remarks:
>
> 1. I think Christoph has a point. Seems like some of your ideas are
> broadly applicable to swap in general. Maybe fixing swap infra
> generally would make a lot of sense?

Broadly speaking, there are two main ideas:
1. Swap I/O buffering (which is also tied to cluster management issues)
2. Deduplication

Are you leaning towards the view that these two should be placed in a
higher layer?

> 2. Why do we need two virtual layers here? For example, if you
> want to buffer multiple swap outs and turn them into a sequential
> request, you can:
>
> a. Allocate virtual swap space for them as you wish. They don't even
> need to be sequential.
>
> b. At swap_writeout() time, don't allocate physical swap space for
> them right away. Instead, accumulate them into a buffer. You can add a
> new virtual swap entry type to flag it if necessary.
>
> c. Once that buffer reaches a certain size, you can now allocate
> contiguous physical swap space for them, then flush. You can flush
> at swap_writeout() time, or use a dedicated thread, etc.

I initially thought implementing this in vswap would be complicated
(due to the ripple effects of altering behavior at swap_writeout timing),
but it seems entirely possible!

1. We could change the behavior (e.g., buffering) at vswap_alloc_swap_slot
   time by checking things like the si type.
2. Additionally, if the cluster data structures and mechanisms could be
   handled privately within the swap_info_struct, a one-way
   virtual-to-physical mapping seems feasible.
   (Come to think of it, it might be better to refactor the infra to let
   other modules handle this, potentially removing the swap_info_struct
   mechanism entirely. Just imagination ;) )
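
As a toy sketch of the flow in steps (a)-(c) above, here is a userspace
Python model. All names (VSwap, alloc_virtual, writeout, flush, BUF_SLOTS)
are made up for illustration, nothing here is a real kernel API:

```python
# Toy model of buffered writeout: virtual slots are handed out
# immediately and non-contiguously; physical slots are allocated
# contiguously only once a flush-sized batch has accumulated.

BUF_SLOTS = 4  # flush once this many pages are buffered

class VSwap:
    def __init__(self):
        self.next_virt = 0   # virtual slots: cheap, order does not matter
        self.next_phys = 0   # physical slots: allocated as one run
        self.buffer = []     # virtual entries awaiting writeout
        self.phys_of = {}    # virt slot -> phys slot, filled at flush

    def alloc_virtual(self):
        # Step a: allocate virtual space; no physical backing yet.
        v = self.next_virt
        self.next_virt += 1
        return v

    def writeout(self, virt):
        # Step b: don't allocate physical space yet, just accumulate.
        self.buffer.append(virt)
        if len(self.buffer) >= BUF_SLOTS:
            self.flush()

    def flush(self):
        # Step c: one contiguous physical run for the whole batch.
        base = self.next_phys
        self.next_phys += len(self.buffer)
        for i, virt in enumerate(self.buffer):
            self.phys_of[virt] = base + i
        self.buffer.clear()

s = VSwap()
virts = [s.alloc_virtual() for _ in range(4)]
for v in virts:
    s.writeout(v)
print(s.phys_of)  # -> {0: 0, 1: 1, 2: 2, 3: 3}, one contiguous run
```

The point being that only a single virtual-to-physical mapping exists,
even though the physical offsets are decided late.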

> Deduplication sounds like something that should live at a lower layer
> - I was thinking about it for zswap/zsmalloc back then. I mean, I
> assume you don't want content sharing across different swap media? :)
> Something along the line of:
>
> 1. Maintain a content index for swapped out pages.
>
> 2. For the swap media that support deduplication, you'll need to add
> some sort of reference count (more overhead ew).
>
> 3. Each time we swap out, we can content-check to see if the same
> piece of content has been swapped out before. If so, set the vswap
> backend to the physical location of the data, increment some sort of
> reference count (perhaps we can use swap count) of the older entry,
> and have the swap type point to it.

As for reference count management, applying it loosely might be a good
approach. Instead of strictly managing the lifecycle of the dedup contents
with refcounts, we could just periodically clean up the hash. Compared to
deleting entries immediately, this also has the benefit of reducing I/O
for repeated swap-outs of the same content.
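
A minimal userspace model of that loose-lifecycle dedup index. Again,
every name here (DedupSwap, swapout, prune_index) is hypothetical and
only illustrates the idea, not any existing interface:

```python
# Toy dedup model: a content->slot hash is kept only as a best-effort
# cache, so it can be pruned wholesale ("loose" lifecycle) instead of
# being refcount-tracked precisely. Stored data stays valid after a
# prune; we merely stop dedup-matching against old content.
import hashlib

class DedupSwap:
    def __init__(self):
        self.store = {}      # phys slot -> page content
        self.index = {}      # content digest -> phys slot (best effort)
        self.shares = {}     # phys slot -> number of virtual entries
        self.next_slot = 0

    def swapout(self, content: bytes) -> int:
        digest = hashlib.sha256(content).hexdigest()
        slot = self.index.get(digest)
        if slot is not None:
            # Duplicate content: reuse the existing slot, no new I/O.
            self.shares[slot] += 1
            return slot
        slot = self.next_slot
        self.next_slot += 1
        self.store[slot] = content
        self.shares[slot] = 1
        self.index[digest] = slot   # remember for future dedup hits
        return slot

    def prune_index(self):
        # Periodic cleanup: forget the digests without touching the
        # stored pages or their share counts.
        self.index.clear()

d = DedupSwap()
a = d.swapout(b"same page")
b = d.swapout(b"same page")   # dedup hit: same slot, share count bumped
c = d.swapout(b"other page")
print(a == b, d.shares[a], c)  # -> True 2 1
```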

> But have you considered the implications of sharing swap data like
> this? I need to read the paper you cite - seems like a potential fun
> read. But what happens when these two pages that share the content
> belong to two different cgroups? How does the
> charging/uncharging/charge transferring story work? That's one of the
> things that made me pause when I wanted to implement deduplication for
> zswap/zsmalloc. Zram does not charge memory towards cgroup, but zswap
> does, so we'll need to handle this somehow, and at that point all the
> complexity might no longer be worth it.

Since our private swap device is similar to ZRAM, I hadn't considered
the charging aspect. It is indeed a complex issue.

If it goes into zswap, there would definitely be a clear advantage in
seeing dedup benefits across all swap devices. It's a technically
interesting area, and I'd like to discuss it in a separate thread if
I have more ideas or thoughts.

Just a thought that comes to mind here: if vswap becomes modularized,
how about doing memcg charging for this entire area?
(Come to think of it, to fully benefit from vswap modularization,
zswap should also be brought within its scope.)

Best regards,
Youngjun Park


Thread overview: 38+ messages
2026-03-20 19:27 Nhat Pham
2026-03-20 19:27 ` [PATCH v5 01/21] mm/swap: decouple swap cache from physical swap infrastructure Nhat Pham
2026-03-20 19:27 ` [PATCH v5 02/21] swap: rearrange the swap header file Nhat Pham
2026-03-20 19:27 ` [PATCH v5 03/21] mm: swap: add an abstract API for locking out swapoff Nhat Pham
2026-03-20 19:27 ` [PATCH v5 04/21] zswap: add new helpers for zswap entry operations Nhat Pham
2026-03-20 19:27 ` [PATCH v5 05/21] mm/swap: add a new function to check if a swap entry is in swap cached Nhat Pham
2026-03-20 19:27 ` [PATCH v5 06/21] mm: swap: add a separate type for physical swap slots Nhat Pham
2026-03-20 19:27 ` [PATCH v5 07/21] mm: create scaffolds for the new virtual swap implementation Nhat Pham
2026-03-20 19:27 ` [PATCH v5 08/21] zswap: prepare zswap for swap virtualization Nhat Pham
2026-03-20 19:27 ` [PATCH v5 09/21] mm: swap: allocate a virtual swap slot for each swapped out page Nhat Pham
2026-03-20 19:27 ` [PATCH v5 10/21] swap: move swap cache to virtual swap descriptor Nhat Pham
2026-03-20 19:27 ` [PATCH v5 11/21] zswap: move zswap entry management to the " Nhat Pham
2026-03-20 19:27 ` [PATCH v5 12/21] swap: implement the swap_cgroup API using virtual swap Nhat Pham
2026-03-20 19:27 ` [PATCH v5 13/21] swap: manage swap entry lifecycle at the virtual swap layer Nhat Pham
2026-03-20 19:27 ` [PATCH v5 14/21] mm: swap: decouple virtual swap slot from backing store Nhat Pham
2026-03-20 19:27 ` [PATCH v5 15/21] zswap: do not start zswap shrinker if there is no physical swap slots Nhat Pham
2026-03-20 19:27 ` [PATCH v5 16/21] swap: do not unnecesarily pin readahead swap entries Nhat Pham
2026-03-20 19:27 ` [PATCH v5 17/21] swapfile: remove zeromap bitmap Nhat Pham
2026-03-20 19:27 ` [PATCH v5 18/21] memcg: swap: only charge physical swap slots Nhat Pham
2026-03-20 19:27 ` [PATCH v5 19/21] swap: simplify swapoff using virtual swap Nhat Pham
2026-03-20 19:27 ` [PATCH v5 20/21] swapfile: replace the swap map with bitmaps Nhat Pham
2026-03-20 19:27 ` [PATCH v5 21/21] vswap: batch contiguous vswap free calls Nhat Pham
2026-03-21 18:22 ` [PATCH v5 00/21] Virtual Swap Space Andrew Morton
2026-03-22  2:18   ` Roman Gushchin
     [not found] ` <CAMgjq7AiUr_Ntj51qoqvV+=XbEATjr7S4MH+rgD32T5pHfF7mg@mail.gmail.com>
2026-03-23 15:32   ` Nhat Pham
2026-03-23 16:40     ` Kairui Song
2026-03-23 20:05       ` Nhat Pham
2026-03-25 18:53     ` YoungJun Park
2026-04-12  1:03       ` Nhat Pham
2026-04-14  3:09         ` YoungJun Park
2026-03-24 13:19 ` Askar Safin
2026-03-24 17:23   ` Nhat Pham
2026-03-25  2:35     ` Askar Safin
2026-03-25 18:36 ` YoungJun Park
2026-04-12  1:40   ` Nhat Pham
2026-04-14  2:50     ` YoungJun Park [this message]
2026-04-14  3:28       ` Kairui Song
2026-04-14  7:52     ` Christoph Hellwig
