From: YoungJun Park <youngjun.park@lge.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, hughd@google.com, yosry.ahmed@linux.dev,
	mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev,
	len.brown@intel.com, chengming.zhou@linux.dev,
	kasong@tencent.com, chrisl@kernel.org,
	huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
	lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu,
	pavel@kernel.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-pm@vger.kernel.org, peterx@redhat.com, gunho.lee@lge.com,
	taejoon.song@lge.com, iamjoonsoo.kim@lge.com
Subject: Re: [RFC PATCH v2 00/18] Virtual Swap Space
Date: Tue, 3 Jun 2025 00:03:52 +0900	[thread overview]
Message-ID: <aD29WJLms0CJVh8A@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CAKEwX=NFrWyFkyd5XhXEb_qYtWBk4yPUMPPJN0qHPAvzPUq_Dg@mail.gmail.com>

On Sun, Jun 01, 2025 at 02:08:22PM -0700, Nhat Pham wrote:
> On Sun, Jun 1, 2025 at 5:56 AM YoungJun Park <youngjun.park@lge.com> wrote:
> >
> > On Fri, May 30, 2025 at 09:52:42AM -0700, Nhat Pham wrote:
> > > On Thu, May 29, 2025 at 11:47 PM YoungJun Park <youngjun.park@lge.com> wrote:
> > > >
> > > > On Tue, Apr 29, 2025 at 04:38:28PM -0700, Nhat Pham wrote:
> > > > > Changelog:
> > > > > * v2:
> > > > >       * Use a single atomic type (swap_refs) for reference counting
> > > > >         purposes. This brings the size of the swap descriptor from
> > > > >         64 bytes down to 48 bytes (25% reduction). Suggested by
> > > > >         Yosry Ahmed.
> > > > >       * Zeromap bitmap is removed in the virtual swap implementation.
> > > > >         This saves one bit per physical swapfile slot.
> > > > >       * Rearrange the patches and the code change to make things more
> > > > >         reviewable. Suggested by Johannes Weiner.
> > > > >       * Update the cover letter a bit.
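For anyone skimming the thread, here is a minimal sketch of what a
descriptor with a single packed refcount might look like. This is only
my reading of the changelog above, not the actual layout from the
series, and the field names are guesses:

	#include <linux/atomic.h>

	/*
	 * Illustrative only: one atomic_t packs the swap count together
	 * with the swapcache pin, replacing separate counters -- which
	 * is where the 64 -> 48 byte saving would come from.
	 */
	struct swp_desc_sketch {
		atomic_t swap_refs;		/* swap count + swapcache bit */
		union {				/* current backing store */
			swp_slot_t slot;	/* physical swapfile slot */
			struct zswap_entry *zswap;	/* compressed copy */
			struct folio *folio;	/* still resident in memory */
		};
	};
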
> > > >
> > > > Hi Nhat,
> > > >
> > > > Thank you for sharing this patch series.
> > > > I’ve read through it with great interest.
> > > >
> > > > I’m part of a kernel team working on features related to multi-tier swapping,
> > > > and this patch set appears quite relevant
> > > > to our ongoing discussions and early-stage implementation.
> > >
> > > May I ask - what's the use case you're thinking of here? Remote swapping?
> > >
> >
> > Yes, that's correct.
> > Our usage scenario includes remote swap,
> > and we're experimenting with assigning swap tiers per cgroup
> > to improve performance in specific scenarios on our target devices.
> 
> Hmm, that can be a start. Right now, we essentially have only 2 swap
> tiers, so memory.(z)swap.max and memory.zswap.writeback are usually
> sufficient to describe the tiering interface. But if you have an
> alternative use case in mind, feel free to send an RFC to explore
> this!
>

Yes, sounds good.
I've organized the details of our swap tiering approach,
including the specific use case we are trying to solve.
The approach builds on the existing priority mechanism
in the swap subsystem.
I’ll be sharing it as an RFC shortly.
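For context, the priority mechanism I am referring to is the per-device
priority that already exists at swapon time: higher-priority devices
are filled first, which already gives a static tier ordering. A minimal
userspace illustration (the device paths are made up):

	#include <sys/swap.h>

	int main(void)
	{
		/* Prefer compressed RAM (priority 100) over the disk
		 * partition (priority 10). Requires root. */
		swapon("/dev/zram0",
		       SWAP_FLAG_PREFER | (100 << SWAP_FLAG_PRIO_SHIFT));
		swapon("/dev/sda2",
		       SWAP_FLAG_PREFER | (10 << SWAP_FLAG_PRIO_SHIFT));
		return 0;
	}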
 
> >
> > We’ve explored several approaches and PoCs around this,
> > and in the process of evaluating
> > whether our direction could eventually be aligned
> > with the upstream kernel,
> > I came across your patchset and wanted to ask whether
> > similar efforts have been discussed or attempted before.
> 
> I think it is occasionally touched upon in discussion, but AFAICS
> there has not really been an actual upstream patch to add such an
> interface.
> 
> >
> > > >
> > > > I had a couple of questions regarding the future direction.
> > > >
> > > > > * Multi-tier swapping (as mentioned in [5]), with transparent
> > > > >   transferring (promotion/demotion) of pages across tiers (see [8] and
> > > > >   [9]). Similar to swapoff, with the old design we would need to
> > > > >   perform the expensive page table walk.
> > > >
> > > > Based on the discussion in [5], it seems there was some exploration
> > > > around enabling per-cgroup selection of multiple tiers.
> > > > Do you envision the current design evolving in a similar direction
> > > > to those past discussions, or is there a different direction you're aiming for?
> > >
> > > IIRC, that past design focused on the interface aspect of the problem,
> > > but never actually touched the mechanism to implement a multi-tier
> > > swapping solution.
> > >
> > > The simple reason is that it's impossible, or at least highly inefficient,
> > > to do it in the current design, i.e. without virtualizing swap. Storing
> >
> > As you pointed out, there are certainly inefficiencies
> > in supporting this use case with the current design.
> > But if there is a valid use case, I believe there's room
> > for it to be supported in the current model, possibly in
> > a less optimized form, until a virtual swap device becomes
> > available and provides a more efficient solution.
> > What do you think?
> 
> Which less optimized form are you thinking of?
>

I just meant that the current swap design would be less optimized
regardless of the form of tiering applied,
not that my approach itself is less optimized.
That may have come across differently than I intended.
Please feel free to disregard that assumption;
I believe it would be more appropriate
to evaluate this based on the RFC I plan to share soon.
 
> >
> > > the physical swap location in PTEs means that changing the swap
> > > backend requires a full page table walk to update all the PTEs that
> > > refer to the old physical swap location. So you have to pick your
> > > poison - either:
> > > 1. Pick your backend at swap out time, and never change it. You might
> > > not have sufficient information to decide at that time. It prevents
> > > you from adapting to the change in workload dynamics and working set -
> > > the access frequency of pages might change, so their physical location
> > > should change accordingly.
> > >
> > > 2. Reserve the space in every tier, and associate them with the same
> > > handle. This is kinda what zswap is doing. It is space inefficient, and
> > > creates a lot of operational issues in production.
> > >
> > > 3. Bite the bullet and perform the page table walk. This is what
> > > swapoff is doing, basically. Raise your hands if you're excited about
> > > a full page table walk every time you want to evict a page from zswap
> > > to disk swap. Booo.
> > >
> > > This new design will give us an efficient way to perform tier transfer
> > > - you need to figure out how to obtain the right to perform the
> > > transfer (for now, through the swap cache - but you can perhaps
> > > envision some sort of locks), and then you can simply make the change
> > > at the virtual layer.
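If I understand the design correctly, the transfer then reduces to
updating the descriptor's backing field while the PTEs keep their
stable virtual slot. A sketch under that assumption, reusing the
descriptor sketch from above (all helper names are hypothetical, not
from the series):

	/* Tier transfer at the virtual layer: PTEs are never touched. */
	static int tier_transfer_sketch(struct swp_desc_sketch *desc,
					swp_slot_t new_slot)
	{
		/* Obtain exclusive transfer rights, e.g. through the
		 * swap cache as suggested above (hypothetical helper). */
		if (!desc_transfer_trylock(desc))
			return -EBUSY;

		/* Copy the data to the new tier (hypothetical helper),
		 * then flip the single backing field in the descriptor. */
		copy_to_new_tier(desc, new_slot);
		desc->slot = new_slot;

		desc_transfer_unlock(desc);
		return 0;
	}
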
> > >
> >
> > One idea that comes to mind is whether the backend swap tier for
> > a page could be lazily adjusted at runtime, either reactively
> > or via an explicit interface, after the tier configuration changes.
> > Alternatively, if it's preferable to leave pages untouched
> > when the tier configuration changes at runtime,
> > perhaps we could consider making this behavior configurable as well.
> >
> 
> I don't quite understand - could you expand on this?
>

Regarding your point,
my understanding was that you were referring
to an immediate migration once a new swap tier is selected at runtime.
I was suggesting that a lazy migration approach,
or even skipping migration altogether,
might be worth considering as an alternative.
I only mentioned it because, from our use case perspective,
immediate migration is not strictly necessary.
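
To make the lazy variant concrete: the fix-up could happen the next
time an entry is touched, e.g. at swapin, rather than eagerly when the
cgroup's tier setting changes. A pseudocode-level sketch (every name
here is hypothetical):

	static void maybe_adjust_tier_on_swapin(struct swp_desc_sketch *desc,
						struct mem_cgroup *memcg)
	{
		int wanted = memcg_preferred_tier(memcg);

		/* The entry still lives on a tier this cgroup no longer
		 * prefers; queue a best-effort async transfer instead of
		 * migrating everything up front. */
		if (desc_current_tier(desc) != wanted)
			queue_tier_transfer(desc, wanted);
	}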

Best regards,
YoungJun Park



Thread overview: 30+ messages
2025-04-29 23:38 Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 01/18] swap: rearrange the swap header file Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 02/18] swapfile: rearrange functions Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 03/18] swapfile: rearrange freeing steps Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 04/18] mm: swap: add an abstract API for locking out swapoff Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 05/18] mm: swap: add a separate type for physical swap slots Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 06/18] mm: create scaffolds for the new virtual swap implementation Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 07/18] mm: swap: zswap: swap cache and zswap support for virtualized swap Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 08/18] mm: swap: allocate a virtual swap slot for each swapped out page Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 09/18] swap: implement the swap_cgroup API using virtual swap Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 10/18] swap: manage swap entry lifetime at the virtual swap layer Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 11/18] mm: swap: temporarily disable THP swapin and batched freeing swap Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 12/18] mm: swap: decouple virtual swap slot from backing store Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 13/18] zswap: do not start zswap shrinker if there is no physical swap slots Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 14/18] memcg: swap: only charge " Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 15/18] vswap: support THP swapin and batch free_swap_and_cache Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 16/18] swap: simplify swapoff using virtual swap Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 17/18] swapfile: move zeromap setup out of enable_swap_info Nhat Pham
2025-04-29 23:38 ` [RFC PATCH v2 18/18] swapfile: remove zeromap in virtual swap implementation Nhat Pham
2025-04-29 23:51 ` [RFC PATCH v2 00/18] Virtual Swap Space Nhat Pham
2025-05-30  6:47 ` YoungJun Park
2025-05-30 16:52   ` Nhat Pham
2025-05-30 16:54     ` Nhat Pham
2025-06-01 12:56     ` YoungJun Park
2025-06-01 16:14       ` Kairui Song
2025-06-02 15:17         ` YoungJun Park
2025-06-02 18:29         ` Nhat Pham
2025-06-03  9:50           ` Kairui Song
2025-06-01 21:08       ` Nhat Pham
2025-06-02 15:03         ` YoungJun Park [this message]
