linux-mm.kvack.org archive mirror
From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Chris Li <chrisl@kernel.org>,
	Kairui Song <kasong@tencent.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com
Subject: Re: [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Fri, 27 Feb 2026 11:43:50 +0900	[thread overview]
Message-ID: <aaEE5kdAgcRcBheY@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <aZvX0HZy1PDylL8A@linux.dev>

On Sun, Feb 22, 2026 at 09:56:13PM -0800, Shakeel Butt wrote:

While I await your response on the other thread, 
I thought I would answer these questions first :)

> Hi YoungJun,
> 
> I see you have sent a separate email on BPF specific questions to which I will
> respond separately, here I will respond to other questions/comments.
> 
> On Sat, Feb 21, 2026 at 11:30:59PM +0900, YoungJun Park wrote:
> > On Fri, Feb 20, 2026 at 07:47:22PM -0800, Shakeel Butt wrote:
> [...]
> > 
> > > Taking a step back, can you describe your use-case a bit more and share
> > > requirements?
> > 
> > Our use case is simple for now.
> > We have two swap devices with different performance
> > characteristics and want to assign different swap devices to different
> > workloads (cgroups).
> 
> If you don't mind, can you share a bit more about the cgroup hierarchy
> structure of your deployment? Do you use cgroup v1 or v2 in your production
> environment?

We are primarily targeting cgroup v2 for now.
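To make that concrete, here is a rough sketch of how we would wire this up on
a v2 hierarchy with the per-memcg tier interface this series proposes. The
`memory.swap.tiers` file name, the tier names, and the cgroup names are all
illustrative assumptions, not a settled ABI:

```shell
# Illustrative sketch only: memory.swap.tiers and the tier names below
# follow this RFC's proposal; they are not an existing kernel interface.
mkdir -p /sys/fs/cgroup/workload-fast /sys/fs/cgroup/workload-slow

# Pin the latency-sensitive workload to the tier backed by the fast
# swap device, and the batch workload to the slower tier.
echo "fast" > /sys/fs/cgroup/workload-fast/memory.swap.tiers
echo "slow" > /sys/fs/cgroup/workload-slow/memory.swap.tiers
```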

> > >
> > For some background, when I initially proposed this, I suggested allowing
> > per-cgroup swap device priorities so that it could also accommodate the
> > broader scenarios you mentioned. However, since even our own use case
> > does not require reversing swap priorities within a cgroup, we pivoted
> > to the "swap tier" mechanism that Chris proposed.
> > 
> > > 1. If more than one device is assigned to a workload, do you want to have
> > >    some kind of ordering between them for the workload, or do you want the
> > >    option to have a round-robin kind of policy?
> > 
> > Both. If devices are in the same tier with the same priority, round robin.
> > If they are in the same tier with different priorities, or in different
> > tiers, ordering applies. The current tier structure should be able to
> > satisfy either preference.
> 
> I assume these are the same swap priorities as of today, right? You want
> similar priority behavior within a tier.

That is correct; the swap priority behavior remains unchanged. 

While this is slightly tangential, I also see a potential use case for swap
tiers in improving the swap device selection logic, which today is tightly
coupled with priority.
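As a toy illustration of that coupling (ordinary Python, not kernel code; the
device names and slot counts are made up), the existing semantics are: the
highest-priority device with free space wins, and devices sharing a priority
are rotated round-robin:

```python
class SwapSelector:
    """Toy model of today's swap device selection policy (not kernel
    code): among devices with free space, the highest-priority group
    wins, and devices that share a priority are rotated round-robin,
    mirroring the swapon(8) priority semantics."""

    def __init__(self, devices):
        # devices: name -> (priority, free_slots); values are made up
        self.devices = dict(devices)
        self._cursor = {}  # per-priority round-robin position

    def pick(self):
        avail = [(name, prio) for name, (prio, free) in self.devices.items()
                 if free > 0]
        if not avail:
            return None  # all devices are full
        top = max(prio for _, prio in avail)
        group = sorted(name for name, prio in avail if prio == top)
        idx = self._cursor.get(top, 0) % len(group)
        self._cursor[top] = idx + 1
        chosen = group[idx]
        prio, free = self.devices[chosen]
        self.devices[chosen] = (prio, free - 1)
        return chosen
```

With two equal-priority fast devices and one slower fallback, allocations
alternate between the fast devices until they fill, then fall through to the
slow one. A tier abstraction could sit on top of exactly this grouping step.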

> > > 2. What's the reason to use 'tiers' in the name? Is it similar to memory tiers
> > >    and you want promotion/demotion among the tiers?
> > 
> > This was originally Chris's idea. I think he explained the rationale
> > well in his reply.
> > 
> > > 3. If a workload has multiple swap devices assigned, can you describe the
> > >    scenario where such workloads need to partition/divide given devices to their
> > >    sub-workloads?
> > 
> > One possible scenario is reducing lock contention by partitioning swap
> > devices between parent and child cgroups.
> 
> The lock contention is orthogonal (and distraction here).

Understood. It was just a hypothetical scenario where I thought there might be
an additional benefit, but I agree we can set it aside for now.

> > 
> > > Let's start with these questions. Please note that I want us to not just look at
> > > the current use-case but brainstorm more future use-cases and then come up with
> > > the solution which is more future proof.
> > 
> > We have clear production use cases from both us and Chris, and I also
> > presented a deployment example in the cover letter.
> > 
> > I think it is hard to design concretely for future use cases at this
> > point. When those needs become clearer, BPF with its flexibility
> > would be a better fit then. I see BPF as a natural extension path
> > rather than a starting point.
> > 
> > For now, guarding the memcg & tier interface behind a CONFIG option would
> > let us move forward without committing to a stable ABI, and we can always
> > pivot to BPF later if needed.
> 
> I think your use-case is very clear. Before committing to any options, I want us
> to brainstorm all options and gather pros/cons and then make an informed

This relates to my response in the other email thread, and I think it would be
good to discuss it further there.

It seems the concern is that distributing swap devices via the memcg hierarchy
might be seen as over-engineering, since there is no concrete use case for it
yet. I have included a proposal to mitigate this concern in the other thread.

> decision. Anyways I will respond to your other email (in a day or two).

Sounds good. I look forward to your explanation.

Best regards,
Youngjun Park



Thread overview: 26+ messages
2026-01-26  6:52 Youngjun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 1/5] mm: swap: introduce swap tier infrastructure Youngjun Park
2026-02-12  9:07   ` Chris Li
2026-02-13  2:18     ` YoungJun Park
2026-02-13 14:33     ` YoungJun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 2/5] mm: swap: associate swap devices with tiers Youngjun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 3/5] mm: memcontrol: add interface for swap tier selection Youngjun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 4/5] mm, swap: change back to use each swap device's percpu cluster Youngjun Park
2026-02-12  7:37   ` Chris Li
2026-01-26  6:52 ` [RFC PATCH v2 v2 5/5] mm, swap: introduce percpu swap device cache to avoid fragmentation Youngjun Park
2026-02-12  6:12 ` [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Chris Li
2026-02-12  9:22   ` Chris Li
2026-02-13  2:26     ` YoungJun Park
2026-02-13  1:59   ` YoungJun Park
2026-02-12 17:57 ` Nhat Pham
2026-02-12 17:58   ` Nhat Pham
2026-02-13  2:43   ` YoungJun Park
2026-02-12 18:33 ` Shakeel Butt
2026-02-13  3:58   ` YoungJun Park
2026-02-21  3:47     ` Shakeel Butt
2026-02-21  6:07       ` Chris Li
2026-02-21 17:44         ` Shakeel Butt
2026-02-22  1:16           ` YoungJun Park
2026-02-21 14:30       ` YoungJun Park
2026-02-23  5:56         ` Shakeel Butt
2026-02-27  2:43           ` YoungJun Park [this message]
