From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Chris Li <chrisl@kernel.org>,
Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com
Subject: Re: [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Fri, 27 Feb 2026 11:43:50 +0900
Message-ID: <aaEE5kdAgcRcBheY@yjaykim-PowerEdge-T330>
In-Reply-To: <aZvX0HZy1PDylL8A@linux.dev>
While I await your response on the other thread,
I thought I would answer these questions first :)

On Sun, Feb 22, 2026 at 09:56:13PM -0800, Shakeel Butt wrote:
> Hi YoungJun,
>
> I see you have sent a separate email on BPF specific questions to which I will
> respond separately, here I will respond to other questions/comments.
>
> On Sat, Feb 21, 2026 at 11:30:59PM +0900, YoungJun Park wrote:
> > On Fri, Feb 20, 2026 at 07:47:22PM -0800, Shakeel Butt wrote:
> [...]
> >
> > > Taking a step back, can you describe your use-case a bit more and share
> > > requirements?
> >
> > Our use case is simple for now.
> > We have two swap devices with different performance
> > characteristics and want to assign different swap devices to different
> > workloads (cgroups).
>
> If you don't mind, can you share a bit more about the cgroup hierarchy structure
> of your deployment. Do you use cgroup v1 or v2 on your production environment?
We are primarily targeting cgroup v2 for now.
> > >
> > For some background, when I initially proposed this, I suggested allowing
> > per-cgroup swap device priorities so that it could also accommodate the
> > broader scenarios you mentioned. However, since even our own use case
> > does not require reversing swap priorities within a cgroup, we pivoted
> > to the "swap tier" mechanism that Chris proposed.
> >
> > > 1. If more than one device is assigned to a workload, do you want to have
> > > some kind of ordering between them for the workload or do you want the option
> > > to have a round-robin kind of policy?
> >
> > Both. If devices are in the same tier with the same priority, round robin.
> > If they are in the same tier with different priorities, or in different
> > tiers, ordering applies. The current tier structure should be able to
> > satisfy either preference.
>
> I assume this is the same swap priorities as of today, right? You want similar
> priority behavior within a tier.
That is correct; the swap priority behavior remains unchanged.
While this is slightly tangential, I also see a potential use for swap tiers in
improving the swap device selection logic, which today is tightly coupled to
priority.
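
To make the selection rule discussed above concrete, here is a small user-space
model in Python. It is only a sketch, not code from this series: the names
SwapDevice, TierSelector, and pick() are hypothetical, the tier labels are
made up, and it assumes that ordering between tiers simply follows device
priority. Among the devices a cgroup may use, the highest priority wins, and
devices sharing that priority are used round robin.

```python
from dataclasses import dataclass, field


@dataclass
class SwapDevice:
    name: str
    priority: int    # higher value is preferred, as with swapon priorities
    tier: str        # hypothetical tier label
    free_slots: int  # remaining capacity in this toy model


@dataclass
class TierSelector:
    """Toy model of device selection for one cgroup (illustration only)."""
    devices: list
    allowed_tiers: set
    _cursor: dict = field(default_factory=dict)  # priority -> picks so far

    def pick(self):
        # Only devices in the cgroup's allowed tiers with space left qualify.
        usable = [d for d in self.devices
                  if d.tier in self.allowed_tiers and d.free_slots > 0]
        if not usable:
            return None
        # Ordering: the highest priority among usable devices wins.
        top = max(d.priority for d in usable)
        group = [d for d in usable if d.priority == top]
        # Round robin: rotate among devices that share the top priority.
        turn = self._cursor.get(top, 0)
        self._cursor[top] = turn + 1
        dev = group[turn % len(group)]
        dev.free_slots -= 1
        return dev.name
```

With two priority-10 devices in a "fast" tier and one priority-1 device in a
"slow" tier, a cgroup allowed both tiers alternates between the fast devices
until they are full and only then falls back to the slow one, while a cgroup
restricted to "slow" never touches the fast devices.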
> > > 2. What's the reason to use 'tiers' in the name? Is it similar to memory tiers
> > > and you want promotion/demotion among the tiers?
> >
> > This was originally Chris's idea. I think he explained the rationale
> > well in his reply.
> >
> > > 3. If a workload has multiple swap devices assigned, can you describe the
> > > scenario where such workloads need to partition/divide given devices to their
> > > sub-workloads?
> >
> > One possible scenario is reducing lock contention by partitioning swap
> > devices between parent and child cgroups.
>
> The lock contention is orthogonal (and distraction here).
Understood. It was just a hypothetical scenario where I thought there might be
an additional benefit, but I agree we can set it aside for now.
> >
> > > Let's start with these questions. Please note that I want us to not just look at
> > > the current use-case but brainstorm more future use-cases and then come up with
> > > the solution which is more future proof.
> >
> > We have clear production use cases from both us and Chris, and I also
> > presented a deployment example in the cover letter.
> >
> > I think it is hard to design concretely for future use cases at this
> > point. When those needs become clearer, BPF with its flexibility
> > would be a better fit then. I see BPF as a natural extension path
> > rather than a starting point.
> >
> > For now, guarding the memcg & tier behind a CONFIG option would
> > let us move forward without committing to a stable interface, and
> > we can always pivot to BPF later if needed
>
> I think your use-case is very clear. Before committing to any options, I want us
> to brainstorm all options and gather pros/cons and then make an informed
> decision. Anyways I will respond to your other email (in a day or two).
This relates to my response in the other email thread, and I think it would be
good to discuss it further there.
It seems the concern is that distributing swap devices along the memcg
hierarchy might be seen as over-engineering, since there isn't a
concrete use case for it yet. I have included a proposal to mitigate this
concern in the other thread.
Sounds good. I look forward to your response there.
Best regards,
Youngjun Park