From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Chris Li <chrisl@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com
Subject: Re: [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Wed, 4 Mar 2026 16:27:37 +0900
Message-ID: <aafe6VXoCLlajp8b@yjaykim-PowerEdge-T330>
In-Reply-To: <aaXM58EsMtbGri2B@linux.dev>
On Mon, Mar 02, 2026 at 01:27:31PM -0800, Shakeel Butt wrote:
>
> Hi YoungJun,
>
> Sorry for the late response.
Hi Shakeel, no problem :)
> On Sun, Feb 22, 2026 at 10:16:04AM +0900, YoungJun Park wrote:
> [...]
>
> Let me summarize our discussion first:
>
> You have a use-case where they have systems running multiple workloads and
> have multiple swap devices. Those swap devices have different performance
> capabilities and they want to restrict/assign swap devices to the workloads. For
> example assigning a low latency SSD swap device to latency sensitive workload
> and slow disk swap to latency tolerant workload. (please correct me if
> I misunderstood something).
Thanks for the summary! Yes, that describes our use case accurately.
> The use-case seems reasonable to me but I have concerns related to adding an
> interface to memory cgroups. Mainly I am not clear how hierarchical semantics on
> such interface would look like. In addition, I think it would be too rigid and
> will be very hard to evolve for future features. To me enabling this
> functionality through BPF would give much more flexibility and will be more
> future proof.
> >
> > After reading the reply and re-think more of it.
> >
> > I have a few questions regarding the BPF-first approach you
> > suggested, if you don't mind. Some of them I am re-asking
> > because I feel they have not been clearly addressed yet.
> >
> > - We are in an embedded environment where enabling additional
> > kernel compile options is costly. BPF is disabled by
> > default in some of our production configurations. From a
> > trade-off perspective, does it make sense to enable BPF
> > just for swap device control?
>
> To me, it is reasonable to enable BPF for environment running multiple
> workloads and having multiple swap devices.
I agree that BPF is valuable and flexible, but it is still debatable whether
it should be a mandatory prerequisite for this feature.
While I understand your point that BPF is reasonable for complex environments,
requiring BPF solely for this feature in resource-constrained embedded systems
(where it may be disabled by default) is a significant hurdle. This trade-off
is a key reason why I am advocating for the memcg interface.
> >
> > - You suggest starting with BPF and discussing a stable
> > interface later. I am genuinely curious, are there actual
> > precedents where a BPF prototype graduated into a stable
> > kernel interface?
>
> After giving some thought, I think once we have BPF working, adding another
> interface for the same feature would not be an option. So, we have decide
> upfront which route to take.
Yes. Once BPF is in place, swap behavior can be controlled at various
granularities (per process, per cgroup, etc.), so it would likely become a
superset of the other mechanisms.
However, despite BPF's benefits, I would like to propose proceeding with a more
restricted memcg approach, which I detail below.
> >
> > - You raised that stable interfaces are hard to remove. Would
> > gating it behind a CONFIG option or marking it experimental
> > be an acceptable compromise?
> I think hiding behind CONFIG options do not really protect against the usage and
> the rule of no API breakage usually apply.
Understood. My suggestion regarding the CONFIG option was to mark it as
'experimental' to allow usage for specific cases like ours while reserving
broader API stabilization for later, but I accept your point about API breakage
rules.
> > - You already acknowledged the use-case for assigning
> > different swap devices to different workloads. Your
> > objection is specifically about hierarchical parent-child
> > partitioning. If the interface enforced uniform policy
> > within a subtree, would that be acceptable?
>
> Let's start with that or maybe comeup with concrete examples on how that would
> look like.
So, just to clarify, are you open to discussing this restricted direction?
To reiterate, this would mean enforcing a uniform policy for all children
of the memcg on which the swap tier is configured.
For our use case, this is currently sufficient: we treat a memcg subtree as a
single workload, and that workload as a whole selects which swap devices it
may use.
Chris, would you be okay with proceeding in this direction as a starting point?
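To make the restricted semantics concrete, here is a minimal userspace sketch.
It is not code from the patch set: struct cg_node, set_tiers(), and
effective_tiers() are hypothetical names, the bitmask encoding of tiers is an
assumption, and "the topmost configured ancestor wins" is only one possible way
to enforce a uniform policy within a subtree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
struct cg_node {
	struct cg_node *parent;   /* NULL for the root */
	unsigned int tier_mask;   /* assumed encoding: bit i set => tier i allowed */
	int configured;           /* nonzero once tiers were set on this node */
};

#define ALL_TIERS (~0u)           /* unconfigured subtrees may use every tier */

/*
 * Effective tiers for a node: the mask of the topmost configured ancestor
 * (or the node itself). Descendants therefore cannot diverge from it.
 */
unsigned int effective_tiers(const struct cg_node *cg)
{
	const struct cg_node *owner = NULL;

	for (; cg; cg = cg->parent)
		if (cg->configured)
			owner = cg;   /* keep walking: the topmost one wins */

	return owner ? owner->tier_mask : ALL_TIERS;
}

/*
 * Configure tiers on a node. Rejected with -EBUSY if any strict ancestor is
 * already configured, since that ancestor's policy covers this subtree.
 */
int set_tiers(struct cg_node *cg, unsigned int mask)
{
	assert(cg != NULL);

	for (const struct cg_node *p = cg->parent; p; p = p->parent)
		if (p->configured)
			return -EBUSY;

	cg->tier_mask = mask;
	cg->configured = 1;
	return 0;
}
```

With a chain root -> workload -> child, calling set_tiers() on "workload" makes
effective_tiers() return the same mask for "workload" and "child", while a
later set_tiers() on "child" fails and "root" stays unrestricted.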
> Beside, give a bit more thought on potential future features e.g. demotion and
> reason about how you would incorporate those features.
Regarding demotion (assuming you refer to migration based on swap device
tiers), I don't foresee issues if we apply tiered swap devices per memcg.
In fact, the 'tier' concept was proposed specifically as an abstraction layer
to structure hierarchical swap devices. Since the current direction treats it
as a unified tier view configured by the parent memcg, features like demotion
should fit naturally.
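As a rough illustration of why demotion could fit, the sketch below picks a
demotion target from the tiers a memcg is allowed to use. Everything here is
assumed for illustration: next_demotion_tier() is a hypothetical helper, tiers
are taken to be numbered from fastest (0) to slowest, and the allowed set is
taken to be a bitmask. The patch set may order or encode tiers differently.

```c
#include <limits.h>

/* Number of tiers representable in the assumed bitmask encoding. */
#define NR_TIERS ((int)(sizeof(unsigned int) * CHAR_BIT))

/*
 * Return the next slower tier after current_tier that is present in
 * allowed_mask, or -1 if there is none (nothing left to demote to).
 */
int next_demotion_tier(unsigned int allowed_mask, int current_tier)
{
	if (current_tier < 0 || current_tier >= NR_TIERS)
		return -1;

	for (int t = current_tier + 1; t < NR_TIERS; t++)
		if (allowed_mask & (1u << t))
			return t;

	return -1;
}
```

Because the allowed mask is the single per-subtree view, the demotion path only
has to consult that mask and never needs to reconcile parent and child
settings.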
Regarding future extensibility, I would like to add two points:

1. From the memcg perspective:
   Applying memcg in this restricted manner minimizes complexity. Future
   expansions (such as complex tier inheritance rules, or handling settings
   that differ between parent and child) will require careful discussion, but
   the restricted approach avoids immediate conflicts and side effects.

2. The swap tier abstraction itself:
   The introduction of swap tiers primarily enables swap device assignment.
   However, this abstraction also opens the door to extended use cases such as
   inter-tier migration (demotion), round-robin policies between tiers,
   tier-based VMA swap, or even per-process swap controls in the future.

I believe this patch set is acceptable, as it introduces this foundational
concept and applies it to a specific use case that does not contradict core
memcg principles.

Best regards,
Youngjun Park