From: Shakeel Butt <shakeel.butt@linux.dev>
To: YoungJun Park <youngjun.park@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Chris Li <chrisl@kernel.org>,
Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com
Subject: Re: [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Fri, 20 Feb 2026 19:47:22 -0800
Message-ID: <aZjxP2sTavBRGC1l@linux.dev>
In-Reply-To: <aY6hcPNxiolf5jj6@yjaykim-PowerEdge-T330>
Please don't send a new version of the series before concluding the discussion
on the previous one.
On Fri, Feb 13, 2026 at 12:58:40PM +0900, YoungJun Park wrote:
> >
> > One of the LPC feedback you missed is to not add memcg interface for
> > this functionality and explore BPF way instead.
> >
> > We are normally very conservative to add new interfaces to cgroup.
> > However I am not even convinced that memcg interface is the right way to
> > expose this functionality. Swap is currently global and the idea to
> > limit or assign specific swap devices to specific cgroups makes sense
> > but that is the decision for the job orchestrator or node controller.
> > Allowing workloads to pick and choose swap devices does not make sense to
> > me.
>
> Apologies for overlooking the feedback regarding the BPF approach. Thank you
> for the suggestion.
No need for apologies. These things take time and multiple iterations.
>
> I agree that using BPF would provide greater flexibility, allowing control not
> just at the memcg level, but also per-process or for complex workloads.
> (for example, by an orchestrator or node controller)
Yes, it provides flexibility, but that is not the main reason I am pushing for
it. The reason is that I want you to first try the BPF approach without
introducing any stable interfaces. Show how swap tiers will be used and
configured in a production environment, and then we can talk about whether a
stable interface is needed. I am still not convinced that swap tiers need to be
controlled hierarchically or that non-root users should be able to control them.
>
> However, I am concerned that this level of freedom might introduce logical
> contradictions, particularly regarding cgroup hierarchy semantics.
>
> For example, BPF might allow a topology that violates hierarchical constraints
> (a concern that was also touched upon during LPC)
Yes, BPF provides more power, but it is controlled by the admin, and the admin
can already shoot themselves in the foot in multiple ways.
>
> - Group A (Parent): Assigned to SSD1
> - Group B (Child of A): Assigned to SSD2
>
> If Group A has a `memory.swap.max` limit, and Group B swaps out to SSD2, it
> creates a consistency issue. Group B consumes Group A's swap quota, but it is
> utilizing a device (SSD2) that is distinct from the Parent's assignment. This
> could lead to situations where the Parent's limit is exhausted by usage on a
> device it effectively doesn't "own" or shouldn't be using.
>
> One might suggest restricting BPF to strictly adhere to these hierarchical
> constraints.
No need to constrain anything.
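To make the scenario quoted above concrete, here is a toy userspace model of
hierarchical swap charging. It is only a sketch, not kernel code: the `Cgroup`
class, `try_charge_swap()`, and the `per_device` bookkeeping are hypothetical
names invented for illustration. The only behavior it assumes is that a swap
charge is applied to the group and every ancestor, and that the
`memory.swap.max` check does not look at which device backs the charge.

```python
# Toy model only: names and structure are illustrative assumptions,
# not the kernel's memcg implementation.
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class Cgroup:
    name: str
    parent: Optional["Cgroup"] = None
    swap_max: Optional[int] = None  # pages; None stands for "max" (no limit)
    swap_usage: int = 0             # pages charged to this group and its descendants
    per_device: Dict[str, int] = field(default_factory=dict)  # hypothetical bookkeeping

    def self_and_ancestors(self) -> Iterator["Cgroup"]:
        cg: Optional[Cgroup] = self
        while cg is not None:
            yield cg
            cg = cg.parent

    def try_charge_swap(self, pages: int, device: str) -> bool:
        chain = list(self.self_and_ancestors())
        # The limit check walks up the hierarchy and ignores the device.
        for cg in chain:
            if cg.swap_max is not None and cg.swap_usage + pages > cg.swap_max:
                return False
        for cg in chain:
            cg.swap_usage += pages
            cg.per_device[device] = cg.per_device.get(device, 0) + pages
        return True


group_a = Cgroup("A", swap_max=100)    # parent, "assigned" to SSD1
group_b = Cgroup("B", parent=group_a)  # child, "assigned" to SSD2

b_ok = group_b.try_charge_swap(80, "SSD2")  # child swaps out to SSD2
a_ok = group_a.try_charge_swap(30, "SSD1")  # parent's own swap-out to SSD1
```

In this model the child's 80 pages on SSD2 count against the parent's limit of
100, so the parent's later 30-page charge to SSD1 is refused. That is the
situation described in the quoted text; whether it is a real problem or simply
an admin-chosen policy is the open question.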
Taking a step back, can you describe your use-case a bit more and share
requirements?
You have multiple swap devices with different properties and you want to assign
those swap devices to different workloads. Now a couple of questions:
1. If more than one device is assigned to a workload, do you want some kind of
ordering between them for the workload, or do you want the option of a
round-robin kind of policy?
2. What's the reason to use 'tiers' in the name? Is it similar to memory tiers,
meaning you want promotion/demotion among the tiers?
3. If a workload has multiple swap devices assigned, can you describe the
scenario where such a workload needs to partition/divide the given devices
among its sub-workloads?
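For question 1, the two policies can be sketched as follows. This is a minimal
userspace illustration under assumed names (`ordered_policy`,
`round_robin_policy`, and the `free` slot map are hypothetical); it does not
correspond to any existing kernel or BPF interface.

```python
# Illustrative sketch: both generators yield the device chosen for each
# successive swap-out until every device in the list is full.
from itertools import cycle
from typing import Dict, Iterator, List


def ordered_policy(devices: List[str], free: Dict[str, int]) -> Iterator[str]:
    """Strict ordering: always use the first listed device with free slots."""
    while True:
        for dev in devices:
            if free[dev] > 0:
                free[dev] -= 1
                yield dev
                break
        else:
            return  # no device has a free slot left


def round_robin_policy(devices: List[str], free: Dict[str, int]) -> Iterator[str]:
    """Round robin: rotate across the devices, skipping full ones."""
    ring = cycle(devices)
    while any(free[dev] > 0 for dev in devices):
        dev = next(ring)
        if free[dev] > 0:
            free[dev] -= 1
            yield dev


ordered = list(ordered_policy(["zram", "ssd"], {"zram": 2, "ssd": 2}))
rotated = list(round_robin_policy(["zram", "ssd"], {"zram": 2, "ssd": 2}))
```

With two free slots on each device, the ordered policy fills `zram` before
touching `ssd`, while the round-robin policy alternates between them.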
Let's start with these questions. Please note that I want us to look not just at
the current use-case but also to brainstorm future use-cases, and then come up
with a solution that is more future-proof.
thanks,
Shakeel