linux-mm.kvack.org archive mirror
From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Chris Li <chrisl@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Kairui Song <kasong@tencent.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com
Subject: Re: [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Sun, 22 Feb 2026 10:16:04 +0900
Message-ID: <aZpY1FIjYLtLdu5F@yjaykim-PowerEdge-T330>
In-Reply-To: <20260221163043.GA35350@shakeel.butt@linux.dev>

On Sat, Feb 21, 2026 at 09:44:01AM -0800, Shakeel Butt wrote:
> On Fri, Feb 20, 2026 at 10:07:44PM -0800, Chris Li wrote:
> > >
> [...]
> > > >
> > > > I agree that using BPF would provide greater flexibility, allowing control not
> > > > just at the memcg level, but also per-process or for complex workloads.
> > > > (As like orchestrator and node controller)
> > >
> > > Yes it provides the flexibility but that is not the main reason I am pushing for
> > > it. The reason I want you to first try the BPF approach without introducing any
> > > stable interfaces. Show how swap tiers will be used and configured in production
> > 
> > Is that your biggest concern?
> 
> No, that is secondary because I am not seeing the real use-case of
> controlling/partitioning swap devices among sub-workloads. Until that is
> figured out, adding a stable API is not good.
> 
> > Many different ways exist to solve that
> > problem. e.g. We can put a config option protecting it and mark it as
> > experimental. This will unblock development and allow experimentation.
> > We can have more people try it out and give feedback.
> > 
> > > environment and then we can talk if a stable interface is needed. I am still not
> > > convinced that swap tiers need to be controlled hierarchically and the non-root
> > > should be able to control it.
> > 
> > Yes, my company uses a different swap device at different cgroup
> > level. I did ask my coworker to confirm that usage. Control at the non
> > root level is a real need.
> 
> I am assuming you meant Google and particularly Prodkernel team and not
> Android or ChromeOS. Google's prodkernel used to have per-cgroup
> swapfiles exposed through memory.swapfiles (if I remember correctly
> Suleiman implemented this along with ghost swapfiles). Later this was
> deprecated (by Yu Zhao) and global (ghost) swapfiles were being used.
> The memory.swapfiles interface instead of supporting real swapfiles
> started having select options among default, ghost/zswap and real
> (something like that). However such interface was used to just disable
> or enable zswap for a workload and never about hierarchically
> controlling the swap devices (Google prodkernel only have zswap). Has
> something changed?
> 
> > 
> > >
> > > >
> > > > However, I am concerned that this level of freedom might introduce logical
> > > > contradictions, particularly regarding cgroup hierarchy semantics.
> > > >
> > > > For example, BPF might allow a topology that violates hierarchical constraints
> > > > (a concern that was also touched upon during LPC)
> > >
> > > Yes BPF provides more power but it is controlled by admin and admin can shoot
> > > their foot in multiple ways.
> > 
> > I think this swap device control is a very basic need.
> 
> Please explain that very basic need.
> 
> > All your
> > objections to swapping control in the group can equally apply to
> > zswap.writeback. Unlike zswap.writeback, which only control from the
> > zswap behavior. This is a more generic version control swap device
> > other than zswap as well. BTW, I raised that concern about
> > zswap.writeback was not generic enough as swap control was limited
> > when zswap was proposed. We did hold back zswap.writeback. The
> > consensus was that the interface can be improved in later iterations. So here
> > we are.
> 
> This just motivates me to pushback even harder on adding a new interface
> without a clear use-case.
> 
....

After reading the reply and thinking it over some more, I have a
few questions regarding the BPF-first approach you suggested, if
you don't mind. Some of them I am re-asking because I feel they
have not been clearly addressed yet.

- We are in an embedded environment where enabling additional
  kernel compile options is costly. BPF is disabled by
  default in some of our production configurations. From a
  trade-off perspective, does it make sense to enable BPF
  just for swap device control?

- You suggest starting with BPF and discussing a stable
  interface later. I am genuinely curious: are there actual
  precedents where a BPF prototype graduated into a stable
  kernel interface?

- You raised that stable interfaces are hard to remove. Would
  gating it behind a CONFIG option or marking it experimental
  be an acceptable compromise?

- You already acknowledged the use-case for assigning
  different swap devices to different workloads. Your
  objection is specifically about hierarchical parent-child
  partitioning. If the interface enforced uniform policy
  within a subtree, would that be acceptable?

- We already run a modified kernel with internal swap control
  in production and have real feedback from it. Requiring BPF
  as a prerequisite to gather production experience seems
  unnecessary when we are already doing that.

To be honest, I am having trouble understanding the motivation
behind the BPF-first validation approach. If the real point is
that BPF enables more flexible swap-out policies than any fixed
interface can, that would make much more sense to me. I would
appreciate it if you could share more on this.

Thanks,
Youngjun Park

