linux-mm.kvack.org archive mirror
From: YoungJun Park <youngjun.park@lge.com>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Chris Li <chrisl@kernel.org>,
	Kairui Song <kasong@tencent.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	gunho.lee@lge.com, taejoon.song@lge.com, austin.kim@lge.com
Subject: Re: [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control
Date: Sat, 21 Feb 2026 23:30:59 +0900
Message-ID: <aZnBo+P3ifskts9J@yjaykim-PowerEdge-T330>
In-Reply-To: <aZjxP2sTavBRGC1l@linux.dev>

On Fri, Feb 20, 2026 at 07:47:22PM -0800, Shakeel Butt wrote:
> Please don't send a new version of the series before concluding the discussion
> on the previous one.

Understood. Let's continue the discussion. :D

Chris has already provided a thorough response, but I would like to
add my perspective as well.

> Yes it provides the flexibility but that is not the main reason I am pushing for
> it. The reason I want you to first try the BPF approach without introducing any
> stable interfaces. Show how swap tiers will be used and configured in production
> environment and then we can talk if a stable interface is needed.

I understand your concern about committing to a stable interface too
early. As Chris suggested, we could reduce this concern by guarding
the interface behind a build-time config option or marking it as
experimental, which I will also touch on further below.

On that note, if BPF were to become the primary control mechanism,
I am not sure a memcg interface would still be needed at all, since
BPF already provides a high degree of freedom. However, that level
of freedom is also what concerns me -- BPF-driven swap device
assignments could subtly conflict with memcg hierarchy semantics in
ways that are hard to predict or debug. A more constrained memcg-based
approach might actually be safer in that regard.

> I am still not convinced that swap tiers need to be controlled
> hierarchically and the non-root should be able to control it.

I think this concern is closely tied to your question #3 below about
concrete use cases for partitioning devices across sub-workloads.
I hope my answer there helps clarify this.

> Yes BPF provides more power but it is controlled by admin and admin can shoot
> their foot in multiple ways.

As I mentioned above, I think guarding the feature behind a build-time
config or runtime constraints could keep the usage well-defined and
predictable, while still being useful.

> Taking a step back, can you describe your use-case a bit more and share
> requirements?

Our use case is simple for now.
We have two swap devices with different performance
characteristics and want to assign different swap devices to different
workloads (cgroups).

For some background, when I initially proposed this, I suggested allowing
per-cgroup swap device priorities so that it could also accommodate the
broader scenarios you mentioned. However, since even our own use case
does not require reversing swap priorities within a cgroup, we pivoted
to the "swap tier" mechanism that Chris proposed.

> 1. If more than one device is assigned to a workload, do you want to have
>    some kind of ordering between them for the workload or do you want option to
>    have round robin kind of policy?

Both. Devices in the same tier with the same priority are used round
robin. Devices in the same tier with different priorities, or in
different tiers, are used in order. The current tier structure should
be able to satisfy either preference.
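As a rough illustration of that ordering, here is a minimal C sketch. It is a standalone model, not code from this series: struct swap_dev, pick_swap_dev(), the per-cgroup tier bitmask, the per-cgroup cursor and the "lower tier index is preferred" rule are all hypothetical. The best (tier, priority) group that still has free slots wins; within that group the cursor rotates, so equally ranked devices are used round robin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state; not the kernel's struct swap_info_struct. */
struct swap_dev {
	const char *name;
	int tier;		/* assumed: lower index = preferred tier */
	int prio;		/* higher value = preferred, like swapon priorities */
	long free_slots;	/* free swap slots left on the device */
};

/*
 * Pick a device for a cgroup whose allowed tiers are set in tier_mask
 * (bit i = tier i). *cursor is per-cgroup state used to rotate among
 * equally ranked devices. Returns NULL if no allowed device has space.
 */
static struct swap_dev *pick_swap_dev(struct swap_dev *devs, size_t n,
				      unsigned int tier_mask, size_t *cursor)
{
	int best_tier = -1, best_prio = 0;
	size_t i;

	assert(cursor != NULL);

	/* Pass 1: find the best (tier, prio) group that still has space. */
	for (i = 0; i < n; i++) {
		const struct swap_dev *d = &devs[i];

		if (!(tier_mask & (1u << d->tier)) || d->free_slots <= 0)
			continue;
		if (best_tier < 0 || d->tier < best_tier ||
		    (d->tier == best_tier && d->prio > best_prio)) {
			best_tier = d->tier;
			best_prio = d->prio;
		}
	}
	if (best_tier < 0)
		return NULL;

	/* Pass 2: scan from the cursor so equal devices take turns. */
	for (i = 0; i < n; i++) {
		size_t idx = (*cursor + i) % n;
		struct swap_dev *d = &devs[idx];

		if ((tier_mask & (1u << d->tier)) && d->free_slots > 0 &&
		    d->tier == best_tier && d->prio == best_prio) {
			*cursor = (idx + 1) % n;
			d->free_slots--;
			return d;
		}
	}
	return NULL;	/* not reached: pass 1 found a candidate */
}
```

For example, with one fast tier-0 device and two equal-priority tier-1 devices, a cgroup allowed both tiers uses the tier-0 device until it is full and then alternates between the two tier-1 devices, while a cgroup allowed only tier 1 alternates between them from the start.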

> 2. What's the reason to use 'tiers' in the name? Is it similar to memory tiers
>    and you want promotion/demotion among the tiers?

This was originally Chris's idea. I think he explained the rationale
well in his reply.

> 3. If a workload has multiple swap devices assigned, can you describe the
>    scenario where such workloads need to partition/divide given devices to their
>    sub-workloads?

One possible scenario is reducing lock contention by partitioning swap
devices between parent and child cgroups.
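To make the parent/child partitioning concrete, here is a minimal C sketch under an assumed rule: a cgroup's effective tiers are its own setting intersected with every ancestor's, so a child can narrow what its parent allows but never widen it. struct cg, effective_tiers() and is_partitioned() are hypothetical names for illustration, not the interface in this series. Under that rule, two children given disjoint subsets of the parent's tiers never share a swap device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a cgroup's swap tier setting. */
struct cg {
	struct cg *parent;	/* NULL for the root cgroup */
	unsigned int tiers;	/* requested tiers, bit i = tier i */
};

/*
 * Effective tiers: what the cgroup requested, limited by every ancestor.
 * A child can narrow its parent's set but never widen it.
 */
static unsigned int effective_tiers(const struct cg *c)
{
	unsigned int mask = ~0u;

	for (; c != NULL; c = c->parent)
		mask &= c->tiers;
	return mask;
}

/* Two cgroups are partitioned if their effective tier sets are disjoint. */
static int is_partitioned(const struct cg *a, const struct cg *b)
{
	assert(a != NULL && b != NULL);
	return (effective_tiers(a) & effective_tiers(b)) == 0;
}
```

Intersecting with ancestors is one way to keep per-cgroup tier selection consistent with memcg hierarchy semantics; a child that requests a tier its parent does not allow simply does not get it.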

> Let's start with these questions. Please note that I want us to not just look at
> the current use-case but brainstorm more future use-cases and then come up with
> the solution which is more future proof.

We have clear production use cases from both our side and Chris's, and
I also presented a deployment example in the cover letter.

I think it is hard to design concretely for future use cases at this
point. When those needs become clearer, BPF, with its flexibility,
would be a better fit. I see BPF as a natural extension path
rather than a starting point.

For now, guarding the memcg and swap tier interfaces behind a CONFIG
option would let us move forward without committing to a stable
interface, and we can always pivot to BPF later if needed.

Thanks,
YoungJun Park


Thread overview: 24+ messages
2026-01-26  6:52 Youngjun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 1/5] mm: swap: introduce swap tier infrastructure Youngjun Park
2026-02-12  9:07   ` Chris Li
2026-02-13  2:18     ` YoungJun Park
2026-02-13 14:33     ` YoungJun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 2/5] mm: swap: associate swap devices with tiers Youngjun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 3/5] mm: memcontrol: add interface for swap tier selection Youngjun Park
2026-01-26  6:52 ` [RFC PATCH v2 v2 4/5] mm, swap: change back to use each swap device's percpu cluster Youngjun Park
2026-02-12  7:37   ` Chris Li
2026-01-26  6:52 ` [RFC PATCH v2 v2 5/5] mm, swap: introduce percpu swap device cache to avoid fragmentation Youngjun Park
2026-02-12  6:12 ` [RFC PATCH v2 0/5] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Chris Li
2026-02-12  9:22   ` Chris Li
2026-02-13  2:26     ` YoungJun Park
2026-02-13  1:59   ` YoungJun Park
2026-02-12 17:57 ` Nhat Pham
2026-02-12 17:58   ` Nhat Pham
2026-02-13  2:43   ` YoungJun Park
2026-02-12 18:33 ` Shakeel Butt
2026-02-13  3:58   ` YoungJun Park
2026-02-21  3:47     ` Shakeel Butt
2026-02-21  6:07       ` Chris Li
2026-02-21 17:44         ` Shakeel Butt
2026-02-22  1:16           ` YoungJun Park
2026-02-21 14:30       ` YoungJun Park [this message]
