From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li <chrisl@kernel.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
kasong@tencent.com, hannes@cmpxchg.org, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, shikemeng@huaweicloud.com,
nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
gunho.lee@lge.com, taejoon.song@lge.com
Subject: Re: [PATCH 3/3] mm/swap: integrate swap tier infrastructure into swap subsystem
Date: Thu, 13 Nov 2025 13:07:08 +0900 [thread overview]
Message-ID: <aRVZbMNgvDbZDnDK@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CACePvbU+NviPmpXQAJUrY4rTqmY_rvYy2JvDBAfT290GmQmfZg@mail.gmail.com>
On Wed, Nov 12, 2025 at 06:44:23AM -0800, Chris Li wrote:
> It seems you should introduce the tiers first. Allow users to define tiers.
> Then the follow up patches use tiers defined here.
>
> The patch order seems reversed to me.
>
> See some feedback below, to be continued.
>
> Chris
Ack. As I already mentioned, I will change it properly :D
> On Sun, Nov 9, 2025 at 4:50 AM Youngjun Park <youngjun.park@lge.com> wrote:
> >
> > Integrate the swap tier infrastructure into the existing swap subsystem
> > to enable selective swap device usage based on tier configuration.
> >
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> > mm/memcontrol.c | 69 ++++++++++++++++++++++++++++++++++++
> > mm/page_io.c | 21 ++++++++++-
> > mm/swap_state.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
> > mm/swapfile.c | 15 ++++++--
> > 4 files changed, 194 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bfc986da3289..33c7cc069754 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -68,6 +68,7 @@
> > #include <net/ip.h>
> > #include "slab.h"
> > #include "memcontrol-v1.h"
> > +#include "swap_tier.h"
> >
> > #include <linux/uaccess.h>
> >
> > @@ -3730,6 +3731,7 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
> > {
> > lru_gen_exit_memcg(memcg);
> > memcg_wb_domain_exit(memcg);
> > + swap_tiers_put_mask(memcg);
> > __mem_cgroup_free(memcg);
> > }
> >
> > @@ -3842,6 +3844,11 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> > page_counter_init(&memcg->kmem, &parent->kmem, false);
> > page_counter_init(&memcg->tcpmem, &parent->tcpmem, false);
> > #endif
> > +#ifdef CONFIG_SWAP_TIER
> > + memcg->tiers_mask = 0;
> > + memcg->tiers_onoff = 0;
> > +#endif
> > +
> > } else {
> > init_memcg_stats();
> > init_memcg_events();
> > @@ -3850,6 +3857,10 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> > #ifdef CONFIG_MEMCG_V1
> > page_counter_init(&memcg->kmem, NULL, false);
> > page_counter_init(&memcg->tcpmem, NULL, false);
> > +#endif
> > +#ifdef CONFIG_SWAP_TIER
> Again, don't need this config.
Ack.
> > + memcg->tiers_mask = DEFAULT_FULL_MASK;
>
> Is this memcg->tiers_mask a cached value after evaluating the
> swap.tiers onoff list by looking up the parent?
It is updated when configured through the memcg interface.
The tiers_onoff field represents which tiers are turned on or off,
and tiers_mask is the mask that includes both on and off bits for
those tiers. This mask is used in swap_tier_collect_mask logic to
avoid recalculating it every time.
> I was thinking of starting with always evaluating the tiers_mask. Then
> we don't need to store it here. How do you indicate the tiers_mask is
> out of date?
swap_tier_collect_mask does not cache it. The effective mask is
calculated at swap I/O time. It only changes through the user
interface.
From your mention of “evaluation,” I understand you are referring to
a dynamically computed mask that depends on the parent’s settings.
However, in my implementation, tiers_mask is simply the mask of the
selected tiers. tiers_onoff indicates on/off state, and tiers_mask
represents the full mask (both on and off bits) needed for
swap_tier_collect_mask calculation. Therefore, tiers_mask can be
derived from tiers_onoff and could potentially be removed.
> > + memcg->tiers_onoff = DEFAULT_ON_MASK;
> > #endif
> > root_mem_cgroup = memcg;
> > return &memcg->css;
> > @@ -5390,6 +5401,56 @@ static int swap_events_show(struct seq_file *m, void *v)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_SWAP_TIER
> > +static int swap_tier_show(struct seq_file *m, void *v)
> > +{
> > + struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> > +
> > + swap_tiers_show_memcg(m, memcg);
> > + return 0;
> > +}
> > +
> > +static ssize_t swap_tier_write(struct kernfs_open_file *of,
> > + char *buf, size_t nbytes, loff_t off)
> > +{
> > + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> > + struct tiers_desc desc[MAX_SWAPTIER] = {};
> > + char *pos = buf, *token;
> > + int nr = 0;
> > + int ret;
> > +
> > + while ((token = strsep(&pos, " \t\n")) != NULL) {
>
> Not allowing plain space " "? Compare pointer != NULL is redundant.
You mean just allow " "?
I will change it to:
while ((token = strsep(" "))) {
> > + if (!*token)
> > + continue;
> > +
> > + if (nr >= MAX_SWAPTIER)
> > + return -E2BIG;
> > +
> > + if (token[0] != '+' && token[0] != '-')
> > + return -EINVAL;
> > +
> > + desc[nr].ops = (token[0] == '+') ? TIER_ON_MASK : TIER_OFF_MASK;
> > +
> > + if (strlen(token) <= 1) {
> > + strscpy(desc[nr].name, DEFAULT_TIER_NAME);
> > + nr++;
> > + continue;
> > + }
> > +
> > + if (strscpy(desc[nr].name, token + 1, MAX_TIERNAME) < 0)
> > + return -EINVAL;
> > +
> > + nr++;
> I don't think you need this nr, you will reach to the end of the
> string any way. What if the user specifies the same tier more than
> once? It is not optimal but the kernel should take it.
>
> OK, I see what is going on now, this whole desc thing can be greatly
> simplified. You shouldn't need to maintain the desc[nr], that desc
> array is the onoff mask in my mind. You just need to keep the tier
> bits in order.
>
> Notice in the memory.swap.tiers. Except for the default tier pattern,
> which always the first one if exists. The rest of the tier + - order
> does not matter. You look up the tier name into the tier mask bit.
> Just set the onoff bits for that tier.
The desc array is currently used as an
intermediate structure before applying the bitmask in swap_tier.c.
but as you mentioned, it might be unnecessary. I will review this
part to see if it can be simplified.
Best Regards,
Youngjun Park
next prev parent reply other threads:[~2025-11-13 4:07 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-09 12:49 [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Youngjun Park
2025-11-09 12:49 ` [PATCH 1/3] mm, swap: change back to use each swap device's percpu cluster Youngjun Park
2025-11-13 6:07 ` Kairui Song
2025-11-13 11:45 ` YoungJun Park
2025-11-14 1:05 ` Baoquan He
2025-11-14 15:52 ` Kairui Song
2025-11-15 9:28 ` YoungJun Park
2025-11-09 12:49 ` [PATCH 2/3] mm: swap: introduce swap tier infrastructure Youngjun Park
2025-11-12 14:20 ` Chris Li
2025-11-13 2:01 ` YoungJun Park
2025-11-09 12:49 ` [PATCH 3/3] mm/swap: integrate swap tier infrastructure into swap subsystem Youngjun Park
2025-11-10 11:40 ` kernel test robot
2025-11-10 12:12 ` kernel test robot
2025-11-10 13:26 ` kernel test robot
2025-11-12 14:44 ` Chris Li
2025-11-13 4:07 ` YoungJun Park [this message]
2025-11-12 13:34 ` [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Chris Li
2025-11-13 1:33 ` YoungJun Park
2025-11-15 1:22 ` SeongJae Park
2025-11-15 9:44 ` YoungJun Park
2025-11-15 16:56 ` SeongJae Park
2025-11-15 15:13 ` Chris Li
2025-11-15 17:24 ` SeongJae Park
2025-11-17 22:17 ` Chris Li
2025-11-18 1:11 ` SeongJae Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRVZbMNgvDbZDnDK@yjaykim-PowerEdge-T330 \
--to=youngjun.park@lge.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=cgroups@vger.kernel.org \
--cc=chrisl@kernel.org \
--cc=gunho.lee@lge.com \
--cc=hannes@cmpxchg.org \
--cc=kasong@tencent.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=nphamcs@gmail.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=shikemeng@huaweicloud.com \
--cc=taejoon.song@lge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox