linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li <chrisl@kernel.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	kasong@tencent.com, hannes@cmpxchg.org, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
	gunho.lee@lge.com, taejoon.song@lge.com
Subject: Re: [PATCH 3/3] mm/swap: integrate swap tier infrastructure into swap subsystem
Date: Thu, 13 Nov 2025 13:07:08 +0900	[thread overview]
Message-ID: <aRVZbMNgvDbZDnDK@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CACePvbU+NviPmpXQAJUrY4rTqmY_rvYy2JvDBAfT290GmQmfZg@mail.gmail.com>

On Wed, Nov 12, 2025 at 06:44:23AM -0800, Chris Li wrote:
> It seems you should introduce the tiers first. Allow users to define tiers.
> Then the follow up patches use tiers defined here.
> 
> The patch order seems reversed to me.
> 
> See some feedback below, to be continued.
> 
> Chris

Ack. As I already mentioned, I will change it properly :D

> On Sun, Nov 9, 2025 at 4:50 AM Youngjun Park &lt;youngjun.park@lge.com&gt; wrote:
> >
> > Integrate the swap tier infrastructure into the existing swap subsystem
> > to enable selective swap device usage based on tier configuration.
> >
> > Signed-off-by: Youngjun Park &lt;youngjun.park@lge.com&gt;
> > ---
> >  mm/memcontrol.c | 69 ++++++++++++++++++++++++++++++++++++
> >  mm/page_io.c    | 21 ++++++++++-
> >  mm/swap_state.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/swapfile.c   | 15 ++++++--
> >  4 files changed, 194 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index bfc986da3289..33c7cc069754 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -68,6 +68,7 @@
> >  #include <net/ip.h>
> >  #include "slab.h"
> >  #include "memcontrol-v1.h"
> > +#include "swap_tier.h"
> >
> >  #include <linux/uaccess.h>
> >
> > @@ -3730,6 +3731,7 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
> >  {
> >         lru_gen_exit_memcg(memcg);
> >         memcg_wb_domain_exit(memcg);
> > +       swap_tiers_put_mask(memcg);
> >         __mem_cgroup_free(memcg);
> >  }
> >
> > @@ -3842,6 +3844,11 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >                 page_counter_init(&memcg->kmem, &parent->kmem, false);
> >                 page_counter_init(&memcg->tcpmem, &parent->tcpmem, false);
> >  #endif
> > +#ifdef CONFIG_SWAP_TIER
> > +               memcg->tiers_mask = 0;
> > +               memcg->tiers_onoff = 0;
> > +#endif
> > +
> >         } else {
> >                 init_memcg_stats();
> >                 init_memcg_events();
> > @@ -3850,6 +3857,10 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >  #ifdef CONFIG_MEMCG_V1
> >                 page_counter_init(&memcg->kmem, NULL, false);
> >                 page_counter_init(&memcg->tcpmem, NULL, false);
> > +#endif
> > +#ifdef CONFIG_SWAP_TIER
> Again, don't need this config.

Ack.

> > +               memcg->tiers_mask = DEFAULT_FULL_MASK;
>
> Is this memcg->tiers_mask a cached value after evaluating the
> swap.tiers onoff list by looking up the parent?

It is updated when configured through the memcg interface.
The tiers_onoff field represents which tiers are turned on or off,
and tiers_mask is the mask that includes both on and off bits for
those tiers. This mask is used in swap_tier_collect_mask logic to
avoid recalculating it every time.

> I was thinking of starting with always evaluating the tiers_mask. Then
> we don't need to store it here. How do you indicate the tiers_mask is
> out of date?

swap_tier_collect_mask does not cache it. The effective mask is
calculated at swap I/O time. It only changes through the user
interface.

From your mention of “evaluation,” I understand you are referring to
a dynamically computed mask that depends on the parent’s settings.
However, in my implementation, tiers_mask is simply the mask of the
selected tiers. tiers_onoff indicates on/off state, and tiers_mask
represents the full mask (both on and off bits) needed for
swap_tier_collect_mask calculation. Therefore, tiers_mask can be
derived from tiers_onoff and could potentially be removed.

> > +               memcg->tiers_onoff = DEFAULT_ON_MASK;
> >  #endif
> >                 root_mem_cgroup = memcg;
> >                 return &memcg->css;
> > @@ -5390,6 +5401,56 @@ static int swap_events_show(struct seq_file *m, void *v)
> >         return 0;
> >  }
> >
> > +#ifdef CONFIG_SWAP_TIER
> > +static int swap_tier_show(struct seq_file *m, void *v)
> > +{
> > +       struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> > +
> > +       swap_tiers_show_memcg(m, memcg);
> > +       return 0;
> > +}
> > +
> > +static ssize_t swap_tier_write(struct kernfs_open_file *of,
> > +                               char *buf, size_t nbytes, loff_t off)
> > +{
> > +       struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> > +       struct tiers_desc desc[MAX_SWAPTIER] = {};
> > +       char *pos = buf, *token;
> > +       int nr = 0;
> > +       int ret;
> > +
> > +       while ((token = strsep(&pos, " \t\n")) != NULL) {
>
> Not allowing plain space " "? Compare pointer != NULL is redundant.

You mean just allow " "?

I will change it to:
while ((token = strsep(" "))) {

> > +               if (!*token)
> > +                       continue;
> > +
> > +               if (nr >= MAX_SWAPTIER)
> > +                       return -E2BIG;
> > +
> > +               if (token[0] != '+' && token[0] != '-')
> > +                       return -EINVAL;
> > +
> > +               desc[nr].ops = (token[0] == '+') ? TIER_ON_MASK : TIER_OFF_MASK;
> > +
> > +               if (strlen(token) <= 1) {
> > +                       strscpy(desc[nr].name, DEFAULT_TIER_NAME);
> > +                       nr++;
> > +                       continue;
> > +               }
> > +
> > +               if (strscpy(desc[nr].name, token + 1, MAX_TIERNAME) < 0)
> > +                       return -EINVAL;
> > +
> > +               nr++;
> I don't think you need this nr, you will reach to the end of the
> string any way. What if the user specifies the same tier more than
> once? It is not optimal but the kernel should take it.
>
> OK, I see what is going on now, this whole desc thing can be greatly
> simplified. You shouldn't need to maintain the desc[nr], that desc
> array is the onoff mask in my mind. You just need to keep the tier
> bits in order.
>
> Notice in the memory.swap.tiers. Except for the default tier pattern,
> which always the first one if exists. The rest of the tier + - order
> does not matter. You look up the tier name into the tier mask bit.
> Just set the onoff bits for that tier.

The desc array is currently used as an
intermediate structure before applying the bitmask in swap_tier.c.
but as you mentioned, it might be unnecessary. I will review this
part to see if it can be simplified.

Best Regards,
Youngjun Park


  reply	other threads:[~2025-11-13  4:07 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-09 12:49 [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Youngjun Park
2025-11-09 12:49 ` [PATCH 1/3] mm, swap: change back to use each swap device's percpu cluster Youngjun Park
2025-11-13  6:07   ` Kairui Song
2025-11-13 11:45     ` YoungJun Park
2025-11-14  1:05       ` Baoquan He
2025-11-14 15:52         ` Kairui Song
2025-11-15  9:28           ` YoungJun Park
2025-11-09 12:49 ` [PATCH 2/3] mm: swap: introduce swap tier infrastructure Youngjun Park
2025-11-12 14:20   ` Chris Li
2025-11-13  2:01     ` YoungJun Park
2025-11-09 12:49 ` [PATCH 3/3] mm/swap: integrate swap tier infrastructure into swap subsystem Youngjun Park
2025-11-10 11:40   ` kernel test robot
2025-11-10 12:12   ` kernel test robot
2025-11-10 13:26   ` kernel test robot
2025-11-12 14:44   ` Chris Li
2025-11-13  4:07     ` YoungJun Park [this message]
2025-11-12 13:34 ` [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Chris Li
2025-11-13  1:33   ` YoungJun Park
2025-11-15  1:22 ` SeongJae Park
2025-11-15  9:44   ` YoungJun Park
2025-11-15 16:56     ` SeongJae Park
2025-11-15 15:13   ` Chris Li
2025-11-15 17:24     ` SeongJae Park
2025-11-17 22:17       ` Chris Li
2025-11-18  1:11         ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aRVZbMNgvDbZDnDK@yjaykim-PowerEdge-T330 \
    --to=youngjun.park@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chrisl@kernel.org \
    --cc=gunho.lee@lge.com \
    --cc=hannes@cmpxchg.org \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=taejoon.song@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox