linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li <chrisl@kernel.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	kasong@tencent.com, hannes@cmpxchg.org, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
	gunho.lee@lge.com, taejoon.song@lge.com
Subject: Re: [PATCH 2/3] mm: swap: introduce swap tier infrastructure
Date: Thu, 13 Nov 2025 11:01:34 +0900	[thread overview]
Message-ID: <aRU7/ibyrPTn93qV@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CACePvbUvDfZgDx-sXkZ+Oa7DwtKg6b1t=owQ1sRZ_FpFrGxA3g@mail.gmail.com>

On Wed, Nov 12, 2025 at 06:20:22AM -0800, Chris Li wrote:
> First of all, for RFC series, please include RFC in each patch subject as well.
> 
> For the real patch submission, please consider split it into smaller
> chunks and have incremental milestones.
> Only introduce the function needed for each milestone, not define them
> all together then use it in later patches.
> 
> See some feedback as follows.
> 
> This patch is too big, to be continued.
> 
> Chris

Sure, I will take care of it. will make better on real patch submissions.

> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 966f7c1a0128..1224029620ed 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -283,6 +283,10 @@ struct mem_cgroup {
> >         /* per-memcg mm_struct list */
> >         struct lru_gen_mm_list mm_list;
> >  #endif
> > +#ifdef CONFIG_SWAP_TIER
> 
> I think we don't need this CONFIG_SWAP_TIER. Making it part of the
> default swap is fine.
> By default the memory.swap.tiers is empty and matches the previous
> swap behavior. The user doesn't need to do anything if they are not
> using swap tiers. I see no reason to have a separate config option.

Okay I will change it to default kernel option.

> > +       int tiers_onoff;
> > +       int tiers_mask;
> > +#endif
> >
> >  #ifdef CONFIG_MEMCG_V1
> >         /* Legacy consumer-oriented counters */
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 90fa27bb7796..8911eff9d37a 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -271,6 +271,9 @@ struct swap_info_struct {
> >         struct percpu_ref users;        /* indicate and keep swap device valid. */
> >         unsigned long   flags;          /* SWP_USED etc: see above */
> >         signed short    prio;           /* swap priority of this type */
> > +#ifdef CONFIG_SWAP_TIER
> > +       int tier_idx;
> > +#endif
> >         struct plist_node list;         /* entry in swap_active_head */
> >         signed char     type;           /* strange name for an index */
> >         unsigned int    max;            /* extent of the swap_map */
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index e1fb11f36ede..78173ffe65d6 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -163,6 +163,19 @@ endmenu
> >
> >  endif
> >
> > +config SWAP_TIER
> Same, I think we can remove the SWAP_TIER, just turn it on when swap is enabled.

Ack

> > diff --git a/mm/swap.h b/mm/swap.h
> > index d034c13d8dd2..b116282690c8 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -16,6 +16,10 @@ extern int page_cluster;
> >  #define swap_entry_order(order)        0
> >  #endif
> >
> > +#define DEF_SWAP_PRIO  -1
> > +
> > +extern spinlock_t swap_lock;
> > +extern struct plist_head swap_active_head;
> >  extern struct swap_info_struct *swap_info[];
> >
> >  /*
> > diff --git a/mm/swap_tier.c b/mm/swap_tier.c
> > new file mode 100644
> > index 000000000000..4301e3c766b9
> > --- /dev/null
> > +++ b/mm/swap_tier.c
> > @@ -0,0 +1,602 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include <linux/swap.h>
> > +#include <linux/rcupdate.h>
> > +#include <linux/memcontrol.h>
> > +#include <linux/plist.h>
> > +#include <linux/sysfs.h>
> > +#include <linux/sort.h>
> > +
> > +#include "swap_tier.h"
> > +
> > +/*
> > + * struct swap_tier - Structure representing a swap tier.
> > + *
> > + * @name:              Name of the swap_tier.
> > + * @prio_st:           Starting value of priority.
> > + * @prio_end:          Ending value of priority.
> > + * @list:              Linked list of active tiers.
> > + * @inactive_list:     Linked list of inactive tiers.
> > + * @kref:              Reference count (held by swap device and memory cgroup).
> > + */
> > +static struct swap_tier {
> > +       char name[MAX_TIERNAME];
> > +       short prio_st;
> > +       short prio_end;
> 
> I see a lot of complexity of the code come from this priority range.
> Having both start and end.
> We should be able to just keep just one of the start and end,  e.g.
> high end of the priority, then keep all tier in a sorted list or
> array. Then use the next tier's priority to indicate the other end of
> the current tier.

After checking my code, I decide to remove st or end.

> > +       union {
> > +               struct plist_node list;
> > +               struct list_head inactive_list;
> > +       };
> 
> Is this the list of swapfiles?
> Why union, how does it indicate which field of the union is valid?

It is swap_tier itself. The 'list' maintains active tiers, and
'inactive_list' maintains inactive tiers. One tier exists on either
'list' or 'inactive_list'. The code ensures that a tier must be on one
of them.

> > +
> > +/* swap_tiers initialization */
> > +void swap_tiers_init(void)
> > +{
> > +       struct swap_tier *tier;
> > +       int idx;
> > +
> > +       BUILD_BUG_ON(BITS_PER_TYPE(int) < MAX_SWAPTIER * 2);
> > +
> > +       for_each_tier(tier, idx) {
> > +               if (idx < SWAP_TIER_RESERVED_CNT) {
> > +                       /* for display fisrt */
> > +                       plist_node_init(&tier->list, -SHRT_MAX);
> > +                       plist_add(&tier->list, &swap_tier_active_list);
> > +                       kref_init(&tier->ref);
> > +               } else {
> > +                       INIT_LIST_HEAD(&tier->inactive_list);
> > +                       list_add_tail(&tier->inactive_list, &swap_tier_inactive_list);
> > +               }
> > +       }
> > +
> > +       strscpy(swap_tiers[SWAP_TIER_DEFAULT].name, DEFAULT_TIER_NAME);
> 
> The default tier is not a real tier. It shouldn't show up in the
> swap_tiers array.
> The default tier is only a wide cast for memory.swap.tiers to select
> tiers to turn on/off swap tiers. It is a wide cast pattern, not an
> actual tier.

Yeah, as I commented in the previous mail, I will change it to a
logical concept.

> > +void swap_tiers_show_memcg(struct seq_file *m, struct mem_cgroup *memcg)
> > +{
> > +       spin_lock(&swap_tier_lock);
> > +       if (memcg->tiers_onoff)
> > +               swap_tier_show_mask(m, memcg->tiers_onoff);
> > +       else
> > +               seq_puts(m, "\n");
> > +       swap_tier_show_mask(m, swap_tier_collect_mask(memcg));
> > +       spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +void swap_tiers_assign(struct swap_info_struct *swp)
> > +{
> > +       struct swap_tier *tier;
> > +
> > +       spin_lock(&swap_tier_lock);
> > +       swp->tier_idx = NULL_TIER;
> > +
> > +       for_each_active_tier(tier) {
> > +               if (swap_tier_is_default(tier))
> > +                       continue;
> > +               if (swap_tier_prio_in_range(tier, swp->prio)) {
> > +                       swp->tier_idx = TIER_IDX(tier);
> > +                       swap_tier_get(tier);
> > +                       break;
> > +               }
> > +       }
> > +       spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +void swap_tiers_release(struct swap_info_struct *swp)
> > +{
> > +       spin_lock(&swap_tier_lock);
> > +       if (swp->tier_idx != NULL_TIER)
> > +               swap_tier_put(&swap_tiers[swp->tier_idx]);
> > +       spin_unlock(&swap_tier_lock);
> > +}
> > +
> > +/* not incremental, but reset. */
> > +int swap_tiers_get_mask(struct tiers_desc *desc, int nr, struct mem_cgroup *memcg)
> 
> Who is using this function? I can't find the user of this function in
> this patch.
> Please introduce this function together with the patch that uses it.
> Don't introduce a function without a user.

swapoff calls swap_tiers_release. I must improve the readability of my
RFC patch series.

> 
> > +{
> > +       struct swap_tier *tier;
> > +       int ret = 0;
> > +       int tiers_mask = 0;
> > +       int tiers_onoff = 0;
> > +       int cnt = 0;
> > +
> > +       for (int i = 0; i < nr; i++) {
> > +               for (int j = i+1; j < nr; j++) {
> > +                       if (!strcmp(desc[i].name, desc[j].name))
> 
> These two nested for loops look suspicious. Again because I don't see

For assuring, unique tier name input.
I will think of making simple.

> the caller, I can't reason what it is doing here.

When a swap tier is designated by the memcg sys interface, it is
called. I will introduce this logic together with the caller
implementation.

Best Regards,
Youngjun Park


  reply	other threads:[~2025-11-13  2:01 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-09 12:49 [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Youngjun Park
2025-11-09 12:49 ` [PATCH 1/3] mm, swap: change back to use each swap device's percpu cluster Youngjun Park
2025-11-13  6:07   ` Kairui Song
2025-11-13 11:45     ` YoungJun Park
2025-11-14  1:05       ` Baoquan He
2025-11-14 15:52         ` Kairui Song
2025-11-15  9:28           ` YoungJun Park
2025-11-09 12:49 ` [PATCH 2/3] mm: swap: introduce swap tier infrastructure Youngjun Park
2025-11-12 14:20   ` Chris Li
2025-11-13  2:01     ` YoungJun Park [this message]
2025-11-09 12:49 ` [PATCH 3/3] mm/swap: integrate swap tier infrastructure into swap subsystem Youngjun Park
2025-11-10 11:40   ` kernel test robot
2025-11-10 12:12   ` kernel test robot
2025-11-10 13:26   ` kernel test robot
2025-11-12 14:44   ` Chris Li
2025-11-13  4:07     ` YoungJun Park
2025-11-12 13:34 ` [RFC] mm/swap, memcg: Introduce swap tiers for cgroup based swap control Chris Li
2025-11-13  1:33   ` YoungJun Park
2025-11-15  1:22 ` SeongJae Park
2025-11-15  9:44   ` YoungJun Park
2025-11-15 16:56     ` SeongJae Park
2025-11-15 15:13   ` Chris Li
2025-11-15 17:24     ` SeongJae Park
2025-11-17 22:17       ` Chris Li
2025-11-18  1:11         ` SeongJae Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aRU7/ibyrPTn93qV@yjaykim-PowerEdge-T330 \
    --to=youngjun.park@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chrisl@kernel.org \
    --cc=gunho.lee@lge.com \
    --cc=hannes@cmpxchg.org \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=shikemeng@huaweicloud.com \
    --cc=taejoon.song@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox