From: Kairui Song <ryncsn@gmail.com>
To: YoungJun Park <youngjun.park@lge.com>
Cc: Nhat Pham <nphamcs@gmail.com>,
    linux-mm@kvack.org, akpm@linux-foundation.org,
    hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
    shakeel.butt@linux.dev, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org, shikemeng@huaweicloud.com,
    bhe@redhat.com, baohua@kernel.org, chrisl@kernel.org,
    muchun.song@linux.dev, iamjoonsoo.kim@lge.com,
    taejoon.song@lge.com, gunho.lee@lge.com
Subject: Re: [RFC PATCH 2/2] mm: swap: apply per cgroup swap priority mechansim on swap layer
Date: Fri, 13 Jun 2025 15:36:49 +0800
Message-ID: <CAMgjq7BzQ8bKKXuHB=TiQnkdSdCuABXrRf8Z8w2QkjpD44jdgA@mail.gmail.com>
In-Reply-To: <aEvPBSObBrrQCsa3@yjaykim-PowerEdge-T330>

On Fri, Jun 13, 2025 at 3:11 PM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Thu, Jun 12, 2025 at 01:08:08PM -0700, Nhat Pham wrote:
> > On Thu, Jun 12, 2025 at 11:20 AM Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > On Fri, Jun 13, 2025 at 1:28 AM Nhat Pham <nphamcs@gmail.com> wrote:
> > > >
> > > > On Thu, Jun 12, 2025 at 4:14 AM Kairui Song <ryncsn@gmail.com> wrote:
> > > > >
> > > > > On Thu, Jun 12, 2025 at 6:43 PM <youngjun.park@lge.com> wrote:
> > > > > >
> > > > > > From: "youngjun.park" <youngjun.park@lge.com>
> > > > > >
> > > > >
> > > > > Hi, Youngjun,
> > > > >
> > > > > Thanks for sharing this series.
> > > > >
> > > > > > This patch implements swap device selection and swap on/off propagation
> > > > > > when a cgroup-specific swap priority is set.
> > > > > >
> > > > > > There is one workaround in this implementation, as follows:
> > > > > > the current per-cpu swap cluster enforces swap device selection based
> > > > > > solely on CPU locality, overriding the swap cgroup's configured priorities.
> > > > >
> > > > > I've been thinking about this: we can switch to a per-cgroup-per-cpu
> > > > > next cluster selector. The problem with the current code is that swap
> > > >
> > > > What about per-cpu-per-order-per-swap-device :-? Number of swap
> > > > devices is gonna be smaller than number of cgroups, right?
> > >
> > > Hi Nhat,
> > >
> > > The problem is that per-cgroup makes more sense (cgroup-level locality
> > > was suggested to me at the very beginning of the allocator's
> > > implementation on the mailing list, but it was hard to do at that
> > > time). In container environments, a cgroup is a container that runs
> > > one type of workload, so it has its own locality. Things like systemd
> > > also organize different desktop workloads into cgroups. The whole
> > > point is the cgroup.
> >
> > Yeah, I know what a cgroup represents, which is why I mentioned in the
> > next paragraph that we are still making decisions per-cgroup - we just
> > organize the per-CPU cache by swap device. This way, two cgroups with a
> > similar/same priority list can share the clusters, for each swapfile, on
> > each CPU. There will be a lot less duplication and overhead. And two
> > cgroups with different priority lists won't interfere with each other,
> > since they'll target different swapfiles.
> >
> > Unless we want to nudge the swapfiles/clusters to be self-partitioned
> > among the cgroups? :) IOW, each cluster contains pages mostly from a
> > single cgroup (with some stragglers mixed in). I suppose that will be
> > very useful for swap on rotational drives where read contiguity is
> > imperative, but not sure about other backends :-?
> > Anyway, no strong opinions to be completely honest :) Was just
> > throwing out some ideas. Per-cgroup-per-cpu-per-order sounds good to
> > me too, if it's easy to do.
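
(Just to check I'm reading this the same way: the per-CPU cache would be
keyed only by swap device and order, roughly like the tiny sketch below,
with made-up names rather than the real allocator structures, so cgroups
whose priority lists agree on a device simply share that device's cache.)

#define SWAP_NR_ORDERS 10   /* stand-in for the real number of orders */

/* One instance per (CPU, swap device); nothing is allocated per cgroup. */
struct percpu_cluster_cache {
        /* next cluster to try for this one device, one slot per order */
        unsigned int next_cluster[SWAP_NR_ORDERS];
};
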
>
> Good point!
> I agree with the points about self-partitioned clusters and duplicated
> priority lists. One concern is the cost of synchronization, specifically
> the synchronization incurred when accessing the prioritized swap device.
> From a simple performance perspective, a per-cgroup-per-CPU implementation
> seems favorable, in line with the current swap allocation fast path.
>
> It seems most reasonable to carefully compare the pros and cons of the
> two approaches.
>
> To summarize:
>
> Option 1. per-cgroup-per-cpu
> Pros: fits the current upstream approach; performance.
> Cons: duplicated priority structures (some memory consumption cost);
> self-partitioned clusters.
>
> Option 2. per-cpu-per-order (per-device)
> Pros: the cons of Option 1 are avoided.
> Cons: the pros of Option 1 are lost.
>
> It's not easy to draw a definitive conclusion right away, and I should
> also evaluate other pros and cons that may arise during the actual
> implementation, so I'd like to take some time to review things in more
> detail and share my thoughts and conclusions in the next patch series.
>
> What do you think, Nhat and Kairui?

Ah, I think what might fit best here is: each cgroup has a pcp device
list, and each device has a pcp cluster list:

folio -> mem_cgroup -> swap_priority (maybe a more generic name is
better?) -> swap_device_pcp (recording only the *si per order)
swap_device_info -> swap_cluster_pcp (cluster offset per order)

And if mem_cgroup -> swap_priority is NULL, fall back to a global
swap_device_pcp.
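
Roughly something like the sketch below, just to show the shape of it
(all struct and field names here are made up and simplified, they are not
the actual kernel structures, and the per-CPU data is modeled as plain
arrays):

#define NR_CPUS        8    /* stand-in for the real CPU count */
#define SWAP_NR_ORDERS 10   /* stand-in for the number of allocation orders */

struct swap_info;           /* stands in for the per-device info struct */

/* Per-cgroup, per-CPU: which device this CPU should keep using, per order. */
struct swap_device_pcp {
        struct swap_info *si[SWAP_NR_ORDERS];
};

/* Per-device, per-CPU: which cluster offset to continue from, per order. */
struct swap_cluster_pcp {
        unsigned int offset[SWAP_NR_ORDERS];
};

/* Hangs off mem_cgroup only when a per-cgroup priority list is configured. */
struct swap_priority {
        /* ... the per-cgroup priority list itself ... */
        struct swap_device_pcp dev_pcp[NR_CPUS];
};

struct swap_device_info {
        /* ... existing per-device state ... */
        struct swap_cluster_pcp cluster_pcp[NR_CPUS];
};

struct mem_cgroup_sketch {
        struct swap_priority *swap_priority;  /* NULL: use the global pcp */
};

/* Global fallback used when the folio's memcg has no swap_priority set. */
static struct swap_device_pcp global_dev_pcp[NR_CPUS];

static struct swap_info *pick_si(struct mem_cgroup_sketch *memcg,
                                 int cpu, int order)
{
        struct swap_device_pcp *pcp = memcg->swap_priority ?
                &memcg->swap_priority->dev_pcp[cpu] : &global_dev_pcp[cpu];

        /* May be NULL; the caller then walks the priority list instead. */
        return pcp->si[order];
}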

This seems to fit what Nhat suggested, and it should be easy to implement,
since both si and folio->memcg are easily accessible.

