linux-mm.kvack.org archive mirror
From: Kairui Song <ryncsn@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org>,
	linux-mm@kvack.org,  Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Zi Yan <ziy@nvidia.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	Barry Song <baohua@kernel.org>,  Hugh Dickins <hughd@google.com>,
	Chris Li <chrisl@kernel.org>,
	 Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>,  Baoquan He <bhe@redhat.com>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	 Youngjun Park <youngjun.park@lge.com>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Muchun Song <muchun.song@linux.dev>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH RFC 08/15] mm, swap: store and check memcg info in the swap table
Date: Tue, 24 Feb 2026 16:34:00 +0800
Message-ID: <CAMgjq7Aq5ckraKtNtet8+1ANuqnitFsXxefbDJQZpBxNmaW7Cg@mail.gmail.com>
In-Reply-To: <aZyCJ6pH4hey-ZoU@cmpxchg.org>

On Tue, Feb 24, 2026 at 12:46 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Feb 20, 2026 at 07:42:09AM +0800, Kairui Song via B4 Relay wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > To prepare for merging the swap_cgroup_ctrl into the swap table, store
> > the memcg info in the swap table on swapout.
> >
> > This is done by using the existing shadow format.
> >
> > Note this also changes the refault counting at the nearest online memcg
> > level:
> >
> > Unlike file folios, anon folios are mostly exclusive to one mem cgroup,
> > and each cgroup is likely to have different characteristics.
>
> This is not correct.
>
> As much as I like the idea of storing the swap_cgroup association
> inside the shadow entry, the refault evaluation needs to happen at the
> level that drove eviction.
>
> Consider a workload that is split into cgroups purely for accounting,
> not for setting different limits:
>
> workload (limit domain)
> `- component A
> `- component B
>
> This means the two components must compete freely, and it must behave
> as if there is only one LRU. When pages get reclaimed in a round-robin
> fashion, both A and B get aged at the same pace. Likewise, when pages
> in A refault, they must challenge the *combined* workingset of both A
> and B, not just the local pages.
>
> Otherwise, you risk retaining stale workingset in one subgroup while
> the other one is thrashing. This breaks userspace expectations.
>

Hi Johannes, thanks for pointing this out.

I'm just not sure how much of a real problem this is. The refault
challenge change was made in commit b910718a948a, which predates the
introduction of anon shadows. Shadows can also get reclaimed,
especially under pressure (and we could be doing that again by
reclaiming full_clusters with swap tables). MGLRU simply ignores the
target_memcg here, yet it performs surprisingly well in multi-memcg
setups. I also found a comment in workingset.c saying the kernel used
to activate all pages, which is also fine. That commit also mentioned
active list shrinking, but the anon active list gets shrunk just fine
without refault feedback in shrink_lruvec under can_age_anon_pages.

So in this RFC I was just a bit aggressive and changed it. I can run
some tests with different memory size setups.
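
To make the two evaluation levels concrete, here is a rough userspace
model, not kernel code. The names struct level and refault_is_recent
are made up for illustration, and so are the numbers in the note below.
It keeps only the core comparison: a refault counts as recent when its
refault distance is within the workingset size at the level where it is
evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of one level in the cgroup tree.
 * Both fields are hierarchical: they cover this group and everything
 * below it.
 */
struct level {
	unsigned long workingset;  /* resident pages a refault competes against */
	unsigned long age;         /* aging events (evictions + activations) so far */
};

/*
 * A refault is "recent" if no more than `workingset` aging events
 * happened at this level between the eviction and the refault.
 * `eviction_age` is the value of level->age recorded at eviction time.
 */
static bool refault_is_recent(const struct level *level,
			      unsigned long eviction_age)
{
	assert(level != NULL);

	/* Unsigned subtraction stays correct if the counter wraps. */
	unsigned long distance = level->age - eviction_age;

	return distance <= level->workingset;
}
```

Say component A has a workingset of 100 pages and saw 150 aging events
since the eviction: evaluated locally, the refault is not recent. At the
parent "workload" level, with a combined workingset of 1000 pages and
600 aging events, the same refault is recent and gets to challenge the
pages of both A and B. That gap is the behavior difference in question.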

If we are not OK with it, then we can just use a ci->memcg_table and
we are fine: everything is still dynamic, but per-slot usage would be
a bit higher, going from 8 bytes to 10 bytes. Later we could maybe
find a way to make ci->memcg_table NULL and shrink back to 8 bytes
with, e.g., MGLRU, and balance the memcgs with things like aging
feedback (the latter part is just an idea, but it seems doable?).
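
For the 8 vs. 10 bytes, a quick sketch of the arithmetic. This assumes
an 8-byte swap table entry and a 2-byte memcg id per slot in
ci->memcg_table; the element type of that table and the helper name
cluster_table_bytes are assumptions for illustration, not code from the
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes: an 8-byte swap table entry and a 2-byte memcg id per slot. */
#define SWAP_TABLE_ENTRY_BYTES sizeof(uint64_t)
#define MEMCG_ID_BYTES         sizeof(uint16_t)

/*
 * Per-cluster table footprint in bytes. `has_memcg_table` models
 * ci->memcg_table being allocated (non-zero) or left NULL (zero).
 */
static size_t cluster_table_bytes(size_t nr_slots, int has_memcg_table)
{
	size_t bytes;

	assert(nr_slots > 0);

	bytes = nr_slots * SWAP_TABLE_ENTRY_BYTES;
	if (has_memcg_table)
		bytes += nr_slots * MEMCG_ID_BYTES;

	return bytes;
}
```

For a 512-slot cluster (an assumed cluster size) that is 4096 bytes
without the side table and 5120 bytes with it, i.e. 8 vs. 10 bytes per
slot.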


Thread overview: 28+ messages
2026-02-19 23:42 [PATCH RFC 00/15] mm, swap: swap table phase IV with dynamic ghost swapfile Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 01/15] mm: move thp_limit_gfp_mask to header Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 02/15] mm, swap: simplify swap_cache_alloc_folio Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 03/15] mm, swap: move conflict checking logic of out swap cache adding Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 04/15] mm, swap: add support for large order folios in swap cache directly Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 05/15] mm, swap: unify large folio allocation Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 06/15] memcg, swap: reparent the swap entry on swapin if swapout cgroup is dead Kairui Song via B4 Relay
2026-02-23 16:22   ` Johannes Weiner
2026-02-24  5:44   ` Shakeel Butt
2026-02-24  8:08     ` Kairui Song
2026-02-19 23:42 ` [PATCH RFC 07/15] memcg, swap: defer the recording of memcg info and reparent flexibly Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 08/15] mm, swap: store and check memcg info in the swap table Kairui Song via B4 Relay
2026-02-23 16:36   ` Johannes Weiner
2026-02-24  8:34     ` Kairui Song [this message]
2026-02-19 23:42 ` [PATCH RFC 09/15] mm, swap: support flexible batch freeing of slots in different memcg Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 10/15] mm, swap: always retrieve memcg id from swap table Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 11/15] mm/swap, memcg: remove swap cgroup array Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 12/15] mm, swap: merge zeromap into swap table Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 13/15] mm: ghost swapfile support for zswap Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 14/15] mm, swap: add a special device for ghost swap setup Kairui Song via B4 Relay
2026-02-19 23:42 ` [PATCH RFC 15/15] mm, swap: allocate cluster dynamically for ghost swapfile Kairui Song via B4 Relay
2026-02-21  8:15 ` [PATCH RFC 00/15] mm, swap: swap table phase IV with dynamic " Barry Song
2026-02-21  9:07   ` Kairui Song
2026-02-21  9:30     ` Barry Song
2026-02-23 16:52 ` Johannes Weiner
2026-02-24  2:10   ` Kairui Song
2026-02-23 18:22 ` Nhat Pham
2026-02-24  3:34   ` Kairui Song
