From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Qi Zheng <qi.zheng@linux.dev>
Cc: "Michal Koutný" <mkoutny@suse.com>,
hannes@cmpxchg.org, hughd@google.com, mhocko@suse.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
muchun.song@linux.dev, david@kernel.org,
lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com,
imran.f.khan@oracle.com, kamalesh.babulal@oracle.com,
axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
chenridong@huaweicloud.com, akpm@linux-foundation.org,
hamzamahfooz@linux.microsoft.com, apais@linux.microsoft.com,
lance.yang@linux.dev, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
"Muchun Song" <songmuchun@bytedance.com>,
"Qi Zheng" <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH v2 27/28] mm: memcontrol: eliminate the problem of dying memory cgroup for LRU folios
Date: Tue, 6 Jan 2026 16:51:54 +0000
Message-ID: <rga5pjrjnrzzminflnwfd2lckedg4pdzaypwsa4ad2ovyjkavt@kegiy7cthefu>
In-Reply-To: <d016b76d-581a-4582-920d-21f64318090a@linux.dev>
On Tue, Jan 06, 2026 at 03:08:57PM +0800, Qi Zheng wrote:
>
>
> On 1/6/26 12:14 AM, Yosry Ahmed wrote:
> > On Mon, Jan 05, 2026 at 11:41:46AM +0100, Michal Koutný wrote:
> > > Hi Qi.
> > >
> > > On Wed, Dec 17, 2025 at 03:27:51PM +0800, Qi Zheng <qi.zheng@linux.dev> wrote:
> > >
> > > > @@ -5200,22 +5238,27 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry)
> > > > unsigned int nr_pages = folio_nr_pages(folio);
> > > > struct page_counter *counter;
> > > > struct mem_cgroup *memcg;
> > > > + struct obj_cgroup *objcg;
> > > > if (do_memsw_account())
> > > > return 0;
> > > > - memcg = folio_memcg(folio);
> > > > -
> > > > - VM_WARN_ON_ONCE_FOLIO(!memcg, folio);
> > > > - if (!memcg)
> > > > + objcg = folio_objcg(folio);
> > > > + VM_WARN_ON_ONCE_FOLIO(!objcg, folio);
> > > > + if (!objcg)
> > > > return 0;
> > > > + rcu_read_lock();
> > > > + memcg = obj_cgroup_memcg(objcg);
> > > > if (!entry.val) {
> > > > memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
> > > > + rcu_read_unlock();
> > > > return 0;
> > > > }
> > > > memcg = mem_cgroup_id_get_online(memcg);
> > > > + /* memcg is pined by memcg ID. */
> > > > + rcu_read_unlock();
> > > > if (!mem_cgroup_is_root(memcg) &&
> > > > !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
> > >
> > > Later there is:
> > > swap_cgroup_record(folio, mem_cgroup_id(memcg), entry);
> > >
> > > As per the comment memcg remains pinned by the ID which is associated
> > > with a swap slot, i.e. theoretically time unbound (shmem).
> > > (This was actually brought up by Yosry in stats subthread [1])
> > >
> > > I think that should be tackled too to eliminate the problem completely.
> >
> > FWIW, I am not sure if swap entries is the last cause of pinning memcgs,
> > I am pretty sure there will be others that we haven't found yet. This is
>
> Agree.
>
> > why I think we shouldn't assume that the time between offlining and
> > releasing a memcg is short or bounded when fixing the stats problem.
>
> If I have not misunderstood your suggestion in the other thread, I plan
> to do the following in v3:
>
> 1. define a memcgv1-only function:
>
> void memcg1_reparent_state_local(struct mem_cgroup *memcg,
> 				    struct mem_cgroup *parent)
> {
> 	int i;
>
> 	synchronize_rcu();
>
> 	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
> 		int idx = memcg1_stats[i];
> 		unsigned long value = memcg_page_state_local(memcg, idx);
>
> 		mod_memcg_page_state_local(parent, idx, value);
> 	}
> }
>
> 2. call it after reparent_unlocks():
>
> memcg_reparent_objcgs
> --> objcg = __memcg_reparent_objcgs(memcg, parent);
> reparent_unlocks(memcg, parent);
> reparent_state_local(memcg, parent);
> --> memcg1_reparent_state_local()
Something like that, yeah. I think we can avoid introducing
mod_memcg_page_state_local() if we just use mod_memcg_state() to
subtract the stat from the child then add it to the parent.

We should probably also flush the stats before reading them to
aggregate all per-CPU counters.

I think we also need to ensure that all stat updates happen within the
same RCU read section where we read the memcg pointer from the page,
ideally with safeguards to prevent misuse.
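
To make the ordering concrete, here is a rough userspace model of the
flush, subtract, add sequence described above. It is only a sketch and
not kernel code: struct model_memcg, model_flush(), model_mod_state()
and model_reparent_state_local() are hypothetical names made up for
illustration, the per-CPU arrays stand in for the real per-CPU stat
deltas, and RCU and locking are left out entirely.

```c
#include <assert.h>

#define NR_CPUS  4	/* assumed CPU count, for the model only */
#define NR_STATS 2	/* assumed number of tracked stat items */

/* Hypothetical stand-in for a memcg: pending per-CPU deltas plus totals. */
struct model_memcg {
	long percpu[NR_CPUS][NR_STATS];
	long state[NR_STATS];
};

/* Fold pending per-CPU deltas into the aggregated counters (a "flush"). */
static void model_flush(struct model_memcg *cg)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		for (int idx = 0; idx < NR_STATS; idx++) {
			cg->state[idx] += cg->percpu[cpu][idx];
			cg->percpu[cpu][idx] = 0;
		}
	}
}

/* Signed update of one counter; the model skips the per-CPU batching. */
static void model_mod_state(struct model_memcg *cg, int idx, long delta)
{
	cg->state[idx] += delta;
}

/* Flush first, then move each stat from the child to the parent. */
static void model_reparent_state_local(struct model_memcg *child,
				       struct model_memcg *parent)
{
	assert(child != parent);

	model_flush(child);

	for (int idx = 0; idx < NR_STATS; idx++) {
		long value = child->state[idx];

		model_mod_state(child, idx, -value);	/* subtract from child */
		model_mod_state(parent, idx, value);	/* add to parent */
	}
}
```

Without the flush, deltas still sitting in the per-CPU slots would be
missed by the read and stay behind on the child, which is the reason
for flushing before reading.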
>
> >
> > >
> > > As I look at the code, these memcg IDs (private [2]) could be converted
> > > to objcg IDs so that reparenting applies also to folios that are
> > > currently swapped out. (Or convert to swap_cgroup_ctrl from the vector
> > > of IDs to a vector of objcg pointers, depending on space.)
> >
> > I think we can do objcg IDs, but be careful to keep the same behavior as
> > today and avoid overexhausting the 16 bit ID space. So we need to also
> > drop the ref to the objcg ID when the memcg is offlined and the objcg is
> > reparented, such that the objcg ID is deleted unless there are swapped
> > out entries.
> >
> > I think this can be done on top of this series, not necessarily as part
> > of it.
>
> Agree, I prefer to address this issue in a separate patchset.
>
> Thanks,
> Qi
>
> >
> > >
> > > Thanks,
> > > Michal
> > >
> > > [1] https://lore.kernel.org/r/ebdhvcwygvnfejai5azhg3sjudsjorwmlcvmzadpkhexoeq3tb@5gj5y2exdhpn
> > > [2] https://lore.kernel.org/r/20251225232116.294540-1-shakeel.butt@linux.dev
> >
> >
>