From: Johannes Weiner <hannes@cmpxchg.org>
To: Qi Zheng <qi.zheng@linux.dev>
Cc: hughd@google.com, mhocko@suse.com, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev, david@kernel.org,
	lorenzo.stoakes@oracle.com, ziy@nvidia.com, harry.yoo@oracle.com,
	imran.f.khan@oracle.com, kamalesh.babulal@oracle.com,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	chenridong@huaweicloud.com, mkoutny@suse.com,
	akpm@linux-foundation.org, hamzamahfooz@linux.microsoft.com,
	apais@linux.microsoft.com, lance.yang@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org, Muchun Song <songmuchun@bytedance.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>
Subject: Re: [PATCH v2 27/28] mm: memcontrol: eliminate the problem of dying memory cgroup for LRU folios
Date: Thu, 18 Dec 2025 09:06:45 -0500
Message-ID: <aUQKdZsMclicBDYx@cmpxchg.org>
In-Reply-To: <c08f964513f9eb6a04f80f1a900e3494a99b7e0d.1765956026.git.zhengqi.arch@bytedance.com>

On Wed, Dec 17, 2025 at 03:27:51PM +0800, Qi Zheng wrote:
> From: Muchun Song <songmuchun@bytedance.com>
> 
> Pagecache pages are charged at allocation time and hold a reference
> to the original memory cgroup until reclaimed. Depending on memory
> pressure, page sharing patterns between different cgroups and cgroup
> creation/destruction rates, many dying memory cgroups can be pinned
> by pagecache pages, reducing page reclaim efficiency and wasting
> memory. Converting LRU folios and most other raw memory cgroup pins
> to the object cgroup direction can fix this long-living problem.

This is already in the cover letter. Please describe here what the
patch itself does. IOW, now that everything is set up, switch
folio->memcg_data pointers to objcgs, update the accessors, and
execute reparenting on cgroup death.
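
For illustration, the indirection that makes this work can be modelled
in userspace. This is only a sketch: the struct layouts and every
model_* name below are hypothetical stand-ins, not the kernel's
definitions. The point it shows is that folios keep pointing at the same
objcg, and reparenting only retargets the objcg's memcg pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's struct layouts. */
struct mem_cgroup {
	struct mem_cgroup *parent;	/* NULL for the root */
};

struct obj_cgroup {
	struct mem_cgroup *memcg;	/* retargeted when the cgroup dies */
};

struct folio {
	struct obj_cgroup *objcg;	/* stands in for folio->memcg_data */
};

/* Accessor: resolve the folio's memcg through its objcg. */
static struct mem_cgroup *model_folio_memcg(const struct folio *folio)
{
	return folio->objcg ? folio->objcg->memcg : NULL;
}

/* On cgroup death, point the objcg at the parent; folios are untouched. */
static void model_reparent_objcg(struct obj_cgroup *objcg)
{
	if (objcg->memcg && objcg->memcg->parent)
		objcg->memcg = objcg->memcg->parent;
}

/* Returns 1 if both folios resolve to the parent after reparenting. */
static int model_demo(void)
{
	struct mem_cgroup root = { NULL };
	struct mem_cgroup child = { &root };
	struct obj_cgroup objcg = { &child };
	struct folio a = { &objcg }, b = { &objcg };

	assert(model_folio_memcg(&a) == &child);
	model_reparent_objcg(&objcg);
	return model_folio_memcg(&a) == &root &&
	       model_folio_memcg(&b) == &root;
}
```

Calling model_demo() from a main() exercises the model. Because no folio
is rewritten, the dying child is no longer pinned by the folios charged
to it.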

> Finally, folio->memcg_data of LRU folios and kmem folios will always
> point to an object cgroup pointer. The folio->memcg_data of slab
> folios will point to a vector of object cgroups.

> @@ -223,22 +223,55 @@ static inline void __memcg_reparent_objcgs(struct mem_cgroup *src,
>  
>  static inline void reparent_locks(struct mem_cgroup *src, struct mem_cgroup *dst)
>  {
> +	int nid, nest = 0;
> +
>  	spin_lock_irq(&objcg_lock);
> +	for_each_node(nid) {
> +		spin_lock_nested(&mem_cgroup_lruvec(src,
> +				 NODE_DATA(nid))->lru_lock, nest++);
> +		spin_lock_nested(&mem_cgroup_lruvec(dst,
> +				 NODE_DATA(nid))->lru_lock, nest++);
> +	}
>  }

Looks okay to me. If this should turn out to be a scalability problem
in practice, we can make objcgs per-node, and then reparent lru/objcg
pairs on a per-node basis without nesting locks.
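
To make the potential cost concrete, here is a small userspace model of
the acquisition order in the quoted reparent_locks(). It is a sketch
under assumptions: MODEL_MAX_NODES and all model_* names are made up for
the example, and no real locks are taken. It only records which lock
would be taken with which nesting subclass.

```c
#include <assert.h>

#define MODEL_MAX_NODES 4	/* hypothetical node count for the model */

enum model_owner { MODEL_SRC, MODEL_DST };

struct model_acquire {
	int nid;		/* node whose lruvec lock is taken */
	enum model_owner owner;	/* src or dst memcg */
	int subclass;		/* value passed as the nesting subclass */
};

/*
 * Record the order in which the quoted reparent_locks() takes the
 * per-node lru_locks. Returns the number of lru_locks held at the end
 * (objcg_lock, taken first, is not counted).
 */
static int model_reparent_locks(struct model_acquire *log, int nr_nodes)
{
	int nid, nest = 0;

	for (nid = 0; nid < nr_nodes; nid++) {
		log[nest] = (struct model_acquire){ nid, MODEL_SRC, nest };
		nest++;
		log[nest] = (struct model_acquire){ nid, MODEL_DST, nest };
		nest++;
	}
	return nest;
}
```

With nr_nodes nodes, the model ends up holding 2 * nr_nodes lru_locks at
once on top of objcg_lock, which is the part that might not scale on
large NUMA machines.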

>  static inline void reparent_unlocks(struct mem_cgroup *src, struct mem_cgroup *dst)
>  {
> +	int nid;
> +
> +	for_each_node(nid) {
> +		spin_unlock(&mem_cgroup_lruvec(dst, NODE_DATA(nid))->lru_lock);
> +		spin_unlock(&mem_cgroup_lruvec(src, NODE_DATA(nid))->lru_lock);
> +	}
>  	spin_unlock_irq(&objcg_lock);
>  }
>  
> +static void memcg_reparent_lru_folios(struct mem_cgroup *src,
> +				      struct mem_cgroup *dst)
> +{
> +	if (lru_gen_enabled())
> +		lru_gen_reparent_memcg(src, dst);
> +	else
> +		lru_reparent_memcg(src, dst);
> +}
> +
>  static void memcg_reparent_objcgs(struct mem_cgroup *src)
>  {
>  	struct obj_cgroup *objcg = rcu_dereference_protected(src->objcg, true);
>  	struct mem_cgroup *dst = parent_mem_cgroup(src);
>  
> +retry:
> +	if (lru_gen_enabled())
> +		max_lru_gen_memcg(dst);
> +
>  	reparent_locks(src, dst);
> +	if (lru_gen_enabled() && !recheck_lru_gen_max_memcg(dst)) {
> +		reparent_unlocks(src, dst);
> +		cond_resched();
> +		goto retry;
> +	}
>  
>  	__memcg_reparent_objcgs(src, dst);
> +	memcg_reparent_lru_folios(src, dst);

Please inline memcg_reparent_lru_folios() here, to keep the lru vs
lrugen switching as "flat" as possible:

	if (lru_gen_enabled()) {
		if (!recheck_lru_gen_max_memcg(parent)) {
			reparent_unlocks(memcg, parent);
			cond_resched();
			goto retry;
		}
		lru_gen_reparent_memcg(memcg, parent);
	} else {
		lru_reparent_memcg(memcg, parent);
	}
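
The control flow of that flat shape can be checked with a userspace
model. This is a sketch only: struct model_state and the model_*
helpers are hypothetical stand-ins for the kernel functions, and the
max_lru_gen_memcg() and cond_resched() steps are left out. It shows that
every failed recheck drops the locks before retrying and that exactly
one of the two reparent paths runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for kernel state. */
struct model_state {
	bool lru_gen;		/* models lru_gen_enabled() */
	int recheck_failures;	/* rechecks that fail before one succeeds */
	int locks_held;		/* balance of lock/unlock calls */
	int retries;
	int gen_reparented;
	int lru_reparented;
};

static void model_locks(struct model_state *s)   { s->locks_held++; }
static void model_unlocks(struct model_state *s) { s->locks_held--; }

/* Fails recheck_failures times, then succeeds. */
static bool model_recheck(struct model_state *s)
{
	if (s->recheck_failures > 0) {
		s->recheck_failures--;
		return false;
	}
	return true;
}

static void model_reparent(struct model_state *s)
{
retry:
	model_locks(s);
	if (s->lru_gen) {
		if (!model_recheck(s)) {
			/* Drop the locks before trying again. */
			model_unlocks(s);
			s->retries++;
			goto retry;
		}
		s->gen_reparented++;	/* MGLRU path */
	} else {
		s->lru_reparented++;	/* traditional LRU path */
	}
	model_unlocks(s);
}
```

For example, a state with lru_gen set and recheck_failures = 2 goes
through two retries, reparents once via the MGLRU path, and finishes
with locks_held back at zero.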

> @@ -989,6 +1022,8 @@ struct mem_cgroup *get_mem_cgroup_from_current(void)
>  /**
>   * get_mem_cgroup_from_folio - Obtain a reference on a given folio's memcg.
>   * @folio: folio from which memcg should be extracted.
> + *
> + * The folio and objcg or memcg binding rules can refer to folio_memcg().

      See folio_memcg() for folio->objcg/memcg binding rules.


