From: Alice Ryhl <aliceryhl@google.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Nhat Pham <nphamcs@gmail.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Linux Memory Management List <linux-mm@kvack.org>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
cgroups@vger.kernel.org,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [QUESTION] What memcg lifetime is required by list_lru_add?
Date: Thu, 28 Nov 2024 13:27:03 +0100 [thread overview]
Message-ID: <CAH5fLggKrb4LZk6JL5A0jJODA1zJs+94AU5NMmyV9ksraigF7A@mail.gmail.com> (raw)
In-Reply-To: <Z0eXrllVhRI9Ag5b@dread.disaster.area>
On Wed, Nov 27, 2024 at 11:05 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Nov 27, 2024 at 10:04:51PM +0100, Alice Ryhl wrote:
> > Dear SHRINKER and MEMCG experts,
> >
> > When using list_lru_add() and list_lru_del(), it seems to be required
> > that you pass the same value of nid and memcg to both calls, since
> > list_lru_del() might otherwise try to delete it from the wrong list /
> > delete it while holding the wrong spinlock. I'm trying to understand
> > the implications of this requirement on the lifetime of the memcg.
> >
> > Now, looking at list_lru_add_obj() I noticed that it uses rcu locking
> > to keep the memcg object alive for the duration of list_lru_add().
> > That rcu locking is used here seems to imply that without it, the
> > memcg could be deallocated during the list_lru_add() call, which is of
> > course bad. But rcu is not enough on its own to keep the memcg alive
> > all the way until the list_lru_del_obj() call, so how does it ensure
> > that the memcg stays valid for that long?
>
> We don't care if the memcg goes away whilst there are objects on the
> LRU. memcg destruction will reparent the objects to a different
> memcg via memcg_reparent_list_lrus() before the memcg is torn down.
> New objects should not be added to the memcg LRUs once the memcg
> teardown process starts, so there should never be add vs reparent
> races during teardown.
>
> Hence all the list_lru_add_obj() function needs to do is ensure that
> the locking/lifecycle rules for the memcg object that
> mem_cgroup_from_slab_obj() returns are obeyed.
>
> > And if there is a mechanism
> > to keep the memcg alive for the entire duration between add and del,
>
> It's enforced by the -complex- state machine used to tear down
> control groups.
>
> tl;dr: If the memcg gets torn down, it will reparent the objects on
> the LRU to it's parent memcg during the teardown process.
>
> This reparenting happens in the cgroup ->css_offline() method, which
> only happens after the cgroup reference count goes to zero and is
> waited on via:
>
> kill_css
> percpu_ref_kill_and_confirm(css_killed_ref_fn)
> <wait>
> css_killed_ref_fn
> offline_css
> mem_cgroup_css_offline
> memcg_offline_kmem
> {
> .....
> memcg_reparent_objcgs(memcg, parent);
>
> /*
> * After we have finished memcg_reparent_objcgs(), all list_lrus
> * corresponding to this cgroup are guaranteed to remain empty.
> * The ordering is imposed by list_lru_node->lock taken by
> * memcg_reparent_list_lrus().
> */
> memcg_reparent_list_lrus(memcg, parent)
> }
>
> Then the cgroup teardown control code then schedules the freeing
> of the memcg container via a RCU work callback when the reference
> count is globally visible as killed and the reference count has gone
> to zero.
>
> Hence the cgroup infrastructure requires RCU protection for the
> duration of unreferenced cgroup object accesses. This allows for
> subsystems to perform operations on the cgroup object without
> needing to holding cgroup references for every access. The complex,
> multi-stage teardown process allows for cgroup objects to release
> objects that it tracks hence avoiding the need for every object the
> cgroup tracks to hold a reference count on the cgroup.
>
> See the comment above css_free_rwork_fn() for more details about the
> teardown process:
>
> /*
> * css destruction is four-stage process.
> *
> * 1. Destruction starts. Killing of the percpu_ref is initiated.
> * Implemented in kill_css().
> *
> * 2. When the percpu_ref is confirmed to be visible as killed on all CPUs
> * and thus css_tryget_online() is guaranteed to fail, the css can be
> * offlined by invoking offline_css(). After offlining, the base ref is
> * put. Implemented in css_killed_work_fn().
> *
> * 3. When the percpu_ref reaches zero, the only possible remaining
> * accessors are inside RCU read sections. css_release() schedules the
> * RCU callback.
> *
> * 4. After the grace period, the css can be freed. Implemented in
> * css_free_rwork_fn().
> *
> * It is actually hairier because both step 2 and 4 require process context
> * and thus involve punting to css->destroy_work adding two additional
> * steps to the already complex sequence.
> */
Thanks a lot Dave, this clears it up for me.
I sent a patch containing some additional docs for list_lru:
https://lore.kernel.org/all/20241128-list_lru_memcg_docs-v1-1-7e4568978f4e@google.com/
Alice
next prev parent reply other threads:[~2024-11-28 12:27 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-27 21:04 Alice Ryhl
2024-11-27 22:05 ` Dave Chinner
2024-11-28 12:27 ` Alice Ryhl [this message]
2024-12-03 10:44 ` Michal Koutný
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAH5fLggKrb4LZk6JL5A0jJODA1zJs+94AU5NMmyV9ksraigF7A@mail.gmail.com \
--to=aliceryhl@google.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=david@fromorbit.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=nphamcs@gmail.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=zhengqi.arch@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox