From: Matt Bobrowski <mattbobrowski@google.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: bpf@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, JP Kobryn <inwardvessel@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH bpf-next v4 3/6] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc
Date: Wed, 31 Dec 2025 07:41:58 +0000
Message-ID: <aVTTxjwgNgWMF-9Q@google.com>
In-Reply-To: <7ia4ms2zwuqb.fsf@castle.c.googlers.com>

On Tue, Dec 30, 2025 at 09:00:28PM +0000, Roman Gushchin wrote:
> Matt Bobrowski <mattbobrowski@google.com> writes:
>
> > On Mon, Dec 22, 2025 at 08:41:53PM -0800, Roman Gushchin wrote:
> >> Introduce a BPF kfunc to get a trusted pointer to the root memory
> >> cgroup. It's very handy to traverse the full memcg tree, e.g.
> >> for handling a system-wide OOM.
> >>
> >> It's possible to obtain this pointer by traversing the memcg tree
> >> up from any known memcg, but it's sub-optimal and makes BPF programs
> >> more complex and less efficient.
> >>
> >> bpf_get_root_mem_cgroup() has KF_ACQUIRE | KF_RET_NULL semantics,
> >> however in reality it's not necessary to bump the corresponding
> >> reference counter - root memory cgroup is immortal, reference counting
> >> is skipped, see css_get(). Once set, root_mem_cgroup is always a valid
> >> memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer
> >> obtained with bpf_get_root_mem_cgroup(), it's effectively a no-op.
> >>
> >> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> >> ---
> >> mm/bpf_memcontrol.c | 20 ++++++++++++++++++++
> >> 1 file changed, 20 insertions(+)
> >>
> >> diff --git a/mm/bpf_memcontrol.c b/mm/bpf_memcontrol.c
> >> index 82eb95de77b7..187919eb2fe2 100644
> >> --- a/mm/bpf_memcontrol.c
> >> +++ b/mm/bpf_memcontrol.c
> >> @@ -10,6 +10,25 @@
> >>
> >> __bpf_kfunc_start_defs();
> >>
> >> +/**
> >> + * bpf_get_root_mem_cgroup - Returns a pointer to the root memory cgroup
> >> + *
> >> + * The function has KF_ACQUIRE semantics, even though the root memory
> >> + * cgroup is never destroyed after being created and doesn't require
> >> + * reference counting. And it's perfectly safe to pass it to
> >> + * bpf_put_mem_cgroup()
> >> + *
> >> + * Return: A pointer to the root memory cgroup.
> >> + */
> >> +__bpf_kfunc struct mem_cgroup *bpf_get_root_mem_cgroup(void)
> >> +{
> >> + if (mem_cgroup_disabled())
> >> + return NULL;
> >> +
> >> + /* css_get() is not needed */
> >> + return root_mem_cgroup;
> >> +}
> >> +
> >> /**
> >> * bpf_get_mem_cgroup - Get a reference to a memory cgroup
> >> * @css: pointer to the css structure
> >> @@ -64,6 +83,7 @@ __bpf_kfunc void bpf_put_mem_cgroup(struct mem_cgroup *memcg)
> >> __bpf_kfunc_end_defs();
> >>
> >> BTF_KFUNCS_START(bpf_memcontrol_kfuncs)
> >> +BTF_ID_FLAGS(func, bpf_get_root_mem_cgroup, KF_ACQUIRE | KF_RET_NULL)
> >
> > I feel as though relying on KF_ACQUIRE semantics here is somewhat
> > odd. Users of this BPF kfunc will now be forced to call
> > bpf_put_mem_cgroup() on the returned root_mem_cgroup, despite it being
> > completely unnecessary.
>
> I agree that it's annoying, but I doubt this extra call makes any
> difference in the real world.

Sure, that certainly holds true.

> Also, the corresponding kernel code is designed to hide the special
> handling of the root cgroup. css_get()/css_put() are simple no-ops for
> the root cgroup, but are totally valid.

Yes, I do see that.

> So in most places the root cgroup is handled as any other, which
> simplifies the code. I guess the same will be true for many bpf
> programs.

I see. However, the same might not hold for every other global
pointer that ends up being handed out by a BPF kfunc (not only
bpf_get_root_mem_cgroup()). This is why I was wondering whether it
makes sense to introduce another KF flag (or something similar) which
allows values returned from BPF kfuncs to be implicitly treated as
trusted.
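
To illustrate the refcounting point discussed above, here is a minimal
userspace sketch in C. It is not kernel or BPF code: every model_* name
is hypothetical and only mimics the behaviour described in this thread,
namely that get/put are no-ops for the immortal root object, that the
acquire helper may return NULL, and that callers can treat the root
like any other node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_css {
	bool no_ref;	/* immortal object: reference counting is skipped */
	long refcnt;
};

static struct model_css model_root = { .no_ref = true, .refcnt = 1 };
static bool model_memcg_disabled = false;

/* Mimics the described css_get()/css_put() behaviour: no-op for the root. */
static void model_css_get(struct model_css *css)
{
	if (!css->no_ref)
		css->refcnt++;
}

static void model_css_put(struct model_css *css)
{
	if (!css->no_ref)
		css->refcnt--;
}

/* Models an acquire-style getter that may return NULL. */
static struct model_css *model_get_root(void)
{
	if (model_memcg_disabled)
		return NULL;

	/* No get is needed: the root is never destroyed. */
	return &model_root;
}

/* Models an acquire-style getter for an ordinary, refcounted node. */
static struct model_css *model_get(struct model_css *css)
{
	model_css_get(css);
	return css;
}

/* Models the matching release; safe (and a no-op) for the root. */
static void model_put(struct model_css *css)
{
	model_css_put(css);
}

/*
 * Acquire the root and one child, then release both through the same
 * code path. Returns 0 if both counters end where they started, -1 if
 * the root is unavailable, and 1 on a refcount imbalance.
 */
int model_walk(void)
{
	struct model_css child = { .no_ref = false, .refcnt = 1 };
	struct model_css *nodes[2];
	size_t i;

	nodes[0] = model_get_root();
	if (!nodes[0])
		return -1;
	nodes[1] = model_get(&child);

	for (i = 0; i < 2; i++)
		model_put(nodes[i]);

	return (model_root.refcnt == 1 && child.refcnt == 1) ? 0 : 1;
}
```

In this model the release loop needs no special case for the root,
which is the simplification Roman describes. The cost is the extra,
functionally redundant put that the acquire contract imposes on the
caller.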