From: Martin KaFai Lau <martin.lau@linux.dev>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Matt Bobrowski <mattbobrowski@google.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	JP Kobryn <inwardvessel@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Suren Baghdasaryan <surenb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v3 07/17] mm: introduce BPF OOM struct ops
Date: Mon, 2 Feb 2026 12:27:10 -0800	[thread overview]
Message-ID: <0ca47c0b-16a2-4a41-8990-2ec73e19563a@linux.dev> (raw)
In-Reply-To: <87qzr6znl0.fsf@linux.dev>



On 1/30/26 3:29 PM, Roman Gushchin wrote:
> Martin KaFai Lau <martin.lau@linux.dev> writes:
> 
>> On 1/26/26 6:44 PM, Roman Gushchin wrote:
>>> +bool bpf_handle_oom(struct oom_control *oc)
>>> +{
>>> +	struct bpf_struct_ops_link *st_link;
>>> +	struct bpf_oom_ops *bpf_oom_ops;
>>> +	struct mem_cgroup *memcg;
>>> +	struct bpf_map *map;
>>> +	int ret = 0;
>>> +
>>> +	/*
>>> +	 * System-wide OOMs are handled by the struct ops attached
>>> +	 * to the root memory cgroup
>>> +	 */
>>> +	memcg = oc->memcg ? oc->memcg : root_mem_cgroup;
>>> +
>>> +	rcu_read_lock_trace();
>>> +
>>> +	/* Find the nearest bpf_oom_ops traversing the cgroup tree upwards */
>>> +	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
>>> +		st_link = rcu_dereference_check(memcg->css.cgroup->bpf.bpf_oom_link,
>>> +						rcu_read_lock_trace_held());
>>> +		if (!st_link)
>>> +			continue;
>>> +
>>> +		map = rcu_dereference_check((st_link->map),
>>> +					    rcu_read_lock_trace_held());
>>> +		if (!map)
>>> +			continue;
>>> +
>>> +		/* Call BPF OOM handler */
>>> +		bpf_oom_ops = bpf_struct_ops_data(map);
>>> +		ret = bpf_ops_handle_oom(bpf_oom_ops, st_link, oc);
>>> +		if (ret && oc->bpf_memory_freed)
>>> +			break;
>>> +		ret = 0;
>>> +	}
>>> +
>>> +	rcu_read_unlock_trace();
>>> +
>>> +	return ret && oc->bpf_memory_freed;
>>> +}
>>> +
>>
>> [ ... ]
>>
>>> +static int bpf_oom_ops_reg(void *kdata, struct bpf_link *link)
>>> +{
>>> +	struct bpf_struct_ops_link *st_link = (struct bpf_struct_ops_link *)link;
>>> +	struct cgroup *cgrp;
>>> +
>>> +	/* The link is not yet fully initialized, but cgroup should be set */
>>> +	if (!link)
>>> +		return -EOPNOTSUPP;
>>> +
>>> +	cgrp = st_link->cgroup;
>>> +	if (!cgrp)
>>> +		return -EINVAL;
>>> +
>>> +	if (cmpxchg(&cgrp->bpf.bpf_oom_link, NULL, st_link))
>>> +		return -EEXIST;
>> iiuc, this will allow only one oom_ops to be attached to a
>> cgroup. Considering oom_ops is the only user of the
>> cgrp->bpf.struct_ops_links (added in patch 2), the list should have
>> only one element for now.
>>
>> Copy some context from the patch 2 commit log.
> 
> Hi Martin!
> 
> Sorry, I'm not quite sure what you mean, can you please elaborate
> a bit more?
> 
> We decided (in conversations at LPC) that one bpf oom policy per
> memcg is good for now (with the potential to extend it in the future,
> if there are use cases). But it seems like there is a lot of interest
> in attaching struct_ops to cgroups (there are already a couple of
> patchsets posted based on my earlier v2 patches), so I tried to make
> the bpf link mechanics suitable for multiple use cases from the start.
> 
> Did I answer your question?

Got it. The link list is there for future struct_ops implementations
to attach to a cgroup.

I should have mentioned the context. My bad.

BPF_PROG_TYPE_SOCK_OPS is currently a cgroup BPF prog. I am thinking of
adding bpf_struct_ops support that provides hooks similar to the
BPF_PROG_TYPE_SOCK_OPS ones. There are some issues that need to be
worked out. A major one is that the current cgroup progs have
expectations on the ordering and override behavior based on the
BPF_F_* flags and the runtime cgroup hierarchy. I was trying to see if
there are pieces in this set that can be built upon. The linked list is
a start but will need more work to be performant for networking use.
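
Just to spell out the existing cgroup-prog semantics for comparison,
here is a minimal userspace sketch (the cgroup path is a made-up
placeholder and prog_fd is assumed to be an already loaded
BPF_PROG_TYPE_SOCK_OPS program):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

/*
 * Minimal sketch of today's cgroup-attach flag semantics:
 *
 *   0 (NONE, the default):  no further progs may be attached in the
 *                           subtree.
 *   BPF_F_ALLOW_OVERRIDE:   a sub-cgroup may attach its own prog,
 *                           which overrides this one.
 *   BPF_F_ALLOW_MULTI:      a sub-cgroup prog runs in addition to
 *                           this one.
 */
static int attach_sock_ops(int prog_fd, unsigned int flags)
{
	int cg_fd, err;

	cg_fd = open("/sys/fs/cgroup/foo", O_RDONLY);
	if (cg_fd < 0)
		return -errno;

	err = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_SOCK_OPS, flags);
	close(cg_fd);
	return err;
}

e.g. attach_sock_ops(prog_fd, BPF_F_ALLOW_MULTI) gives the "ancestor
and descendant progs both run" behavior.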

> 
>>
>>> This change doesn't answer the question how bpf programs belonging
>>> to these struct ops'es will be executed. It will be done individually
>>> for every bpf struct ops which supports this.
>>>
>>> Please, note that unlike "normal" bpf programs, struct ops'es
>>> are not propagated to cgroup sub-trees.
>>
>> There are NONE, BPF_F_ALLOW_OVERRIDE, and BPF_F_ALLOW_MULTI; which one
>> may be closest to the bpf_handle_oom() semantics? If the ordering needs
>> to change (or multi needs to be allowed) in the future, does it need a
>> new flag, or can the existing BPF_F_xxx flags be used?
> 
> I hope that the existing flags can be used, but I'm also not sure we
> would ever need multiple oom handlers per cgroup. Do you have any
> specific concerns here?

Another question I have is about the default behavior when none of the
BPF_F_* flags is specified while attaching a struct_ops to a cgroup.

From uapi/bpf.h:

* NONE (default): No further BPF programs allowed in the subtree

iiuc, bpf_handle_oom() is not the same as NONE. Should each struct_ops
implementation have its own default policy? For the
BPF_PROG_TYPE_SOCK_OPS work, I am thinking the default policy should be
BPF_F_ALLOW_MULTI, which is always set nowadays in
cgroup_bpf_link_attach().
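
To make the comparison a bit more concrete, here is a rough sketch,
not part of this patchset, of what a BPF_F_ALLOW_MULTI-like policy
could look like for OOM handling. It reuses the structures from the
quoted bpf_handle_oom() and only drops the early break, so every
handler on the path to the root runs:

/*
 * Hypothetical sketch: a BPF_F_ALLOW_MULTI-like variant of the
 * quoted bpf_handle_oom(). It keeps walking up the cgroup tree and
 * runs every attached handler instead of stopping at the first one
 * that reports freed memory.
 */
static bool bpf_handle_oom_multi(struct oom_control *oc)
{
	struct mem_cgroup *memcg = oc->memcg ? oc->memcg : root_mem_cgroup;
	struct bpf_struct_ops_link *st_link;
	struct bpf_map *map;
	bool freed = false;

	rcu_read_lock_trace();
	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
		st_link = rcu_dereference_check(memcg->css.cgroup->bpf.bpf_oom_link,
						rcu_read_lock_trace_held());
		if (!st_link)
			continue;

		map = rcu_dereference_check(st_link->map,
					    rcu_read_lock_trace_held());
		if (!map)
			continue;

		/* Unlike the quoted code, don't break after a success */
		if (bpf_ops_handle_oom(bpf_struct_ops_data(map), st_link, oc) &&
		    oc->bpf_memory_freed)
			freed = true;
	}
	rcu_read_unlock_trace();

	return freed;
}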





Thread overview: 63+ messages
2026-01-27  2:44 [PATCH bpf-next v3 00/17] mm: BPF OOM Roman Gushchin
2026-01-27  2:44 ` [PATCH bpf-next v3 01/17] bpf: move bpf_struct_ops_link into bpf.h Roman Gushchin
2026-01-27  5:50   ` Yafang Shao
2026-01-28 11:28   ` Matt Bobrowski
2026-01-27  2:44 ` [PATCH bpf-next v3 02/17] bpf: allow attaching struct_ops to cgroups Roman Gushchin
2026-01-27  3:08   ` bot+bpf-ci
2026-01-27  5:49   ` Yafang Shao
2026-01-28  3:10   ` Josh Don
2026-01-28 18:52     ` Roman Gushchin
2026-01-28 11:25   ` Matt Bobrowski
2026-01-28 19:18     ` Roman Gushchin
2026-01-27  2:44 ` [PATCH bpf-next v3 03/17] libbpf: fix return value on memory allocation failure Roman Gushchin
2026-01-27  5:52   ` Yafang Shao
2026-01-27  2:44 ` [PATCH bpf-next v3 04/17] libbpf: introduce bpf_map__attach_struct_ops_opts() Roman Gushchin
2026-01-27  3:08   ` bot+bpf-ci
2026-01-27  2:44 ` [PATCH bpf-next v3 05/17] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL Roman Gushchin
2026-01-27  6:06   ` Yafang Shao
2026-02-02  4:56   ` Matt Bobrowski
2026-01-27  2:44 ` [PATCH bpf-next v3 06/17] mm: define mem_cgroup_get_from_ino() outside of CONFIG_SHRINKER_DEBUG Roman Gushchin
2026-01-27  6:12   ` Yafang Shao
2026-02-02  3:50   ` Shakeel Butt
2026-01-27  2:44 ` [PATCH bpf-next v3 07/17] mm: introduce BPF OOM struct ops Roman Gushchin
2026-01-27  9:38   ` Michal Hocko
2026-01-27 21:12     ` Roman Gushchin
2026-01-28  8:00       ` Michal Hocko
2026-01-28 18:44         ` Roman Gushchin
2026-02-02  4:06       ` Matt Bobrowski
2026-01-28  3:26   ` Josh Don
2026-01-28 19:03     ` Roman Gushchin
2026-01-28 11:19   ` Michal Hocko
2026-01-28 18:53     ` Roman Gushchin
2026-01-29 21:00   ` Martin KaFai Lau
2026-01-30 23:29     ` Roman Gushchin
2026-02-02 20:27       ` Martin KaFai Lau [this message]
2026-01-27  2:44 ` [PATCH bpf-next v3 08/17] mm: introduce bpf_oom_kill_process() bpf kfunc Roman Gushchin
2026-01-27 20:21   ` Martin KaFai Lau
2026-01-27 20:47     ` Roman Gushchin
2026-02-02  4:49   ` Matt Bobrowski
2026-01-27  2:44 ` [PATCH bpf-next v3 09/17] mm: introduce bpf_out_of_memory() BPF kfunc Roman Gushchin
2026-01-28 20:21   ` Matt Bobrowski
2026-01-27  2:44 ` [PATCH bpf-next v3 10/17] mm: introduce bpf_task_is_oom_victim() kfunc Roman Gushchin
2026-02-02  5:39   ` Matt Bobrowski
2026-02-02 17:30     ` Alexei Starovoitov
2026-02-03  0:14       ` Roman Gushchin
2026-02-03 13:23         ` Michal Hocko
2026-02-03 16:31           ` Alexei Starovoitov
2026-02-04  9:02             ` Michal Hocko
2026-02-05  0:12               ` Alexei Starovoitov
2026-01-27  2:44 ` [PATCH bpf-next v3 11/17] bpf: selftests: introduce read_cgroup_file() helper Roman Gushchin
2026-01-27  3:08   ` bot+bpf-ci
2026-01-27  2:44 ` [PATCH bpf-next v3 12/17] bpf: selftests: BPF OOM struct ops test Roman Gushchin
2026-01-27  2:44 ` [PATCH bpf-next v3 13/17] sched: psi: add a trace point to psi_avgs_work() Roman Gushchin
2026-01-27  2:44 ` [PATCH bpf-next v3 14/17] sched: psi: add cgroup_id field to psi_group structure Roman Gushchin
2026-01-27  2:44 ` [PATCH bpf-next v3 15/17] bpf: allow calling bpf_out_of_memory() from a PSI tracepoint Roman Gushchin
2026-01-27  9:02 ` [PATCH bpf-next v3 00/17] mm: BPF OOM Michal Hocko
2026-01-27 21:01   ` Roman Gushchin
2026-01-28  8:06     ` Michal Hocko
2026-01-28 16:59       ` Alexei Starovoitov
2026-01-28 18:23         ` Roman Gushchin
2026-01-28 18:53           ` Alexei Starovoitov
2026-02-02  3:26         ` Matt Bobrowski
2026-02-02 17:50           ` Alexei Starovoitov
2026-02-04 23:52             ` Matt Bobrowski
