From: Shakeel Butt <shakeel.butt@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peilin Ye <yepeilin@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Josh Don <joshdon@google.com>, Barret Rhoden <brho@google.com>,
linux-mm@kvack.org
Subject: Re: [PATCH bpf] bpf/helpers: Skip memcg accounting in __bpf_async_init()
Date: Fri, 5 Sep 2025 10:31:07 -0700
Message-ID: <qwrl5ivlaou2qqbrj4wh2vi4uqmeny2zyfidkjizkyyzta3uo3@z6bjemb7om6y>
In-Reply-To: <CAADnVQKAd-jubdQ9ja=xhTqahs+2bk2a+8VUTj1bnLpueow0Lg@mail.gmail.com>

On Fri, Sep 05, 2025 at 08:18:25AM -0700, Alexei Starovoitov wrote:
> On Thu, Sep 4, 2025 at 11:20 PM Peilin Ye <yepeilin@google.com> wrote:
> >
> > Calling bpf_map_kmalloc_node() from __bpf_async_init() can cause various
> > locking issues; see the following stack trace (edited for style) as one
> > example:
> >
> > ...
> > [10.011566] do_raw_spin_lock.cold
> > [10.011570] try_to_wake_up           (5) double-acquiring the same
> > [10.011575] kick_pool                    rq_lock, causing a hardlockup
> > [10.011579] __queue_work
> > [10.011582] queue_work_on
> > [10.011585] kernfs_notify
> > [10.011589] cgroup_file_notify
> > [10.011593] try_charge_memcg         (4) memcg accounting raises an
> > [10.011597] obj_cgroup_charge_pages      MEMCG_MAX event
> > [10.011599] obj_cgroup_charge_account
> > [10.011600] __memcg_slab_post_alloc_hook
> > [10.011603] __kmalloc_node_noprof
> > ...
> > [10.011611] bpf_map_kmalloc_node
> > [10.011612] __bpf_async_init
> > [10.011615] bpf_timer_init           (3) BPF calls bpf_timer_init()
> > [10.011617] bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
> > [10.011619] bpf__sched_ext_ops_runnable
> > [10.011620] enqueue_task_scx         (2) BPF runs with rq_lock held
> > [10.011622] enqueue_task
> > [10.011626] ttwu_do_activate
> > [10.011629] sched_ttwu_pending       (1) grabs rq_lock
> > ...
> >
> > The above was reproduced on bpf-next (b338cf849ec8) by modifying
> > ./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
> > ops.runnable(), and hacking [1] the memcg accounting code a bit to make
> > it (much more likely to) raise an MEMCG_MAX event from a
> > bpf_timer_init() call.
> >
> > We have also run into other similar variants both internally (without
> > applying the [1] hack) and on bpf-next, including:
> >
> > * run_timer_softirq() -> cgroup_file_notify()
> >   (grabs cgroup_file_kn_lock) -> try_to_wake_up() ->
> >   BPF calls bpf_timer_init() -> bpf_map_kmalloc_node() ->
> >   try_charge_memcg() raises MEMCG_MAX ->
> >   cgroup_file_notify() (tries to grab cgroup_file_kn_lock again)
> >
> > * __queue_work() (grabs worker_pool::lock) -> try_to_wake_up() ->
> >   BPF calls bpf_timer_init() -> bpf_map_kmalloc_node() ->
> >   try_charge_memcg() raises MEMCG_MAX -> m() ->
> >   __queue_work() (tries to grab the same worker_pool::lock)
> > ...
> >
> > As pointed out by Kumar, we can use bpf_mem_alloc() and friends for
> > bpf_hrtimer and bpf_work, to skip memcg accounting.
>
> This is a short term workaround that we shouldn't take.
> Long term bpf_mem_alloc() will use kmalloc_nolock() and
> memcg accounting that was already made to work from any context
> except that the path of memcg_memory_event() wasn't converted.
>
> Shakeel,
>
> Any suggestions how memcg_memory_event()->cgroup_file_notify()
> can be fixed?
> Can we just trylock and skip the event?

Would a !gfpflags_allow_spinning(gfp_mask) check be able to detect such
call chains? If so, we can change memcg_memory_event() to skip the call to
cgroup_file_notify() when spinning is not allowed.
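
To make the idea concrete, here is a minimal userspace sketch, not a kernel
patch. Everything prefixed with model_/MODEL_ is a hypothetical stand-in:
MODEL_GFP_RECLAIM stands in for the reclaim bits that
gfpflags_allow_spinning() tests, model_memory_event() stands in for a
variant of memcg_memory_event() that is handed the gfp_mask (an assumption;
how the mask would reach that function is left open here), and the
notifications counter stands in for calls to cgroup_file_notify(). Whether
the event counter should still be bumped when the notification is skipped
is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names exist in the kernel. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_RECLAIM 0x1u	/* stand-in for the kernel's reclaim bits */

/* Models gfpflags_allow_spinning(): true when reclaim bits are set. */
static bool model_allow_spinning(model_gfp_t gfp_mask)
{
	return (gfp_mask & MODEL_GFP_RECLAIM) != 0;
}

struct model_memcg {
	unsigned long max_events;	/* models the MEMCG_MAX event counter */
	unsigned long notifications;	/* models cgroup_file_notify() calls */
};

/* Hypothetical memcg_memory_event() variant that receives the gfp_mask. */
static void model_memory_event(struct model_memcg *memcg, model_gfp_t gfp_mask)
{
	/* Assumption: the event is still counted in every context. */
	memcg->max_events++;

	/*
	 * Skip the notification when the caller cannot spin: the notify
	 * path may take locks that the caller already holds.
	 */
	if (!model_allow_spinning(gfp_mask))
		return;

	memcg->notifications++;
}

/* Usage example: one event from each kind of context. */
static void model_demo(void)
{
	struct model_memcg memcg = { 0, 0 };

	model_memory_event(&memcg, MODEL_GFP_RECLAIM);	/* spinning allowed */
	model_memory_event(&memcg, 0);			/* spinning not allowed */

	assert(memcg.max_events == 2);
	assert(memcg.notifications == 1);
}
```

The trade-off in this shape is that userspace watching the events file
would not be woken up for events raised from such contexts, although the
counter itself stays accurate in the model.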