linux-mm.kvack.org archive mirror
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peilin Ye <yepeilin@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,  Tejun Heo <tj@kernel.org>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	 Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	 Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	 KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>,
	 Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
	 Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Josh Don <joshdon@google.com>, Barret Rhoden <brho@google.com>,
	 linux-mm@kvack.org
Subject: Re: [PATCH bpf] bpf/helpers: Skip memcg accounting in __bpf_async_init()
Date: Fri, 5 Sep 2025 10:31:07 -0700
Message-ID: <qwrl5ivlaou2qqbrj4wh2vi4uqmeny2zyfidkjizkyyzta3uo3@z6bjemb7om6y>
In-Reply-To: <CAADnVQKAd-jubdQ9ja=xhTqahs+2bk2a+8VUTj1bnLpueow0Lg@mail.gmail.com>

On Fri, Sep 05, 2025 at 08:18:25AM -0700, Alexei Starovoitov wrote:
> On Thu, Sep 4, 2025 at 11:20 PM Peilin Ye <yepeilin@google.com> wrote:
> >
> > Calling bpf_map_kmalloc_node() from __bpf_async_init() can cause various
> > locking issues; see the following stack trace (edited for style) as one
> > example:
> >
> > ...
> >  [10.011566]  do_raw_spin_lock.cold
> >  [10.011570]  try_to_wake_up             (5) double-acquiring the same
> >  [10.011575]  kick_pool                      rq_lock, causing a hardlockup
> >  [10.011579]  __queue_work
> >  [10.011582]  queue_work_on
> >  [10.011585]  kernfs_notify
> >  [10.011589]  cgroup_file_notify
> >  [10.011593]  try_charge_memcg           (4) memcg accounting raises an
> >  [10.011597]  obj_cgroup_charge_pages        MEMCG_MAX event
> >  [10.011599]  obj_cgroup_charge_account
> >  [10.011600]  __memcg_slab_post_alloc_hook
> >  [10.011603]  __kmalloc_node_noprof
> > ...
> >  [10.011611]  bpf_map_kmalloc_node
> >  [10.011612]  __bpf_async_init
> >  [10.011615]  bpf_timer_init             (3) BPF calls bpf_timer_init()
> >  [10.011617]  bpf_prog_xxxxxxxxxxxxxxxx_fcg_runnable
> >  [10.011619]  bpf__sched_ext_ops_runnable
> >  [10.011620]  enqueue_task_scx           (2) BPF runs with rq_lock held
> >  [10.011622]  enqueue_task
> >  [10.011626]  ttwu_do_activate
> >  [10.011629]  sched_ttwu_pending         (1) grabs rq_lock
> > ...
> >
> > The above was reproduced on bpf-next (b338cf849ec8) by modifying
> > ./tools/sched_ext/scx_flatcg.bpf.c to call bpf_timer_init() during
> > ops.runnable(), and hacking [1] the memcg accounting code a bit to make
> > it (much more likely to) raise an MEMCG_MAX event from a
> > bpf_timer_init() call.
> >
> > We have also run into other similar variants both internally (without
> > applying the [1] hack) and on bpf-next, including:
> >
> >  * run_timer_softirq() -> cgroup_file_notify()
> >    (grabs cgroup_file_kn_lock) -> try_to_wake_up() ->
> >    BPF calls bpf_timer_init() -> bpf_map_kmalloc_node() ->
> >    try_charge_memcg() raises MEMCG_MAX ->
> >    cgroup_file_notify() (tries to grab cgroup_file_kn_lock again)
> >
> >  * __queue_work() (grabs worker_pool::lock) -> try_to_wake_up() ->
> >    BPF calls bpf_timer_init() -> bpf_map_kmalloc_node() ->
> >    try_charge_memcg() raises MEMCG_MAX -> m() ->
> >    __queue_work() (tries to grab the same worker_pool::lock)
> >  ...
> >
> > As pointed out by Kumar, we can use bpf_mem_alloc() and friends for
> > bpf_hrtimer and bpf_work, to skip memcg accounting.
> 
> This is a short term workaround that we shouldn't take.
> Long term bpf_mem_alloc() will use kmalloc_nolock() and
> memcg accounting that was already made to work from any context
> except that the path of memcg_memory_event() wasn't converted.
> 
> Shakeel,
> 
> Any suggestions how memcg_memory_event()->cgroup_file_notify()
> can be fixed?
> Can we just trylock and skip the event?
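
[Illustration, not part of the original message.] The "trylock and skip
the event" idea quoted above can be sketched in userspace C. This is a
minimal model under stated assumptions, not kernel code: struct model_lock,
model_trylock(), model_unlock() and model_notify() are hypothetical names
that stand in for a lock such as cgroup_file_kn_lock and for the notify
path. The point is only that a re-entrant caller sees the lock held and
drops the event instead of spinning on a lock it already owns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock such as cgroup_file_kn_lock. */
struct model_lock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_lock *l)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Hypothetical notify path: deliver the event only if the lock can be
 * taken without waiting. Returns true if delivered, false if skipped.
 */
static bool model_notify(struct model_lock *l, int *delivered)
{
	assert(l != NULL && delivered != NULL);

	if (!model_trylock(l))
		return false;	/* lock busy (possibly held by us): skip */

	(*delivered)++;		/* stands in for the real notification work */
	model_unlock(l);
	return true;
}
```

In this model, a call made while the lock is already held returns false
and leaves the delivered count unchanged, which is the trade-off the
question implies: an occasional lost event in exchange for no deadlock.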

Will !gfpflags_allow_spinning(gfp_mask) be able to detect such call
chains? If yes, then we can change memcg_memory_event() to skip calls to
cgroup_file_notify() if spinning is not allowed.
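
[Illustration, not part of the original message.] A rough userspace model
of the proposal above, assuming the event path had access to the
allocation's gfp mask. Everything prefixed with model_ or MODEL_ is
hypothetical: the flag values are made up, model_allow_spinning() only
mirrors my understanding that gfpflags_allow_spinning() tests the reclaim
bits, and the real memcg_memory_event() is not claimed to take a gfp
argument. Whether the event counter should still be bumped when the
notification is skipped is also an assumption of this sketch. It does not
answer the question of whether the masks used on the reported call chains
would actually trip the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int model_gfp_t;

/* Mock flag bits; the values are illustrative, not the kernel's. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u
#define MODEL_GFP_RECLAIM \
	(MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)

/* Model: spinning is allowed only if at least one reclaim bit is set. */
static bool model_allow_spinning(model_gfp_t gfp)
{
	return (gfp & MODEL_GFP_RECLAIM) != 0;
}

struct model_memcg {
	unsigned long max_events;	/* events counted */
	unsigned long notifications;	/* notifications actually sent */
};

/* Hypothetical event path that skips the notify step when it may not spin. */
static void model_memory_event(struct model_memcg *memcg, model_gfp_t gfp)
{
	assert(memcg != NULL);

	memcg->max_events++;		/* assumed safe from any context */

	if (!model_allow_spinning(gfp))
		return;			/* skip the step that would take locks */

	memcg->notifications++;		/* stands in for cgroup_file_notify() */
}
```

With a mask of 0 the model counts the event but sends no notification;
with either reclaim bit set it does both.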



Thread overview: 6+ messages
     [not found] <20250905061919.439648-1-yepeilin@google.com>
     [not found] ` <CAADnVQKAd-jubdQ9ja=xhTqahs+2bk2a+8VUTj1bnLpueow0Lg@mail.gmail.com>
2025-09-05 17:31   ` Shakeel Butt [this message]
2025-09-05 18:23     ` Peilin Ye
2025-09-05 19:14     ` Peilin Ye
2025-09-05 20:32       ` Shakeel Butt
2025-09-05 19:48     ` Shakeel Butt
2025-09-05 20:31       ` Peilin Ye
