linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>, bpf <bpf@vger.kernel.org>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	 Sebastian Sewior <bigeasy@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Hou Tao <houtao1@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Matthew Wilcox <willy@infradead.org>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Jann Horn <jannh@google.com>, Tejun Heo <tj@kernel.org>,
	 linux-mm <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v4 1/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
Date: Tue, 14 Jan 2025 10:29:04 -0800	[thread overview]
Message-ID: <CAADnVQ+kGLtY0eWwaSL4To3z1KwgmsASvYHFkUXtyiVbvCNDbA@mail.gmail.com> (raw)
In-Reply-To: <20250114103946.GC8362@noisy.programming.kicks-ass.net>

On Tue, Jan 14, 2025 at 2:39 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jan 14, 2025 at 11:19:41AM +0100, Michal Hocko wrote:
> > On Tue 14-01-25 10:53:55, Peter Zijlstra wrote:
> > > On Mon, Jan 13, 2025 at 06:19:17PM -0800, Alexei Starovoitov wrote:
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > Tracing BPF programs execute from tracepoints and kprobes where
> > > > running context is unknown, but they need to request additional
> > > > memory.
> > >
> > > > The prior workarounds were using pre-allocated memory and
> > > > BPF specific freelists to satisfy such allocation requests.
> > > > Instead, introduce gfpflags_allow_spinning() condition that signals
> > > > to the allocator that running context is unknown.
> > > > Then rely on percpu free list of pages to allocate a page.
> > > > The rmqueue_pcplist() should be able to pop the page from.
> > > > If it fails (due to IRQ re-entrancy or list being empty) then
> > > > try_alloc_pages() attempts to spin_trylock zone->lock
> > > > and refill percpu freelist as normal.
> > >
> > > > BPF program may execute with IRQs disabled and zone->lock is
> > > > sleeping in RT, so trylock is the only option.
> > >
> > > how is spin_trylock() from IRQ context not utterly broken in RT?
> >
> > +     if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> > +             return NULL;
> >
> > Deals with that, right?
>
> Changelog didn't really mention that, did it? -- it seems to imply quite
> the opposite :/

Hmm. Until you said it I didn't read it as "imply the opposite" :(

The cover letter is pretty clear...
"
- Since spin_trylock() is not safe in RT from hard IRQ and NMI
  disable such usage in lock_trylock and in try_alloc_pages().
"

and the patch 2 commit log is clear too...

"
Since spin_trylock() cannot be used in RT from hard IRQ or NMI
it uses lockless link list...
"

and further in patch 3 commit log...

"
Use spin_trylock in PREEMPT_RT when not in hard IRQ and not in NMI
and fail instantly otherwise, since spin_trylock is not safe from IRQ
due to PI issues.
"

I guess I can reword this particular sentence in patch 1 commit log,
but before jumping to an incorrect conclusion please read the
whole set.

> But maybe, I suppose any BPF program needs to expect failure due to this
> being trylock. I just worry some programs will malfunction due to never
> succeeding -- and RT getting blamed for this.
>
> Maybe I worry too much.

Humans will find a way to blame BPF and/or RT for all of their problems
anyway. Just days ago BPF was blamed in RT for causing IPIs during JIT.
Valentin's patches are going to address that but ain't noone has time
to explain that continuously.

Seriously, though, the number of things that still run in hard irq context
in RT is so small that if some tracing BPF prog is attached there
it should be using prealloc mode. Full prealloc is still
the default for bpf hash map.


  parent reply	other threads:[~2025-01-14 18:29 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-14  2:19 [PATCH bpf-next v4 0/6] bpf, mm: Introduce try_alloc_pages() Alexei Starovoitov
2025-01-14  2:19 ` [PATCH bpf-next v4 1/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation Alexei Starovoitov
2025-01-14  9:53   ` Peter Zijlstra
2025-01-14 10:19     ` Michal Hocko
2025-01-14 10:39       ` Peter Zijlstra
2025-01-14 10:43         ` Michal Hocko
2025-01-14 18:29         ` Alexei Starovoitov [this message]
2025-01-14 18:34           ` Steven Rostedt
2025-01-14 10:31   ` Michal Hocko
2025-01-15  1:23     ` Alexei Starovoitov
2025-01-15  8:35       ` Michal Hocko
2025-01-15 22:33         ` Alexei Starovoitov
2025-01-14  2:19 ` [PATCH bpf-next v4 2/6] mm, bpf: Introduce free_pages_nolock() Alexei Starovoitov
2025-01-14  2:19 ` [PATCH bpf-next v4 3/6] locking/local_lock: Introduce local_trylock_irqsave() Alexei Starovoitov
2025-01-14  2:19 ` [PATCH bpf-next v4 4/6] memcg: Use trylock to access memcg stock_lock Alexei Starovoitov
2025-01-14 10:39   ` Michal Hocko
2025-01-14  2:19 ` [PATCH bpf-next v4 5/6] mm, bpf: Use memcg in try_alloc_pages() Alexei Starovoitov
2025-01-14  2:19 ` [PATCH bpf-next v4 6/6] bpf: Use try_alloc_pages() to allocate pages for bpf needs Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAADnVQ+kGLtY0eWwaSL4To3z1KwgmsASvYHFkUXtyiVbvCNDbA@mail.gmail.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=houtao1@huawei.com \
    --cc=jannh@google.com \
    --cc=kernel-team@fb.com \
    --cc=linux-mm@kvack.org \
    --cc=memxor@gmail.com \
    --cc=mhocko@suse.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shakeel.butt@linux.dev \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox