From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
bpf <bpf@vger.kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Sebastian Sewior <bigeasy@linutronix.de>,
Steven Rostedt <rostedt@goodmis.org>,
Hou Tao <houtao1@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Jann Horn <jannh@google.com>, Tejun Heo <tj@kernel.org>,
linux-mm <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
Date: Fri, 14 Mar 2025 17:34:08 -0700 [thread overview]
Message-ID: <CAADnVQ+Epu=s=Zu3OsrsDcbSGiLB0T-VXtdhQyTd3bpdV5PDOg@mail.gmail.com> (raw)
In-Reply-To: <rbfpuj6kmbwbzyd25tjhlrf4aytmhmegn5ez54rpb2mue3cxyk@ok46lkhvvfjt>
On Tue, Mar 11, 2025 at 11:04 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 02:32:24PM +0100, Alexei Starovoitov wrote:
> > On Tue, Mar 11, 2025 at 3:04 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Fri, 21 Feb 2025 18:44:23 -0800 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > Tracing BPF programs execute from tracepoints and kprobes where
> > > > running context is unknown, but they need to request additional
> > > > memory. The prior workarounds were using pre-allocated memory and
> > > > BPF specific freelists to satisfy such allocation requests.
> > >
> > > The "prior workarounds" sound entirely appropriate. Because the
> > > performance and maintainability of Linux's page allocator is about
> > > 1,000,040 times more important than relieving BPF of having to carry a
> > > "workaround".
> >
> > Please explain where performance and maintainability is affected?
> >
>
> I have some related questions below. Note I'm a bystander, not claiming
> to have any (N)ACK power.
>
> A small bit before that:
> if (!spin_trylock_irqsave(&zone->lock, flags)) {
> if (unlikely(alloc_flags & ALLOC_TRYLOCK))
> return NULL;
> spin_lock_irqsave(&zone->lock, flags);
> }
>
> This is going to perform worse when contested due to an extra access to
> the lock. I presume it was done this way to avoid suffering another
> branch, with the assumption the trylock is normally going to succeed.
Steven already explained that extra trylock is a noise from performance pov.
Also notice that it's indeed not written as
if (unlikely(alloc_flags & ALLOC_TRYLOCK))
spin_trylock_irqsave(...);
else
spin_lock_irqsave(...);
because this way it will suffer an extra branch.
Even with this extra branch it would have been in the noise
considering all the prior branches rmqueue*() does.
With unconditional trylock first the performance noise is even less
noticeable.
> I think it would help to outline why these are doing any memory
> allocation from something like NMI to begin with. Perhaps you could have
> carved out a very small piece of memory as a reserve just for that? It
> would be refilled as needed from process context.
It's not about NMI. It's about an unknown context.
bpf and other subsystems will not be calling it from NMI on purpose.
It can happen by accident.
See netpoll_send_udp().
It's using alloc_skb(GFP_ATOMIC) which is the same as GFP_WISH_ME_LUCK.
netpoll is called from an arbitrary context where accessing slab is not ok.
> If non-task memory allocs got beaten to the curb, or at least got heavily
> limited, then a small allocator just for that purpose would do the
> trick and the two variants would likely be simpler than one thing which
> supports everyone.
Try diff stat of this patch vs diff of bpf_mem_alloc and other hacks
to see what it takes to build "small allocator".
Also notice how "small allocator" doesn't work for netpoll.
It needs an skb in an unknown context and currently pretends
that GFP_ATOMIC isn't going to crash.
Any context kmalloc will finally fix it.
next prev parent reply other threads:[~2025-03-15 0:34 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-22 2:44 [PATCH bpf-next v9 0/6] bpf, mm: Introduce try_alloc_pages() Alexei Starovoitov
2025-02-22 2:44 ` [PATCH bpf-next v9 1/6] locking/local_lock: Introduce localtry_lock_t Alexei Starovoitov
2025-03-11 15:44 ` Mateusz Guzik
2025-03-11 16:20 ` Sebastian Andrzej Siewior
2025-03-11 16:31 ` Mateusz Guzik
2025-03-11 20:21 ` Vlastimil Babka
2025-03-11 22:24 ` Alexei Starovoitov
2025-03-12 8:29 ` Vlastimil Babka
2025-03-14 21:05 ` Alexei Starovoitov
2025-03-14 21:08 ` Vlastimil Babka
2025-03-14 21:18 ` Alexei Starovoitov
2025-02-22 2:44 ` [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation Alexei Starovoitov
2025-03-11 2:04 ` Andrew Morton
2025-03-11 13:32 ` Alexei Starovoitov
2025-03-11 18:04 ` Mateusz Guzik
2025-03-12 9:45 ` Steven Rostedt
2025-03-15 0:34 ` Alexei Starovoitov [this message]
2025-03-12 10:00 ` Vlastimil Babka
2025-03-12 19:06 ` Shakeel Butt
2025-03-13 8:44 ` Michal Hocko
2025-03-13 14:21 ` Vlastimil Babka
2025-03-13 16:02 ` Shakeel Butt
2025-03-14 10:16 ` Michal Hocko
2025-03-15 0:51 ` Alexei Starovoitov
2025-02-22 2:44 ` [PATCH bpf-next v9 3/6] mm, bpf: Introduce free_pages_nolock() Alexei Starovoitov
2025-02-22 2:44 ` [PATCH bpf-next v9 4/6] memcg: Use trylock to access memcg stock_lock Alexei Starovoitov
2025-02-22 2:44 ` [PATCH bpf-next v9 5/6] mm, bpf: Use memcg in try_alloc_pages() Alexei Starovoitov
2025-02-22 2:44 ` [PATCH bpf-next v9 6/6] bpf: Use try_alloc_pages() to allocate pages for bpf needs Alexei Starovoitov
2025-02-26 3:19 ` [PATCH bpf-next v9 0/6] bpf, mm: Introduce try_alloc_pages() Alexei Starovoitov
2025-02-27 17:50 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAADnVQ+Epu=s=Zu3OsrsDcbSGiLB0T-VXtdhQyTd3bpdV5PDOg@mail.gmail.com' \
--to=alexei.starovoitov@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=andrii@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=bpf@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=houtao1@huawei.com \
--cc=jannh@google.com \
--cc=kernel-team@fb.com \
--cc=linux-mm@kvack.org \
--cc=memxor@gmail.com \
--cc=mhocko@suse.com \
--cc=mjguzik@gmail.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=shakeel.butt@linux.dev \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox