From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Christoph Lameter <cl@gentwo.de>
Cc: Christoph Hellwig <hch@infradead.org>,
David Miller <davem@davemloft.net>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>, Tejun Heo <tj@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, bpf <bpf@vger.kernel.org>,
Kernel Team <kernel-team@fb.com>, linux-mm <linux-mm@kvack.org>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.
Date: Tue, 28 Jun 2022 10:03:43 -0700
Message-ID: <20220628170343.ng46xfwi32vefiyp@MacBook-Pro-3.local>
In-Reply-To: <alpine.DEB.2.22.394.2206281550210.328950@gentwo.de>
On Tue, Jun 28, 2022 at 03:57:54PM +0200, Christoph Lameter wrote:
> On Mon, 27 Jun 2022, Alexei Starovoitov wrote:
>
> > On Mon, Jun 27, 2022 at 5:17 PM Christoph Lameter <cl@gentwo.de> wrote:
> > >
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > Introduce any context BPF specific memory allocator.
> > > >
> > > > Tracing BPF programs can attach to kprobe and fentry. Hence they
> > > > run in unknown context where calling plain kmalloc() might not be safe.
> > > > Front-end kmalloc() with per-cpu per-bucket cache of free elements.
> > > > Refill this cache asynchronously from irq_work.
> > >
> > > GFP_ATOMIC etc is not going to work for you?
> >
> > slab_alloc_node->slab_alloc->local_lock_irqsave
> > kprobe -> bpf prog -> slab_alloc_node -> deadlock.
> > In other words, the slow path of slab allocator takes locks.
>
> That is a relatively new feature due to RT logic support. Without RT this
> would be a simple irq disable.
Not just RT.
It's a slow path:
        if (IS_ENABLED(CONFIG_PREEMPT_RT) ||
            unlikely(!object || !slab || !node_match(slab, node))) {
                local_unlock_irqrestore(&s->cpu_slab->lock,...);
and that's not the only lock in there.
new_slab->allocate_slab... alloc_pages grabbing more locks.
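To make the re-entrancy problem concrete, here is a minimal userspace
sketch. It is not kernel code: an atomic_flag stands in for a
non-recursive lock such as s->cpu_slab->lock, and every demo_* name is
hypothetical. A "probe" fires while the slow path holds the lock and
tries to take the same lock again; with a blocking lock that nested
attempt would never return.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock in the allocator. */
static atomic_flag slab_lock = ATOMIC_FLAG_INIT;
static int nested_attempts_blocked;

/* Returns true if the lock was taken, false if it was already held. */
static bool demo_trylock(void)
{
	return !atomic_flag_test_and_set(&slab_lock);
}

static void demo_unlock(void)
{
	atomic_flag_clear(&slab_lock);
}

/* Stand-in for a tracing program that allocates from inside a probe. */
static void demo_probe(void)
{
	if (!demo_trylock()) {
		/* A blocking lock would spin here forever: the owner is
		 * the interrupted context below us on the same CPU.
		 */
		nested_attempts_blocked++;
		return;
	}
	demo_unlock();
}

/* Stand-in for an allocator slow path with a probe attached inside it. */
static void demo_slow_path(void (*probe)(void))
{
	bool got = demo_trylock();

	assert(got);	/* the outermost caller must find the lock free */
	if (probe)
		probe();	/* the probe fires while the lock is held */
	demo_unlock();
}
```

Calling demo_slow_path(demo_probe) once bumps nested_attempts_blocked,
which marks the point where a real lock would have deadlocked.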
> Generally doing slab allocation while debugging slab allocation is not
> something that can work. Can we exempt RT locks/irqsave or slab alloc from
> BPF tracing?
People started doing lock profiling with bpf back in 2017.
People do rcu profiling now and attach bpf progs to all kinds of low-level
kernel internals: page alloc, etc.
> I would assume that other key items of kernel logic will have similar
> issues.
We're _not_ asking for any changes from the mm/slab side.
Things were working all these years. We're making them more efficient now
by getting rid of the 'let's prealloc everything' approach.
> > Which makes it unsafe to use from tracing bpf progs.
> > That's why we preallocated all elements in bpf maps,
> > so there are no calls to mm or rcu logic.
> > bpf specific allocator cannot use locks at all.
> > try_lock approach could have been used in alloc path,
> > but free path cannot fail with try_lock.
> > Hence the algorithm in this patch is purely lockless.
> > bpf prog can attach to spin_unlock_irqrestore and
> > safely do bpf_mem_alloc.
>
> That is generally safe unless you get into reentrance issues with memory
> allocation.
Right. Generic slab/mm/page_alloc/rcu are not ready for reentrance and
are not safe from NMI either.
That's why we've added all kinds of safety mechanisms in bpf layers.
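One common shape for such a safety mechanism is a nesting counter that
makes a nested entry bail out instead of recursing. The sketch below is
a userspace illustration only, not the code from the bpf layer: it
models a single CPU with a plain int, and all demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU nesting counter; a single CPU is modelled here. */
static int demo_active;
static int demo_runs, demo_misses;

/* Run 'body' unless a program is already running on this CPU. */
static void demo_prog_run(void (*body)(void))
{
	if (++demo_active != 1) {
		/* Nested entry: count a miss and skip, do not recurse. */
		demo_misses++;
		demo_active--;
		return;
	}
	demo_runs++;
	if (body)
		body();
	assert(demo_active == 1);	/* nested entries must have balanced */
	demo_active--;
}

/* Models a program that hits its own attach point while running. */
static void demo_nested_body(void)
{
	demo_prog_run(NULL);
}
```

demo_prog_run(demo_nested_body) runs the outer body once and records the
inner attempt as a miss, so non-reentrant code underneath is never
entered twice from the same CPU.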
> Which begs the question:
>
> What happens if I try to use BPF to trace *your* shiny new memory
'shiny and new' is an overstatement. It's a trivial lockless freelist layer
on top of kmalloc. Please read the patch.
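For readers who want the general idea without opening the patch, here is
a minimal userspace sketch of a lockless free list in front of malloc().
It is not the patch's code: it uses C11 atomics instead of the kernel's
llist helpers, malloc() stands in for kmalloc(), an explicit refill call
stands in for the irq_work callback, and all demo_* names are
hypothetical. It shows why the free path cannot fail while the alloc
path may return NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_node {
	struct demo_node *next;
};

struct demo_cache {
	_Atomic(struct demo_node *) head;	/* LIFO list of free elements */
	size_t elem_size;			/* >= sizeof(struct demo_node) */
};

/* Free path: retries until the CAS succeeds, so it cannot fail. */
static void demo_cache_free(struct demo_cache *c, void *ptr)
{
	struct demo_node *n = ptr;
	struct demo_node *old = atomic_load(&c->head);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&c->head, &old, n));
}

/* Alloc path: never calls malloc(); returns NULL when the cache is empty.
 * Safe only with a single consumer per cache (concurrent pops can hit ABA).
 */
static void *demo_cache_alloc(struct demo_cache *c)
{
	struct demo_node *old = atomic_load(&c->head);

	while (old &&
	       !atomic_compare_exchange_weak(&c->head, &old, old->next))
		;
	return old;
}

/* Refill: meant to run later, from a context where malloc() is safe. */
static int demo_cache_refill(struct demo_cache *c, int count)
{
	int added = 0;

	assert(c->elem_size >= sizeof(struct demo_node));
	while (added < count) {
		void *p = malloc(c->elem_size);

		if (!p)
			break;
		demo_cache_free(c, p);
		added++;
	}
	return added;
}
```

A caller would zero-initialise a struct demo_cache, set elem_size, and
call demo_cache_refill() from a safe context; demo_cache_alloc() and
demo_cache_free() then never touch the underlying allocator.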
> allocation functions in the BPF logic like bpf_mem_alloc? How do you stop
> that from happening?
Here is the comment in the patch:
/* notrace is necessary here and in other functions to make sure
 * bpf programs cannot attach to them and cause llist corruptions.
 */