From: Vlastimil Babka <vbabka@suse.cz>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Harry Yoo <harry.yoo@oracle.com>, bpf <bpf@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Sebastian Sewior <bigeasy@linutronix.de>,
Andrii Nakryiko <andrii@kernel.org>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
Date: Tue, 16 Sep 2025 21:06:46 +0200
Message-ID: <c370486e-cb8f-4201-b70e-2bdddab9e642@suse.cz>
In-Reply-To: <CAADnVQJbj3OqS9x6MOmnmHa=69ACVEKa=QX-KVAncyocjCn1AQ@mail.gmail.com>
On 9/16/25 20:46, Alexei Starovoitov wrote:
> On Tue, Sep 16, 2025 at 11:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 9/16/25 18:18, Alexei Starovoitov wrote:
>> > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> >>
>> >> On 9/16/25 14:58, Harry Yoo wrote:
>> >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
>> >> >> On 9/16/25 04:21, Alexei Starovoitov wrote:
>> >> >> > From: Alexei Starovoitov <ast@kernel.org>
>> >> >> >
>> >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >> >> > kprobe -> bpf -> kmalloc_nolock().
>> >> >> >
>> >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> >> >>
>> >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
>> >> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
>> >> >>
>> >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
>> >> >>
>> >> >> > /*
>> >> >> > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
>> >> >> > * can be acquired without a deadlock before invoking the function.
>> >> >> > *
>> >> >> > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
>> >> >> > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
>> >> >> > * and kmalloc() is not used in an unsupported context.
>> >> >> > *
>> >> >> > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
>> >> >> > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
>> >> >> > * lockdep_assert() will catch a bug in case:
>> >> >> > * #1
>> >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
>> >> >> > * or
>> >> >> > * #2
>> >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
>> >> >>
>> >> >> AFAICS we have now eliminated this possibility.
>> >> >
>> >> > Right.
>> >> >
>> >> >> > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>> >> >> > * disabled context. The lock will always be acquired and, if needed,
>> >> >> > * it will block and sleep until the lock is available.
>> >> >> > * #1 is possible in !PREEMPT_RT only.
>> >> >>
>> >> >> Yes because this in kmalloc_nolock_noprof()
>> >> >>
>> >> >> if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> >> >> /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
>> >> >> return NULL;
>> >> >>
>> >> >>
>> >> >> > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
>> >> >> > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
>> >> >> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
>> >> >> And this is no longer possible, so can we just remove these comments and drop
>> >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
>> >> >
>> >> > Makes sense and sounds good to me.
>> >> >
>> >> > Also the commit message should be adjusted:
>> >> >> kmalloc_nolock() can be called from any context and can re-enter
>> >> >> into ___slab_alloc():
>> >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
>> >> >> kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> >> or
>> >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
>> >> >> kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> >
>> >> > The latter path is not possible anymore,
>> >> >
>> >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
>> >> >> rt_spin_lock is locked by current _task_. In this case re-entrance into
>> >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
>> >> >> bucket that most likely is not locked by the current task.
>> >> >> Though it may be locked by a different task it's safe to rt_spin_lock() and
>> >> >> sleep on it.
>> >> >
>> >> > and this paragraph is no longer valid either?
>> >>
>> >> Thanks for confirming! Let's see if Alexei agrees or we both missed
>> >> something.
>> >
>> > Not quite.
>> > This patch prevents
>> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> > kprobe -> bpf
>> >
>> > to make sure kprobe cannot be inserted in the _middle_ of
>> > freelist operations.
>> > kprobe/tracepoint outside of freelist is not a concern,
>> > and
>> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> > tracepoint -> bpf
>> >
>> > is still possible. Especially on RT.
>>
>> Hm, I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covered
>> the whole scope of ___slab_alloc(), but that's not the case. Thanks for
>> clearing that up.
>
> hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function.
> It disallows kprobes anywhere within the body,
> but it doesn't make it 'notrace', so tracing the first nop5
> is still ok.

Yeah, by "scope" I also meant whatever that function calls, i.e. the spinlock
operations you mentioned (local_lock_irqsave()). Those are not part of the
___slab_alloc() body, so you're right that we have not eliminated it.
>>
>> But with nmi that's variant of #1 of that comment.
>>
>> Like for ___slab_alloc() we need to prevent #2 with no nmi?
>> example on !RT:
>>
>> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
>> kfree_nolock() -> do_slab_free()
>>
>> in_nmi() || !USE_LOCKLESS_FAST_PATH()
>> false || false, we proceed, no checking of local_lock_is_locked()
>>
>> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
>> -> __update_cpu_freelist_fast()
>>
>> Am I missing something?
>
> It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> Because only nmi can make this cpu to be in the middle of freelist update.

You're right, freeing manipulates c->freelist via the "slowpath" (local_lock
protected instead of cmpxchg16b) only on RT, so we can't preempt it with a
kprobe on !RT because that path doesn't exist there at all. The only such
section is in ___slab_alloc() and that's covered.

I really do hope we'll get to the point where sheaves can replace the other
percpu slab caching completely... it should be much simpler.