linux-mm.kvack.org archive mirror
* [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
@ 2025-09-16  2:21 Alexei Starovoitov
  2025-09-16 10:40 ` Vlastimil Babka
  2025-09-16 10:59 ` Harry Yoo
  0 siblings, 2 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2025-09-16  2:21 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: vbabka, harry.yoo, shakeel.butt, mhocko, bigeasy, andrii, memxor,
	akpm, peterz, rostedt, hannes

From: Alexei Starovoitov <ast@kernel.org>

Disallow kprobes in ___slab_alloc() to prevent reentrance:
kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
kprobe -> bpf -> kmalloc_nolock().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 mm/slub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index c995f3bec69d..922d47b10c2f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -45,7 +45,7 @@
 #include <kunit/test-bug.h>
 #include <linux/sort.h>
 #include <linux/irq_work.h>
-
+#include <linux/kprobes.h>
 #include <linux/debugfs.h>
 #include <trace/events/kmem.h>
 
@@ -4697,6 +4697,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 	goto load_freelist;
 }
+NOKPROBE_SYMBOL(___slab_alloc);
 
 /*
  * A wrapper for ___slab_alloc() for contexts where preemption is not yet
-- 
2.47.3




* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16  2:21 [PATCH slab] slab: Disallow kprobes in ___slab_alloc() Alexei Starovoitov
@ 2025-09-16 10:40 ` Vlastimil Babka
  2025-09-16 12:58   ` Harry Yoo
  2025-09-16 10:59 ` Harry Yoo
  1 sibling, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2025-09-16 10:40 UTC (permalink / raw)
  To: Alexei Starovoitov, bpf, linux-mm
  Cc: harry.yoo, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm,
	peterz, rostedt, hannes

On 9/16/25 04:21, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Disallow kprobes in ___slab_alloc() to prevent reentrance:
> kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> kprobe -> bpf -> kmalloc_nolock().
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

I wanted to fold this into "slab: Introduce kmalloc_nolock() and kfree_nolock()."
and update the comments to explain the NOKPROBE_SYMBOL(___slab_alloc);

But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:

> /*
>  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
>  * can be acquired without a deadlock before invoking the function.
>  *
>  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
>  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
>  * and kmalloc() is not used in an unsupported context.
>  *
>  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
>  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
>  * lockdep_assert() will catch a bug in case:
>  * #1
>  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
>  * or
>  * #2
>  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()

AFAICS we have now eliminated this possibility.

>  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>  * disabled context. The lock will always be acquired and if needed it
>  * block and sleep until the lock is available.
>  * #1 is possible in !PREEMPT_RT only.

Yes, because of this in kmalloc_nolock_noprof():

        if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
                /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
                return NULL;


>  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
>  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
>  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)

And this is no longer possible, so can we just remove these comments and drop
"slab: Make slub local_(try)lock more precise for LOCKDEP" now? 
>  * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
>  */

However, what about the freeing path?
Shouldn't we do the same with __slab_free() to prevent the fast path from
messing up an interrupted slow path?

And a question below:

> ---
>  mm/slub.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index c995f3bec69d..922d47b10c2f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -45,7 +45,7 @@
>  #include <kunit/test-bug.h>
>  #include <linux/sort.h>
>  #include <linux/irq_work.h>
> -
> +#include <linux/kprobes.h>
>  #include <linux/debugfs.h>
>  #include <trace/events/kmem.h>
>  
> @@ -4697,6 +4697,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  
>  	goto load_freelist;
>  }
> +NOKPROBE_SYMBOL(___slab_alloc);

Is that guaranteed to work as expected without also making ___slab_alloc noinline?
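
IIRC NOKPROBE_SYMBOL() only records the function's address in the blacklist
section, roughly (include/asm-generic/kprobes.h, from memory):

	#define NOKPROBE_SYMBOL(fname)				\
	static unsigned long __used				\
		__section("_kprobe_blacklist")			\
		_kbl_addr_##fname = (unsigned long)fname;

so an inlined copy of the function somewhere else wouldn't be covered by it.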
  
>  /*
>   * A wrapper for ___slab_alloc() for contexts where preemption is not yet




* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16  2:21 [PATCH slab] slab: Disallow kprobes in ___slab_alloc() Alexei Starovoitov
  2025-09-16 10:40 ` Vlastimil Babka
@ 2025-09-16 10:59 ` Harry Yoo
  2025-09-16 12:25   ` Vlastimil Babka
  1 sibling, 1 reply; 16+ messages in thread
From: Harry Yoo @ 2025-09-16 10:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-mm, vbabka, shakeel.butt, mhocko, bigeasy, andrii,
	memxor, akpm, peterz, rostedt, hannes

On Mon, Sep 15, 2025 at 07:21:40PM -0700, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Disallow kprobes in ___slab_alloc() to prevent reentrance:
> kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> kprobe -> bpf -> kmalloc_nolock().
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Maybe I'm a bit paranoid, but should we also add this to
all functions that ___slab_alloc() calls?

-- 
Cheers,
Harry / Hyeonggon

>  mm/slub.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index c995f3bec69d..922d47b10c2f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -45,7 +45,7 @@
>  #include <kunit/test-bug.h>
>  #include <linux/sort.h>
>  #include <linux/irq_work.h>
> -
> +#include <linux/kprobes.h>
>  #include <linux/debugfs.h>
>  #include <trace/events/kmem.h>
>  
> @@ -4697,6 +4697,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  
>  	goto load_freelist;
>  }
> +NOKPROBE_SYMBOL(___slab_alloc);
>  
>  /*
>   * A wrapper for ___slab_alloc() for contexts where preemption is not yet
> -- 
> 2.47.3



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 10:59 ` Harry Yoo
@ 2025-09-16 12:25   ` Vlastimil Babka
  0 siblings, 0 replies; 16+ messages in thread
From: Vlastimil Babka @ 2025-09-16 12:25 UTC (permalink / raw)
  To: Harry Yoo, Alexei Starovoitov
  Cc: bpf, linux-mm, shakeel.butt, mhocko, bigeasy, andrii, memxor,
	akpm, peterz, rostedt, hannes

On 9/16/25 12:59, Harry Yoo wrote:
> On Mon, Sep 15, 2025 at 07:21:40PM -0700, Alexei Starovoitov wrote:
>> From: Alexei Starovoitov <ast@kernel.org>
>> 
>> Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> kprobe -> bpf -> kmalloc_nolock().
>> 
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
> 
> Maybe I'm a bit paranoid, but should we also add this to
> all functions that ___slab_alloc() calls?

It should be enough to add it to all that ___slab_alloc() calls under
local_lock_cpu_slab(). AFAICS these are all trivial and inline. Maybe
get_freelist() and the functions it calls are more complex. Should we mark
all that __always_inline and would we be able to rely on this? Or rather add
the NOKPROBE_SYMBOL()? (but it would be wrong if that caused the function to
become non-inlined).
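
For the first option it would be roughly (untested, signature from memory):

-static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
+static __always_inline void *get_freelist(struct kmem_cache *s, struct slab *slab)

and the same for anything it calls under the lock.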



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 10:40 ` Vlastimil Babka
@ 2025-09-16 12:58   ` Harry Yoo
  2025-09-16 13:13     ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Harry Yoo @ 2025-09-16 12:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Alexei Starovoitov, bpf, linux-mm, shakeel.butt, mhocko, bigeasy,
	andrii, memxor, akpm, peterz, rostedt, hannes

On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
> On 9/16/25 04:21, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> > 
> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> > kprobe -> bpf -> kmalloc_nolock().
> > 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 
> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
> 
> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
> 
> > /*
> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
> >  * can be acquired without a deadlock before invoking the function.
> >  *
> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
> >  * and kmalloc() is not used in an unsupported context.
> >  *
> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
> >  * lockdep_assert() will catch a bug in case:
> >  * #1
> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
> >  * or
> >  * #2
> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
> 
> AFAICS see we now eliminated this possibility.

Right.

> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
> >  * disabled context. The lock will always be acquired and if needed it
> >  * block and sleep until the lock is available.
> >  * #1 is possible in !PREEMPT_RT only.
> 
> Yes because this in kmalloc_nolock_noprof()
> 
>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
>                 return NULL;
> 
> 
> >  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
> And this is no longer possible, so can we just remove these comments and drop
> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?

Makes sense and sounds good to me.

Also, the commit message should be adjusted too:
> kmalloc_nolock() can be called from any context and can re-enter
> into ___slab_alloc():
>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
>     kmalloc_nolock() -> ___slab_alloc(cache_B)
> or
>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
>    kmalloc_nolock() -> ___slab_alloc(cache_B)

The latter path is not possible anymore,

> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
> rt_spin_lock is locked by current _task_. In this case re-entrance into
> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
> bucket that is most likely is not locked by the current task.
> Though it may be locked by a different task it's safe to rt_spin_lock() and
> sleep on it.

and this paragraph is no longer valid either?

> >  * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
> >  */
> 
> However, what about the freeing path?
> Shouldn't we do the same with __slab_free() to prevent fast path messing up
> an interrupted slow path?

Hmm right, but we have the

(in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked()

check in do_slab_free() when !allow_spin?
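
I.e. roughly this block (paraphrasing the current code from memory):

	/* in do_slab_free(), on the !allow_spin (kfree_nolock) path */
	if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) &&
	    local_lock_is_locked(&s->cpu_slab->lock)) {
		defer_free(s, head);
		return;
	}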

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 12:58   ` Harry Yoo
@ 2025-09-16 13:13     ` Vlastimil Babka
  2025-09-16 16:18       ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2025-09-16 13:13 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Alexei Starovoitov, bpf, linux-mm, shakeel.butt, mhocko, bigeasy,
	andrii, memxor, akpm, peterz, rostedt, hannes

On 9/16/25 14:58, Harry Yoo wrote:
> On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
>> On 9/16/25 04:21, Alexei Starovoitov wrote:
>> > From: Alexei Starovoitov <ast@kernel.org>
>> > 
>> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> > kprobe -> bpf -> kmalloc_nolock().
>> > 
>> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> 
>> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
>> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
>> 
>> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
>> 
>> > /*
>> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
>> >  * can be acquired without a deadlock before invoking the function.
>> >  *
>> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
>> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
>> >  * and kmalloc() is not used in an unsupported context.
>> >  *
>> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
>> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
>> >  * lockdep_assert() will catch a bug in case:
>> >  * #1
>> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
>> >  * or
>> >  * #2
>> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
>> 
>> AFAICS see we now eliminated this possibility.
> 
> Right.
> 
>> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>> >  * disabled context. The lock will always be acquired and if needed it
>> >  * block and sleep until the lock is available.
>> >  * #1 is possible in !PREEMPT_RT only.
>> 
>> Yes because this in kmalloc_nolock_noprof()
>> 
>>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
>>                 return NULL;
>> 
>> 
>> >  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
>> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
>> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
>> And this is no longer possible, so can we just remove these comments and drop
>> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
> 
> Makes sense and sounds good to me.
> 
> Also in the commit mesage should be adjusted too:
>> kmalloc_nolock() can be called from any context and can re-enter
>> into ___slab_alloc():
>>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
>>     kmalloc_nolock() -> ___slab_alloc(cache_B)
>> or
>>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
>>    kmalloc_nolock() -> ___slab_alloc(cache_B)
> 
> The lattter path is not possible anymore,
> 
>> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
>> rt_spin_lock is locked by current _task_. In this case re-entrance into
>> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
>> bucket that is most likely is not locked by the current task.
>> Though it may be locked by a different task it's safe to rt_spin_lock() and
>> sleep on it.
> 
> and this paragraph is no longer valid either?

Thanks for confirming! Let's see if Alexei agrees or we both missed
something.

>> >  * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
>> >  */
>> 
>> However, what about the freeing path?
>> Shouldn't we do the same with __slab_free() to prevent fast path messing up
>> an interrupted slow path?
> 
> Hmm right, but we have:
> 
> (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked()

Yes, but like in the alloc case, this doesn't trigger in the
!in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right?

But now I think I see another solution here. Since we're already under
"if (!allow_spin)" we could stick a very ugly goto there to skip the
fastpath if we don't defer_free()?
(apparently a declaration right after a goto label is a C23 extension)

diff --git a/mm/slub.c b/mm/slub.c
index 6e858a6e397c..212c0e3e5007 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
 {
        /* cnt == 0 signals that it's called from kfree_nolock() */
        bool allow_spin = cnt;
+       __maybe_unused unsigned long flags;
        struct kmem_cache_cpu *c;
        unsigned long tid;
        void **freelist;
@@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
                        return;
                }
                cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */
+
+               /* prevent a fastpath interrupting a slowpath */
+               goto no_lockless;
        }
 
        if (USE_LOCKLESS_FAST_PATH()) {
@@ -6501,8 +6505,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
                        goto redo;
                }
        } else {
-               __maybe_unused unsigned long flags = 0;
-
+no_lockless:
                /* Update the free list under the local lock */
                local_lock_cpu_slab(s, flags);
                c = this_cpu_ptr(s->cpu_slab);

* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 13:13     ` Vlastimil Babka
@ 2025-09-16 16:18       ` Alexei Starovoitov
  2025-09-16 18:12         ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-09-16 16:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/16/25 14:58, Harry Yoo wrote:
> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
> >> On 9/16/25 04:21, Alexei Starovoitov wrote:
> >> > From: Alexei Starovoitov <ast@kernel.org>
> >> >
> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> >> > kprobe -> bpf -> kmalloc_nolock().
> >> >
> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >>
> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
> >>
> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
> >>
> >> > /*
> >> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
> >> >  * can be acquired without a deadlock before invoking the function.
> >> >  *
> >> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
> >> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
> >> >  * and kmalloc() is not used in an unsupported context.
> >> >  *
> >> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
> >> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
> >> >  * lockdep_assert() will catch a bug in case:
> >> >  * #1
> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
> >> >  * or
> >> >  * #2
> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
> >>
> >> AFAICS see we now eliminated this possibility.
> >
> > Right.
> >
> >> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
> >> >  * disabled context. The lock will always be acquired and if needed it
> >> >  * block and sleep until the lock is available.
> >> >  * #1 is possible in !PREEMPT_RT only.
> >>
> >> Yes because this in kmalloc_nolock_noprof()
> >>
> >>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> >>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
> >>                 return NULL;
> >>
> >>
> >> >  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
> >> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
> >> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
> >> And this is no longer possible, so can we just remove these comments and drop
> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
> >
> > Makes sense and sounds good to me.
> >
> > Also in the commit mesage should be adjusted too:
> >> kmalloc_nolock() can be called from any context and can re-enter
> >> into ___slab_alloc():
> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
> >>     kmalloc_nolock() -> ___slab_alloc(cache_B)
> >> or
> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
> >>    kmalloc_nolock() -> ___slab_alloc(cache_B)
> >
> > The lattter path is not possible anymore,
> >
> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
> >> rt_spin_lock is locked by current _task_. In this case re-entrance into
> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
> >> bucket that is most likely is not locked by the current task.
> >> Though it may be locked by a different task it's safe to rt_spin_lock() and
> >> sleep on it.
> >
> > and this paragraph is no longer valid either?
>
> Thanks for confirming! Let's see if Alexei agrees or we both missed
> something.

Not quite.
This patch prevents
kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
    kprobe -> bpf

to make sure a kprobe cannot be inserted in the _middle_ of
freelist operations.
A kprobe/tracepoint outside of the freelist operations is not a concern,
and
kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
   tracepoint -> bpf

is still possible. Especially on RT.

For example, there are trace_contention_begin/end tracepoints in spinlocks
and mutexes, so on RT it's easy to construct such a reentrance,
because local_lock_irqsave() is an rtmutex.
On !RT it's just irqsave, and as far as I could analyze
the code there are no tracepoints in the
local_lock_cpu_slab()/unlock critical sections.
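
On RT the rtlock slowpath has, roughly (quoting kernel/locking/rtmutex.c
from memory):

	trace_contention_begin(lock, LCB_F_RT);
	...
	trace_contention_end(lock, 0);

so a bpf prog attached to that tracepoint runs from inside the lock
acquisition path.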

But it would be fine to add a tracepoint (though silly)
if it is not splitting a freelist operation.
Like, local_lock_cpu_slab() is used in put_cpu_partial().
There is no need to mark it as NOKPROBE,
since it doesn't conflict with __update_cpu_freelist_fast(),
and it's ok to add a tracepoint _anywhere_ in put_cpu_partial().
It's also fine to add a tracepoint right after
local_lock_cpu_slab() in ___slab_alloc(),
but not within freelist manipulations.
rtmutex's trace_contention tracepoint is such a
safe tracepoint within a critical section.

So "more precise for LOCKDEP" patch is still needed.

I thought about whether do_slab_free() should be marked as NOKPROBE,
but that's not necessary. There is freelist manipulation
there under local_lock_cpu_slab(), but it's RT only,
and there is no fast path there.

>
> >> >  * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
> >> >  */
> >>
> >> However, what about the freeing path?
> >> Shouldn't we do the same with __slab_free() to prevent fast path messing up
> >> an interrupted slow path?
> >
> > Hmm right, but we have:
> >
> > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked()
>
> Yes, but like in the alloc case, this doesn't trigger in the
> !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right?
>
> But now I think I see another solution here. Since we're already under
> "if (!allow_spin)" we could stick a very ugly goto there to skip the
> fastpath if we don't defer_free()?
> (apparently declaration under a goto label is a C23 extension)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 6e858a6e397c..212c0e3e5007 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
>  {
>         /* cnt == 0 signals that it's called from kfree_nolock() */
>         bool allow_spin = cnt;
> +       __maybe_unused unsigned long flags;
>         struct kmem_cache_cpu *c;
>         unsigned long tid;
>         void **freelist;
> @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
>                         return;
>                 }
>                 cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */
> +
> +               /* prevent a fastpath interrupting a slowpath */
> +               goto no_lockless;

I'm missing why this is needed.

do_slab_free() does:
                if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) &&
                    local_lock_is_locked(&s->cpu_slab->lock)) {
                        defer_free(s, head); return;

It's the same check as in kmalloc_nolock() to avoid the invalid
freelist ops -> nmi -> bpf -> __update_cpu_freelist_fast chain.

The big comment in kmalloc_nolock() applies here too.
I didn't copy paste it.
Maybe worth adding a reference like:
/* See the comment in kmalloc_nolock() for why the fast path should be
   skipped when in_nmi() && lock_is_locked() */

So this patch with NOKPROBE_SYMBOL(___slab_alloc)
is enough, afaict.
Maybe expand a comment with the above details while folding?
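
Something along these lines, maybe:

	/*
	 * See the comment in kmalloc_nolock(): when in_nmi() and the
	 * cpu_slab lock is already locked, this cpu may be in the middle
	 * of a freelist update, so skip the fast path and defer_free()
	 * the object instead of touching c->freelist.
	 */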



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 16:18       ` Alexei Starovoitov
@ 2025-09-16 18:12         ` Vlastimil Babka
  2025-09-16 18:46           ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2025-09-16 18:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On 9/16/25 18:18, Alexei Starovoitov wrote:
> On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 9/16/25 14:58, Harry Yoo wrote:
>> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
>> >> On 9/16/25 04:21, Alexei Starovoitov wrote:
>> >> > From: Alexei Starovoitov <ast@kernel.org>
>> >> >
>> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >> > kprobe -> bpf -> kmalloc_nolock().
>> >> >
>> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> >>
>> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
>> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
>> >>
>> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
>> >>
>> >> > /*
>> >> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
>> >> >  * can be acquired without a deadlock before invoking the function.
>> >> >  *
>> >> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
>> >> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
>> >> >  * and kmalloc() is not used in an unsupported context.
>> >> >  *
>> >> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
>> >> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
>> >> >  * lockdep_assert() will catch a bug in case:
>> >> >  * #1
>> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
>> >> >  * or
>> >> >  * #2
>> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
>> >>
>> >> AFAICS see we now eliminated this possibility.
>> >
>> > Right.
>> >
>> >> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>> >> >  * disabled context. The lock will always be acquired and if needed it
>> >> >  * block and sleep until the lock is available.
>> >> >  * #1 is possible in !PREEMPT_RT only.
>> >>
>> >> Yes because this in kmalloc_nolock_noprof()
>> >>
>> >>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> >>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
>> >>                 return NULL;
>> >>
>> >>
>> >> >  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
>> >> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
>> >> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
>> >> And this is no longer possible, so can we just remove these comments and drop
>> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
>> >
>> > Makes sense and sounds good to me.
>> >
>> > Also in the commit mesage should be adjusted too:
>> >> kmalloc_nolock() can be called from any context and can re-enter
>> >> into ___slab_alloc():
>> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
>> >>     kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> or
>> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
>> >>    kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >
>> > The lattter path is not possible anymore,
>> >
>> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
>> >> rt_spin_lock is locked by current _task_. In this case re-entrance into
>> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
>> >> bucket that is most likely is not locked by the current task.
>> >> Though it may be locked by a different task it's safe to rt_spin_lock() and
>> >> sleep on it.
>> >
>> > and this paragraph is no longer valid either?
>>
>> Thanks for confirming! Let's see if Alexei agrees or we both missed
>> something.
> 
> Not quite.
> This patch prevents
> kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>     kprobe -> bpf
> 
> to make sure kprobe cannot be inserted in the _middle_ of
> freelist operations.
> kprobe/tracepoint outside of freelist is not a concern,
> and
> malloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>    tracepoint -> bpf
> 
> is still possible. Especially on RT.

Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covered the
whole scope of ___slab_alloc(), but that's not the case. Thanks for clearing
that up.

> I thought about whether do_slab_free() should be marked as NOKPROBE,
> but that's not necessary. There is freelist manipulation
> there under local_lock_cpu_slab(), but it's RT only,
> and there is no fast path there.

There's __update_cpu_freelist_fast() called from do_slab_free() for !RT?

>>
>> >> >  * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
>> >> >  */
>> >>
>> >> However, what about the freeing path?
>> >> Shouldn't we do the same with __slab_free() to prevent fast path messing up
>> >> an interrupted slow path?
>> >
>> > Hmm right, but we have:
>> >
>> > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked()
>>
>> Yes, but like in the alloc case, this doesn't trigger in the
>> !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right?
>>
>> But now I think I see another solution here. Since we're already under
>> "if (!allow_spin)" we could stick a very ugly goto there to skip the
>> fastpath if we don't defer_free()?
>> (apparently declaration under a goto label is a C23 extension)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 6e858a6e397c..212c0e3e5007 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
>>  {
>>         /* cnt == 0 signals that it's called from kfree_nolock() */
>>         bool allow_spin = cnt;
>> +       __maybe_unused unsigned long flags;
>>         struct kmem_cache_cpu *c;
>>         unsigned long tid;
>>         void **freelist;
>> @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
>>                         return;
>>                 }
>>                 cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */
>> +
>> +               /* prevent a fastpath interrupting a slowpath */
>> +               goto no_lockless;
> 
> I'm missing why this is needed.
> 
> do_slab_free() does:
>                 if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) &&
>                     local_lock_is_locked(&s->cpu_slab->lock)) {
>                         defer_free(s, head); return;
> 
> It's the same check as in kmalloc_nolock() to avoid invalid:
> freelist ops -> nmi -> bpf -> __update_cpu_freelist_fast.
> 
> The big comment in kmalloc_nolock() applies here too.

But with nmi that's a variant of #1 of that comment.

Like for ___slab_alloc(), don't we need to prevent #2 with no nmi?
Example on !RT:

kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
kfree_nolock() -> do_slab_free()

in_nmi() || !USE_LOCKLESS_FAST_PATH()
false || false, we proceed, no checking of local_lock_is_locked()

if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
-> __update_cpu_freelist_fast()

Am I missing something?

> I didn't copy paste it.
> Maybe worth adding a reference like:
> /* See comment in kmalloc_nolock() why fast path should be skipped
> in_nmi() && lock_is_locked() */
> 
> So this patch with NOKPROBE_SYMBOL(___slab_alloc)
> is enough, afaict.
> Maybe expand a comment with the above details while folding?




* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 18:12         ` Vlastimil Babka
@ 2025-09-16 18:46           ` Alexei Starovoitov
  2025-09-16 19:06             ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-09-16 18:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Tue, Sep 16, 2025 at 11:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 9/16/25 18:18, Alexei Starovoitov wrote:
> > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 9/16/25 14:58, Harry Yoo wrote:
> >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
> >> >> On 9/16/25 04:21, Alexei Starovoitov wrote:
> >> >> > From: Alexei Starovoitov <ast@kernel.org>
> >> >> >
> >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
> >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> >> >> > kprobe -> bpf -> kmalloc_nolock().
> >> >> >
> >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >> >>
> >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
> >> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
> >> >>
> >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
> >> >>
> >> >> > /*
> >> >> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
> >> >> >  * can be acquired without a deadlock before invoking the function.
> >> >> >  *
> >> >> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
> >> >> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
> >> >> >  * and kmalloc() is not used in an unsupported context.
> >> >> >  *
> >> >> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
> >> >> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
> >> >> >  * lockdep_assert() will catch a bug in case:
> >> >> >  * #1
> >> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
> >> >> >  * or
> >> >> >  * #2
> >> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
> >> >>
> >> >> AFAICS see we now eliminated this possibility.
> >> >
> >> > Right.
> >> >
> >> >> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
> >> >> >  * disabled context. The lock will always be acquired and if needed it
> >> >> >  * block and sleep until the lock is available.
> >> >> >  * #1 is possible in !PREEMPT_RT only.
> >> >>
> >> >> Yes because this in kmalloc_nolock_noprof()
> >> >>
> >> >>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> >> >>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
> >> >>                 return NULL;
> >> >>
> >> >>
> >> >> >  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
> >> >> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
> >> >> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
> >> >> And this is no longer possible, so can we just remove these comments and drop
> >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
> >> >
> >> > Makes sense and sounds good to me.
> >> >
> >> > Also in the commit mesage should be adjusted too:
> >> >> kmalloc_nolock() can be called from any context and can re-enter
> >> >> into ___slab_alloc():
> >> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
> >> >>     kmalloc_nolock() -> ___slab_alloc(cache_B)
> >> >> or
> >> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
> >> >>    kmalloc_nolock() -> ___slab_alloc(cache_B)
> >> >
> >> > The lattter path is not possible anymore,
> >> >
> >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
> >> >> rt_spin_lock is locked by current _task_. In this case re-entrance into
> >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
> >> >> bucket that is most likely is not locked by the current task.
> >> >> Though it may be locked by a different task it's safe to rt_spin_lock() and
> >> >> sleep on it.
> >> >
> >> > and this paragraph is no longer valid either?
> >>
> >> Thanks for confirming! Let's see if Alexei agrees or we both missed
> >> something.
> >
> > Not quite.
> > This patch prevents
> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> >     kprobe -> bpf
> >
> > to make sure kprobe cannot be inserted in the _middle_ of
> > freelist operations.
> > kprobe/tracepoint outside of freelist is not a concern,
> > and
> > malloc() -> ___slab_alloc() -> local_lock_irqsave() ->
> >    tracepoint -> bpf
> >
> > is still possible. Especially on RT.
>
> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the
> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin
> that up.

hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function.
It disallows kprobes anywhere within the body,
but it doesn't make it 'notrace', so tracing the first nop5
is still ok.
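
register_kprobe() rejects any address that falls inside a blacklisted
symbol, roughly (kernel/kprobes.c, from memory):

	if (within_kprobe_blacklist((unsigned long)p->addr))
		return -EINVAL;

while fentry/ftrace at the entry nop is a different mechanism and is not
affected by the blacklist.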

> > I thought about whether do_slab_free() should be marked as NOKPROBE,
> > but that's not necessary. There is freelist manipulation
> > there under local_lock_cpu_slab(), but it's RT only,
> > and there is no fast path there.
>
> There's __update_cpu_freelist_fast() called from do_slab_free() for !RT?

yes.
do_slab_free() -> USE_LOCKLESS_FAST_PATH -> __update_cpu_freelist_fast.

> >>
> >> >> >  * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
> >> >> >  */
> >> >>
> >> >> However, what about the freeing path?
> >> >> Shouldn't we do the same with __slab_free() to prevent fast path messing up
> >> >> an interrupted slow path?
> >> >
> >> > Hmm right, but we have:
> >> >
> >> > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked()
> >>
> >> Yes, but like in the alloc case, this doesn't trigger in the
> >> !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right?
> >>
> >> But now I think I see another solution here. Since we're already under
> >> "if (!allow_spin)" we could stick a very ugly goto there to skip the
> >> fastpath if we don't defer_free()?
> >> (apparently declaration under a goto label is a C23 extension)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 6e858a6e397c..212c0e3e5007 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >>  {
> >>         /* cnt == 0 signals that it's called from kfree_nolock() */
> >>         bool allow_spin = cnt;
> >> +       __maybe_unused unsigned long flags;
> >>         struct kmem_cache_cpu *c;
> >>         unsigned long tid;
> >>         void **freelist;
> >> @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >>                         return;
> >>                 }
> >>                 cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */
> >> +
> >> +               /* prevent a fastpath interrupting a slowpath */
> >> +               goto no_lockless;
> >
> > I'm missing why this is needed.
> >
> > do_slab_free() does:
> >                 if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) &&
> >                     local_lock_is_locked(&s->cpu_slab->lock)) {
> >                         defer_free(s, head); return;
> >
> > It's the same check as in kmalloc_nolock() to avoid invalid:
> > freelist ops -> nmi -> bpf -> __update_cpu_freelist_fast.
> >
> > The big comment in kmalloc_nolock() applies here too.
>
> But with nmi that's variant of #1 of that comment.
>
> Like for ___slab_alloc() we need to prevent #2 with no nmi?
> example on !RT:
>
> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
> kfree_nolock() -> do_slab_free()
>
> in_nmi() || !USE_LOCKLESS_FAST_PATH()
> false || false, we proceed, no checking of local_lock_is_locked()
>
> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
> -> __update_cpu_freelist_fast()
>
> Am I missing something?

It's ok to call __update_cpu_freelist_fast(). It won't break anything,
because only an nmi can make this cpu be in the middle of a freelist update.



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 18:46           ` Alexei Starovoitov
@ 2025-09-16 19:06             ` Vlastimil Babka
  2025-09-16 20:26               ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2025-09-16 19:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On 9/16/25 20:46, Alexei Starovoitov wrote:
> On Tue, Sep 16, 2025 at 11:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 9/16/25 18:18, Alexei Starovoitov wrote:
>> > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> >>
>> >> On 9/16/25 14:58, Harry Yoo wrote:
>> >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
>> >> >> On 9/16/25 04:21, Alexei Starovoitov wrote:
>> >> >> > From: Alexei Starovoitov <ast@kernel.org>
>> >> >> >
>> >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >> >> > kprobe -> bpf -> kmalloc_nolock().
>> >> >> >
>> >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> >> >>
>> >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()."
>> >> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
>> >> >>
>> >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore:
>> >> >>
>> >> >> > /*
>> >> >> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
>> >> >> >  * can be acquired without a deadlock before invoking the function.
>> >> >> >  *
>> >> >> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
>> >> >> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
>> >> >> >  * and kmalloc() is not used in an unsupported context.
>> >> >> >  *
>> >> >> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
>> >> >> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
>> >> >> >  * lockdep_assert() will catch a bug in case:
>> >> >> >  * #1
>> >> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
>> >> >> >  * or
>> >> >> >  * #2
>> >> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
>> >> >>
>> >> >> AFAICS see we now eliminated this possibility.
>> >> >
>> >> > Right.
>> >> >
>> >> >> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>> >> >> >  * disabled context. The lock will always be acquired and if needed it
>> >> >> >  * block and sleep until the lock is available.
>> >> >> >  * #1 is possible in !PREEMPT_RT only.
>> >> >>
>> >> >> Yes because this in kmalloc_nolock_noprof()
>> >> >>
>> >> >>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> >> >>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
>> >> >>                 return NULL;
>> >> >>
>> >> >>
>> >> >> >  * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock:
>> >> >> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
>> >> >> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
>> >> >> And this is no longer possible, so can we just remove these comments and drop
>> >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
>> >> >
>> >> > Makes sense and sounds good to me.
>> >> >
>> >> > Also in the commit mesage should be adjusted too:
>> >> >> kmalloc_nolock() can be called from any context and can re-enter
>> >> >> into ___slab_alloc():
>> >> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
>> >> >>     kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> >> or
>> >> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
>> >> >>    kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> >
>> >> > The lattter path is not possible anymore,
>> >> >
>> >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu
>> >> >> rt_spin_lock is locked by current _task_. In this case re-entrance into
>> >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
>> >> >> bucket that is most likely is not locked by the current task.
>> >> >> Though it may be locked by a different task it's safe to rt_spin_lock() and
>> >> >> sleep on it.
>> >> >
>> >> > and this paragraph is no longer valid either?
>> >>
>> >> Thanks for confirming! Let's see if Alexei agrees or we both missed
>> >> something.
>> >
>> > Not quite.
>> > This patch prevents
>> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >     kprobe -> bpf
>> >
>> > to make sure kprobe cannot be inserted in the _middle_ of
>> > freelist operations.
>> > kprobe/tracepoint outside of freelist is not a concern,
>> > and
>> > malloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >    tracepoint -> bpf
>> >
>> > is still possible. Especially on RT.
>>
>> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the
>> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin
>> that up.
> 
> hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function.
> It disallows kprobes anywhere within the body,
> but it doesn't make it 'notrace', so tracing the first nop5
> is still ok.

Yeah, by "scope" I also meant whatever that function calls, i.e. the spinlock
operations you mentioned (local_lock_irqsave()). That's not part of the
___slab_alloc() body, so you're right we have not eliminated it.

>>
>> But with nmi that's variant of #1 of that comment.
>>
>> Like for ___slab_alloc() we need to prevent #2 with no nmi?
>> example on !RT:
>>
>> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
>> kfree_nolock() -> do_slab_free()
>>
>> in_nmi() || !USE_LOCKLESS_FAST_PATH()
>> false || false, we proceed, no checking of local_lock_is_locked()
>>
>> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
>> -> __update_cpu_freelist_fast()
>>
>> Am I missing something?
> 
> It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> Because only nmi can make this cpu to be in the middle of freelist update.

You're right, freeing uses the "slowpath" (local_lock protected instead of
cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
a kprobe on !RT because it doesn't exist there at all. The only one is in
___slab_alloc() and that's covered.

I do really hope we'll get to the point where sheaves will be able to
replace the other percpu slab caching completely... it should be much simpler.



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 19:06             ` Vlastimil Babka
@ 2025-09-16 20:26               ` Alexei Starovoitov
  2025-09-17  7:02                 ` Harry Yoo
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-09-16 20:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> >>
> >> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the
> >> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin
> >> that up.
> >
> > hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function.
> > It disallows kprobes anywhere within the body,
> > but it doesn't make it 'notrace', so tracing the first nop5
> > is still ok.
>
> Yeah by "scope" I meant also whatever that function calls, i.e. the spinlock
> operations you mentioned (local_lock_irqsave()). That's not part of the
> ___slab_alloc() body so you're right we have not eliminated it.

Ahh. Yes. All functions that ___slab_alloc() calls
are not affected and it's ok.
There are no calls in the middle of the freelist update.

> >>
> >> But with nmi that's variant of #1 of that comment.
> >>
> >> Like for ___slab_alloc() we need to prevent #2 with no nmi?
> >> example on !RT:
> >>
> >> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
> >> kfree_nolock() -> do_slab_free()
> >>
> >> in_nmi() || !USE_LOCKLESS_FAST_PATH()
> >> false || false, we proceed, no checking of local_lock_is_locked()
> >>
> >> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
> >> -> __update_cpu_freelist_fast()
> >>
> >> Am I missing something?
> >
> > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> > Because only nmi can make this cpu to be in the middle of freelist update.
>
> You're right, freeing uses the "slowpath" (local_lock protected instead of
> cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
> a kprobe on !RT because it doesn't exist there at all. The only one is in
> ___slab_alloc() and that's covered.

yep.

> I do really hope we'll get to the point where sheaves will be able to
> replace the other percpu slab caching completely... it should be much simpler.

+1.
Since we're talking about long term plans...
it would be really cool to have per-numa allocators.
Like per-cpu alloc, but per-numa. And a corresponding this_numa_add()
that would use atomic_add underneath.
Regular atomics across all nodes are becoming quite slow
on modern cpus, while per-cpu counters are too expensive from a memory pov.
Per-numa could be such a middle ground with fast enough operations
and good memory usage.
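
Purely hypothetical, but something like:

	/* no such API today, just to illustrate the idea */
	struct numa_counter {
		atomic_long_t node[MAX_NUMNODES];
	};

	static inline void this_numa_add(struct numa_counter *c, long v)
	{
		/*
		 * One atomic per node: cheaper than a global atomic,
		 * far smaller than a per-cpu counter.
		 */
		atomic_long_add(v, &c->node[numa_node_id()]);
	}

(a real implementation would want node-local, cacheline-separated storage,
but that's the rough shape)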



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-16 20:26               ` Alexei Starovoitov
@ 2025-09-17  7:02                 ` Harry Yoo
  2025-09-17  7:06                   ` Harry Yoo
  0 siblings, 1 reply; 16+ messages in thread
From: Harry Yoo @ 2025-09-17  7:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
> On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > >>
> > >> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the
> > >> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin
> > >> that up.
> > >
> > > hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function.
> > > It disallows kprobes anywhere within the body,
> > > but it doesn't make it 'notrace', so tracing the first nop5
> > > is still ok.
> >
> > Yeah by "scope" I meant also whatever that function calls, i.e. the spinlock
> > operations you mentioned (local_lock_irqsave()). That's not part of the
> > ___slab_alloc() body so you're right we have not eliminated it.
> 
> Ahh. Yes. All functions that ___slab_alloc() calls
> are not affected and it's ok.
> There are no calls in the middle freelist update.

Gotcha! I was confused about this too :)

> > >> But with nmi that's variant of #1 of that comment.
> > >>
> > >> Like for ___slab_alloc() we need to prevent #2 with no nmi?
> > >> example on !RT:
> > >>
> > >> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
> > >> kfree_nolock() -> do_slab_free()
> > >>
> > >> in_nmi() || !USE_LOCKLESS_FAST_PATH()
> > >> false || false, we proceed, no checking of local_lock_is_locked()
> > >>
> > >> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
> > >> -> __update_cpu_freelist_fast()
> > >>
> > >> Am I missing something?
> > >
> > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> > > Because only nmi can make this cpu to be in the middle of freelist update.
> >
> > You're right, freeing uses the "slowpath" (local_lock protected instead of
> > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
> > a kprobe on !RT because it doesn't exist there at all.

Right.

> > The only one is in ___slab_alloc() and that's covered.

Right.

And this is a question not relevant to reentrant kmalloc:

On PREEMPT_RT, disabling the fastpath in the alloc path makes sense because
both paths update c->freelist, but in the free path, what are we protecting
against by disabling the lockless fastpath?

The free fastpath updates c->freelist but not slab->freelist, and
the free slowpath updates slab->freelist but not c->freelist?

I fail to see how things can go wrong if we enable the lockless
fastpath in the free path.

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-17  7:02                 ` Harry Yoo
@ 2025-09-17  7:06                   ` Harry Yoo
  2025-09-17 18:26                     ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Harry Yoo @ 2025-09-17  7:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Wed, Sep 17, 2025 at 04:02:25PM +0900, Harry Yoo wrote:
> On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
> > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> > > > Because only nmi can make this cpu to be in the middle of freelist update.
> > >
> > > You're right, freeing uses the "slowpath" (local_lock protected instead of
> > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
> > > a kprobe on !RT because it doesn't exist there at all.
> 
> Right.
> 
> > > The only one is in ___slab_alloc() and that's covered.
> 
> Right.
> 
> and this is a question not relevant to reentrant kmalloc:
> 
> On PREEMPT_RT, disabling the fastpath in the alloc path makes sense because
> both paths update c->freelist, but in the free path, by disabling the
> lockless fastpath, what are we protecting against?
> 
> the free fastpath updates c->freelist but not slab->freelist, and
> the free slowpath updates slab->freelist but not c->freelist?
> 
> I fail to see how things can go wrong if we enable the lockless
> fastpath in the free path.

Oops, sorry. it slipped my mind. Things can go wrong if the free fastpath
is executed while the alloc slowpath is executing and gets preempted.

-- 
Cheers,
Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-17  7:06                   ` Harry Yoo
@ 2025-09-17 18:26                     ` Alexei Starovoitov
  2025-09-17 18:34                       ` Vlastimil Babka
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-09-17 18:26 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Vlastimil Babka, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Wed, Sep 17, 2025 at 12:06 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Wed, Sep 17, 2025 at 04:02:25PM +0900, Harry Yoo wrote:
> > On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> > > > > Because only an NMI can hit this cpu while it's in the middle of a freelist update.
> > > >
> > > > You're right, freeing uses the "slowpath" (local_lock protected instead of
> > > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
> > > > a kprobe on !RT because it doesn't exist there at all.
> >
> > Right.
> >
> > > > The only one is in ___slab_alloc() and that's covered.
> >
> > Right.
> >
> > and this is a question not relevant to reentrant kmalloc:
> >
> > On PREEMPT_RT, disabling the fastpath in the alloc path makes sense because
> > both paths update c->freelist, but in the free path, by disabling the
> > lockless fastpath, what are we protecting against?
> >
> > the free fastpath updates c->freelist but not slab->freelist, and
> > the free slowpath updates slab->freelist but not c->freelist?
> >
> > I fail to see how things can go wrong if we enable the lockless
> > fastpath in the free path.
>
> Oops, sorry. it slipped my mind. Things can go wrong if the free fastpath
> is executed while the alloc slowpath is executing and gets preempted.

Agree. All that is very tricky. Probably worth adding a comment.
Long term though, I think the RT folks need to reconsider the lack of a
fastpath. My gut feeling is that overall kernel performance is poor on RT
due to this. Sheaves might help, though.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-17 18:26                     ` Alexei Starovoitov
@ 2025-09-17 18:34                       ` Vlastimil Babka
  2025-09-17 18:40                         ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Vlastimil Babka @ 2025-09-17 18:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Harry Yoo
  Cc: bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior,
	Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton,
	Peter Zijlstra, Steven Rostedt, Johannes Weiner

On 9/17/25 20:26, Alexei Starovoitov wrote:
> On Wed, Sep 17, 2025 at 12:06 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>>
>> On Wed, Sep 17, 2025 at 04:02:25PM +0900, Harry Yoo wrote:
>> > On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
>> > > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>> > > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
>> > > > > Because only an NMI can hit this cpu while it's in the middle of a freelist update.
>> > > >
>> > > > You're right, freeing uses the "slowpath" (local_lock protected instead of
>> > > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
>> > > > a kprobe on !RT because it doesn't exist there at all.
>> >
>> > Right.
>> >
>> > > > The only one is in ___slab_alloc() and that's covered.
>> >
>> > Right.
>> >
>> > and this is a question not relevant to reentrant kmalloc:
>> >
>> > On PREEMPT_RT, disabling the fastpath in the alloc path makes sense because
>> > both paths update c->freelist, but in the free path, by disabling the
>> > lockless fastpath, what are we protecting against?
>> >
>> > the free fastpath updates c->freelist but not slab->freelist, and
>> > the free slowpath updates slab->freelist but not c->freelist?
>> >
>> > I fail to see how things can go wrong if we enable the lockless
>> > fastpath in the free path.
>>
>> Oops, sorry. it slipped my mind. Things can go wrong if the free fastpath
>> is executed while the alloc slowpath is executing and gets preempted.
> 
> Agree. All that is very tricky. Probably worth adding a comment.

Ah right, I did add one based on your commit log and this conversation:

+/*
+ * We disallow kprobes in ___slab_alloc() to prevent reentrance:
+ *
+ * kmalloc() -> ___slab_alloc() -> local_lock_cpu_slab() protected part of
+ * ___slab_alloc() manipulating c->freelist -> kprobe -> bpf ->
+ * kmalloc_nolock() or kfree_nolock() -> __update_cpu_freelist_fast()
+ * manipulating c->freelist without a lock.
+ *
+ * This does not prevent kprobes in functions called from ___slab_alloc(), such as
+ * local_lock_irqsave() itself, and that is fine; we only need to protect the
+ * c->freelist manipulation in ___slab_alloc() itself.
+ */
+NOKPROBE_SYMBOL(___slab_alloc);

> Long term though, I think the RT folks need to reconsider the lack of a
> fastpath. My gut feeling is that overall kernel performance is poor on RT
> due to this. Sheaves might help, though.

Oh they definitely should, thanks for the selling point :)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
  2025-09-17 18:34                       ` Vlastimil Babka
@ 2025-09-17 18:40                         ` Alexei Starovoitov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2025-09-17 18:40 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko,
	Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi,
	Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner

On Wed, Sep 17, 2025 at 11:34 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
>
> Ah right, I did add one based on your commit log and this conversation:
>
> +/*
> + * We disallow kprobes in ___slab_alloc() to prevent reentrance:
> + *
> + * kmalloc() -> ___slab_alloc() -> local_lock_cpu_slab() protected part of
> + * ___slab_alloc() manipulating c->freelist -> kprobe -> bpf ->
> + * kmalloc_nolock() or kfree_nolock() -> __update_cpu_freelist_fast()
> + * manipulating c->freelist without a lock.
> + *
> + * This does not prevent kprobes in functions called from ___slab_alloc(), such as
> + * local_lock_irqsave() itself, and that is fine; we only need to protect the
> + * c->freelist manipulation in ___slab_alloc() itself.
> + */
> +NOKPROBE_SYMBOL(___slab_alloc);

Perfect. Short and to the point. Thanks!


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread

Thread overview: 16+ messages
2025-09-16  2:21 [PATCH slab] slab: Disallow kprobes in ___slab_alloc() Alexei Starovoitov
2025-09-16 10:40 ` Vlastimil Babka
2025-09-16 12:58   ` Harry Yoo
2025-09-16 13:13     ` Vlastimil Babka
2025-09-16 16:18       ` Alexei Starovoitov
2025-09-16 18:12         ` Vlastimil Babka
2025-09-16 18:46           ` Alexei Starovoitov
2025-09-16 19:06             ` Vlastimil Babka
2025-09-16 20:26               ` Alexei Starovoitov
2025-09-17  7:02                 ` Harry Yoo
2025-09-17  7:06                   ` Harry Yoo
2025-09-17 18:26                     ` Alexei Starovoitov
2025-09-17 18:34                       ` Vlastimil Babka
2025-09-17 18:40                         ` Alexei Starovoitov
2025-09-16 10:59 ` Harry Yoo
2025-09-16 12:25   ` Vlastimil Babka
