* [PATCH slab] slab: Disallow kprobes in ___slab_alloc()

From: Alexei Starovoitov @ 2025-09-16 2:21 UTC
To: bpf, linux-mm
Cc: vbabka, harry.yoo, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm,
    peterz, rostedt, hannes

From: Alexei Starovoitov <ast@kernel.org>

Disallow kprobes in ___slab_alloc() to prevent reentrance:
kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
kprobe -> bpf -> kmalloc_nolock().

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 mm/slub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index c995f3bec69d..922d47b10c2f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -45,7 +45,7 @@
 #include <kunit/test-bug.h>
 #include <linux/sort.h>
 #include <linux/irq_work.h>
-
+#include <linux/kprobes.h>
 #include <linux/debugfs.h>
 #include <trace/events/kmem.h>
 
@@ -4697,6 +4697,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 	goto load_freelist;
 }
+NOKPROBE_SYMBOL(___slab_alloc);
 
 /*
  * A wrapper for ___slab_alloc() for contexts where preemption is not yet
-- 
2.47.3
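For readers less familiar with the kprobe blacklist: NOKPROBE_SYMBOL() adds the
function to that blacklist, so attaching a probe to it simply fails at
registration time. A minimal sketch of a hypothetical test module hitting the
new restriction (nothing below is part of the patch; the handler and messages
are illustrative only):

#include <linux/module.h>
#include <linux/kprobes.h>

/* a pre-handler here would run with cpu_slab->lock already held by the
 * interrupted ___slab_alloc() -- exactly the reentrance being ruled out */
static int pre(struct kprobe *p, struct pt_regs *regs)
{
	return 0;
}

static struct kprobe kp = {
	.symbol_name = "___slab_alloc",
	.pre_handler = pre,
};

static int __init probe_init(void)
{
	/* with NOKPROBE_SYMBOL(___slab_alloc) in place this is expected
	 * to fail with -EINVAL instead of arming the probe */
	int ret = register_kprobe(&kp);

	pr_info("register_kprobe(___slab_alloc) = %d\n", ret);
	if (!ret)
		/* without the patch the probe would have armed; undo it */
		unregister_kprobe(&kp);
	return 0;
}

static void __exit probe_exit(void)
{
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");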
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 2:21 [PATCH slab] slab: Disallow kprobes in ___slab_alloc() Alexei Starovoitov @ 2025-09-16 10:40 ` Vlastimil Babka 2025-09-16 12:58 ` Harry Yoo 2025-09-16 10:59 ` Harry Yoo 1 sibling, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2025-09-16 10:40 UTC (permalink / raw) To: Alexei Starovoitov, bpf, linux-mm Cc: harry.yoo, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm, peterz, rostedt, hannes On 9/16/25 04:21, Alexei Starovoitov wrote: > From: Alexei Starovoitov <ast@kernel.org> > > Disallow kprobes in ___slab_alloc() to prevent reentrance: > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > kprobe -> bpf -> kmalloc_nolock(). > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: > /* > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock > * can be acquired without a deadlock before invoking the function. > * > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), > * and kmalloc() is not used in an unsupported context. > * > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but > * lockdep_assert() will catch a bug in case: > * #1 > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() > * or > * #2 > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() AFAICS see we now eliminated this possibility. > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt > * disabled context. The lock will always be acquired and if needed it > * block and sleep until the lock is available. > * #1 is possible in !PREEMPT_RT only. Yes because this in kmalloc_nolock_noprof() if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ return NULL; > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) And this is no longer possible, so can we just remove these comments and drop "slab: Make slub local_(try)lock more precise for LOCKDEP" now? > * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B > */ However, what about the freeing path? Shouldn't we do the same with __slab_free() to prevent fast path messing up an interrupted slow path? And a question below: > --- > mm/slub.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/slub.c b/mm/slub.c > index c995f3bec69d..922d47b10c2f 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -45,7 +45,7 @@ > #include <kunit/test-bug.h> > #include <linux/sort.h> > #include <linux/irq_work.h> > - > +#include <linux/kprobes.h> > #include <linux/debugfs.h> > #include <trace/events/kmem.h> > > @@ -4697,6 +4697,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, > > goto load_freelist; > } > +NOKPROBE_SYMBOL(___slab_alloc); Is that guaranteed to work as expected without also making ___slab_alloc noinline? 
>  /*
>   * A wrapper for ___slab_alloc() for contexts where preemption is not yet
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 10:40 ` Vlastimil Babka @ 2025-09-16 12:58 ` Harry Yoo 2025-09-16 13:13 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Harry Yoo @ 2025-09-16 12:58 UTC (permalink / raw) To: Vlastimil Babka Cc: Alexei Starovoitov, bpf, linux-mm, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm, peterz, rostedt, hannes On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote: > On 9/16/25 04:21, Alexei Starovoitov wrote: > > From: Alexei Starovoitov <ast@kernel.org> > > > > Disallow kprobes in ___slab_alloc() to prevent reentrance: > > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > > kprobe -> bpf -> kmalloc_nolock(). > > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > > I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." > and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); > > But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: > > > /* > > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock > > * can be acquired without a deadlock before invoking the function. > > * > > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is > > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), > > * and kmalloc() is not used in an unsupported context. > > * > > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). > > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but > > * lockdep_assert() will catch a bug in case: > > * #1 > > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() > > * or > > * #2 > > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() > > AFAICS see we now eliminated this possibility. Right. > > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt > > * disabled context. The lock will always be acquired and if needed it > > * block and sleep until the lock is available. > > * #1 is possible in !PREEMPT_RT only. > > Yes because this in kmalloc_nolock_noprof() > > if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) > /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ > return NULL; > > > > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: > > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> > > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) > And this is no longer possible, so can we just remove these comments and drop > "slab: Make slub local_(try)lock more precise for LOCKDEP" now? Makes sense and sounds good to me. Also in the commit mesage should be adjusted too: > kmalloc_nolock() can be called from any context and can re-enter > into ___slab_alloc(): > kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> > kmalloc_nolock() -> ___slab_alloc(cache_B) > or > kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> > kmalloc_nolock() -> ___slab_alloc(cache_B) The lattter path is not possible anymore, > Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu > rt_spin_lock is locked by current _task_. In this case re-entrance into > the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different > bucket that is most likely is not locked by the current task. 
> Though it may be locked by a different task it's safe to rt_spin_lock() and
> sleep on it.

and this paragraph is no longer valid either?

> > * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B
> > */
>
> However, what about the freeing path?
> Shouldn't we do the same with __slab_free() to prevent fast path messing up
> an interrupted slow path?

Hmm right, but we have:

(in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked()

check in do_slab_free() if !allow_spin?

-- 
Cheers,
Harry / Hyeonggon
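For context, the re-entrancy guard being referenced works roughly as in the
sketch below, paraphrased from the kmalloc_nolock() behaviour described above.
The helper names try_other_bucket() and __do_kmalloc_slowpath() are schematic
placeholders, not functions from the series:

/*
 * Sketch only: the real kmalloc_nolock() handles more cases
 * (NMI, PREEMPT_RT irq context, defer_free(), ...).
 */
static void *kmalloc_nolock_sketch(struct kmem_cache *s, size_t size,
				   gfp_t gfp, int node)
{
	if (local_lock_is_locked(&s->cpu_slab->lock)) {
		/*
		 * We interrupted a context that already holds this cache's
		 * per-CPU slab lock (or, on RT, the current task holds it):
		 * re-entering the same bucket is unsafe, so try a different
		 * kmalloc size bucket that is most likely unlocked.
		 */
		return try_other_bucket(size, gfp, node);
	}

	/* lock not held: entering the ___slab_alloc() slow path is safe */
	return __do_kmalloc_slowpath(s, size, gfp, node);
}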
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 12:58 ` Harry Yoo @ 2025-09-16 13:13 ` Vlastimil Babka 2025-09-16 16:18 ` Alexei Starovoitov 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2025-09-16 13:13 UTC (permalink / raw) To: Harry Yoo Cc: Alexei Starovoitov, bpf, linux-mm, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm, peterz, rostedt, hannes On 9/16/25 14:58, Harry Yoo wrote: > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote: >> On 9/16/25 04:21, Alexei Starovoitov wrote: >> > From: Alexei Starovoitov <ast@kernel.org> >> > >> > Disallow kprobes in ___slab_alloc() to prevent reentrance: >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> >> > kprobe -> bpf -> kmalloc_nolock(). >> > >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org> >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: >> >> > /* >> > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock >> > * can be acquired without a deadlock before invoking the function. >> > * >> > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is >> > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), >> > * and kmalloc() is not used in an unsupported context. >> > * >> > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). >> > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but >> > * lockdep_assert() will catch a bug in case: >> > * #1 >> > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() >> > * or >> > * #2 >> > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() >> >> AFAICS see we now eliminated this possibility. > > Right. > >> > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt >> > * disabled context. The lock will always be acquired and if needed it >> > * block and sleep until the lock is available. >> > * #1 is possible in !PREEMPT_RT only. >> >> Yes because this in kmalloc_nolock_noprof() >> >> if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) >> /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ >> return NULL; >> >> >> > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: >> > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> >> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) >> And this is no longer possible, so can we just remove these comments and drop >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now? > > Makes sense and sounds good to me. > > Also in the commit mesage should be adjusted too: >> kmalloc_nolock() can be called from any context and can re-enter >> into ___slab_alloc(): >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> >> kmalloc_nolock() -> ___slab_alloc(cache_B) >> or >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> >> kmalloc_nolock() -> ___slab_alloc(cache_B) > > The lattter path is not possible anymore, > >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu >> rt_spin_lock is locked by current _task_. 
In this case re-entrance into >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different >> bucket that is most likely is not locked by the current task. >> Though it may be locked by a different task it's safe to rt_spin_lock() and >> sleep on it. > > and this paragraph is no longer valid either? Thanks for confirming! Let's see if Alexei agrees or we both missed something. >> > * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B >> > */ >> >> However, what about the freeing path? >> Shouldn't we do the same with __slab_free() to prevent fast path messing up >> an interrupted slow path? > > Hmm right, but we have: > > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked() Yes, but like in the alloc case, this doesn't trigger in the !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right? But now I think I see another solution here. Since we're already under "if (!allow_spin)" we could stick a very ugly goto there to skip the fastpath if we don't defer_free()? (apparently declaration under a goto label is a C23 extension) diff --git a/mm/slub.c b/mm/slub.c index 6e858a6e397c..212c0e3e5007 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s, { /* cnt == 0 signals that it's called from kfree_nolock() */ bool allow_spin = cnt; + __maybe_unused unsigned long flags; struct kmem_cache_cpu *c; unsigned long tid; void **freelist; @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s, return; } cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */ + + /* prevent a fastpath interrupting a slowpath */ + goto no_lockless; } if (USE_LOCKLESS_FAST_PATH()) { @@ -6501,8 +6505,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s, goto redo; } } else { - __maybe_unused unsigned long flags = 0; - +no_lockless: /* Update the free list under the local lock */ local_lock_cpu_slab(s, flags); c = this_cpu_ptr(s->cpu_slab); ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 13:13 ` Vlastimil Babka @ 2025-09-16 16:18 ` Alexei Starovoitov 2025-09-16 18:12 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Alexei Starovoitov @ 2025-09-16 16:18 UTC (permalink / raw) To: Vlastimil Babka Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > On 9/16/25 14:58, Harry Yoo wrote: > > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote: > >> On 9/16/25 04:21, Alexei Starovoitov wrote: > >> > From: Alexei Starovoitov <ast@kernel.org> > >> > > >> > Disallow kprobes in ___slab_alloc() to prevent reentrance: > >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > >> > kprobe -> bpf -> kmalloc_nolock(). > >> > > >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > >> > >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." > >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); > >> > >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: > >> > >> > /* > >> > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock > >> > * can be acquired without a deadlock before invoking the function. > >> > * > >> > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is > >> > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), > >> > * and kmalloc() is not used in an unsupported context. > >> > * > >> > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). > >> > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but > >> > * lockdep_assert() will catch a bug in case: > >> > * #1 > >> > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() > >> > * or > >> > * #2 > >> > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() > >> > >> AFAICS see we now eliminated this possibility. > > > > Right. > > > >> > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt > >> > * disabled context. The lock will always be acquired and if needed it > >> > * block and sleep until the lock is available. > >> > * #1 is possible in !PREEMPT_RT only. > >> > >> Yes because this in kmalloc_nolock_noprof() > >> > >> if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) > >> /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ > >> return NULL; > >> > >> > >> > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: > >> > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> > >> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) > >> And this is no longer possible, so can we just remove these comments and drop > >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now? > > > > Makes sense and sounds good to me. 
> > > > Also in the commit mesage should be adjusted too: > >> kmalloc_nolock() can be called from any context and can re-enter > >> into ___slab_alloc(): > >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> > >> kmalloc_nolock() -> ___slab_alloc(cache_B) > >> or > >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> > >> kmalloc_nolock() -> ___slab_alloc(cache_B) > > > > The lattter path is not possible anymore, > > > >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu > >> rt_spin_lock is locked by current _task_. In this case re-entrance into > >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different > >> bucket that is most likely is not locked by the current task. > >> Though it may be locked by a different task it's safe to rt_spin_lock() and > >> sleep on it. > > > > and this paragraph is no longer valid either? > > Thanks for confirming! Let's see if Alexei agrees or we both missed > something. Not quite. This patch prevents kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> kprobe -> bpf to make sure kprobe cannot be inserted in the _middle_ of freelist operations. kprobe/tracepoint outside of freelist is not a concern, and malloc() -> ___slab_alloc() -> local_lock_irqsave() -> tracepoint -> bpf is still possible. Especially on RT. For example, there are trace_contention_begin/end in spinlocks and mutexes, so on RT it's easy to construct such reentrance because local_lock_irqsave() is an rtmutex. On !RT it's just irqsave and as far as I could analyze the code there are no tracepoints in local_lock_cpu_slab()/unlock critical sections. But it would be fine to add a tracepoint (though silly) if it is not splitting freelist operation. Like, local_lock_cpu_slab() is used in put_cpu_partial(). There is no need to mark it as NOKPROBE, since it doesn't conflict with __update_cpu_freelist_fast(). and it's ok to add a tracepoint _anywhere_ in put_cpu_partial(). It's also fine to add a tracepoint right after local_lock_cpu_slab() in ___slab_alloc(), but not within freelist manipulations. rtmutex's trace_contention tracepoint is such safe tracepoint within a critical section. So "more precise for LOCKDEP" patch is still needed. I thought about whether do_slab_free() should be marked as NOKPROBE, but that's not necessary. There is freelist manipulation there under local_lock_cpu_slab(), but it's RT only, and there is no fast path there. > > >> > * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B > >> > */ > >> > >> However, what about the freeing path? > >> Shouldn't we do the same with __slab_free() to prevent fast path messing up > >> an interrupted slow path? > > > > Hmm right, but we have: > > > > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked() > > Yes, but like in the alloc case, this doesn't trigger in the > !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right? > > But now I think I see another solution here. Since we're already under > "if (!allow_spin)" we could stick a very ugly goto there to skip the > fastpath if we don't defer_free()? 
> (apparently declaration under a goto label is a C23 extension) > > diff --git a/mm/slub.c b/mm/slub.c > index 6e858a6e397c..212c0e3e5007 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s, > { > /* cnt == 0 signals that it's called from kfree_nolock() */ > bool allow_spin = cnt; > + __maybe_unused unsigned long flags; > struct kmem_cache_cpu *c; > unsigned long tid; > void **freelist; > @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s, > return; > } > cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */ > + > + /* prevent a fastpath interrupting a slowpath */ > + goto no_lockless; I'm missing why this is needed. do_slab_free() does: if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked(&s->cpu_slab->lock)) { defer_free(s, head); return; It's the same check as in kmalloc_nolock() to avoid invalid: freelist ops -> nmi -> bpf -> __update_cpu_freelist_fast. The big comment in kmalloc_nolock() applies here too. I didn't copy paste it. Maybe worth adding a reference like: /* See comment in kmalloc_nolock() why fast path should be skipped in_nmi() && lock_is_locked() */ So this patch with NOKPROBE_SYMBOL(___slab_alloc) is enough, afaict. Maybe expand a comment with the above details while folding? ^ permalink raw reply [flat|nested] 16+ messages in thread
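For reference, the guard Alexei quotes reads as follows; the condition itself
is from the kmalloc_nolock()/kfree_nolock() series, while the comments are an
interpretation added here based on this thread, not kernel text:

	/*
	 * Freeing from a context that may have interrupted this CPU's
	 * slab slow path:
	 *
	 *  - in_nmi(): on !RT only an NMI can hit while c->freelist is
	 *    being rewritten under local_lock_irqsave(), so an NMI must
	 *    not touch the freelist while the lock is held.
	 *
	 *  - !USE_LOCKLESS_FAST_PATH(): on RT the free path itself needs
	 *    the local_lock, and it is already held by the current task,
	 *    so re-entering it is not allowed.
	 *
	 * In either case, push the object onto a deferred list and free
	 * it later from a safe context.
	 */
	if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) &&
	    local_lock_is_locked(&s->cpu_slab->lock)) {
		defer_free(s, head);
		return;
	}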
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 16:18 ` Alexei Starovoitov @ 2025-09-16 18:12 ` Vlastimil Babka 2025-09-16 18:46 ` Alexei Starovoitov 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2025-09-16 18:12 UTC (permalink / raw) To: Alexei Starovoitov Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On 9/16/25 18:18, Alexei Starovoitov wrote: > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote: >> >> On 9/16/25 14:58, Harry Yoo wrote: >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote: >> >> On 9/16/25 04:21, Alexei Starovoitov wrote: >> >> > From: Alexei Starovoitov <ast@kernel.org> >> >> > >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance: >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> >> >> > kprobe -> bpf -> kmalloc_nolock(). >> >> > >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org> >> >> >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." >> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); >> >> >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: >> >> >> >> > /* >> >> > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock >> >> > * can be acquired without a deadlock before invoking the function. >> >> > * >> >> > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is >> >> > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), >> >> > * and kmalloc() is not used in an unsupported context. >> >> > * >> >> > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). >> >> > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but >> >> > * lockdep_assert() will catch a bug in case: >> >> > * #1 >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() >> >> > * or >> >> > * #2 >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() >> >> >> >> AFAICS see we now eliminated this possibility. >> > >> > Right. >> > >> >> > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt >> >> > * disabled context. The lock will always be acquired and if needed it >> >> > * block and sleep until the lock is available. >> >> > * #1 is possible in !PREEMPT_RT only. >> >> >> >> Yes because this in kmalloc_nolock_noprof() >> >> >> >> if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) >> >> /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ >> >> return NULL; >> >> >> >> >> >> > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: >> >> > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> >> >> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) >> >> And this is no longer possible, so can we just remove these comments and drop >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now? >> > >> > Makes sense and sounds good to me. 
>> > >> > Also in the commit mesage should be adjusted too: >> >> kmalloc_nolock() can be called from any context and can re-enter >> >> into ___slab_alloc(): >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> >> >> kmalloc_nolock() -> ___slab_alloc(cache_B) >> >> or >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> >> >> kmalloc_nolock() -> ___slab_alloc(cache_B) >> > >> > The lattter path is not possible anymore, >> > >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu >> >> rt_spin_lock is locked by current _task_. In this case re-entrance into >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different >> >> bucket that is most likely is not locked by the current task. >> >> Though it may be locked by a different task it's safe to rt_spin_lock() and >> >> sleep on it. >> > >> > and this paragraph is no longer valid either? >> >> Thanks for confirming! Let's see if Alexei agrees or we both missed >> something. > > Not quite. > This patch prevents > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > kprobe -> bpf > > to make sure kprobe cannot be inserted in the _middle_ of > freelist operations. > kprobe/tracepoint outside of freelist is not a concern, > and > malloc() -> ___slab_alloc() -> local_lock_irqsave() -> > tracepoint -> bpf > > is still possible. Especially on RT. Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the whole scope of ___slab_alloc() but that's not the case. Thanks for clearin that up. > I thought about whether do_slab_free() should be marked as NOKPROBE, > but that's not necessary. There is freelist manipulation > there under local_lock_cpu_slab(), but it's RT only, > and there is no fast path there. There's __update_cpu_freelist_fast() called from do_slab_free() for !RT? >> >> >> > * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B >> >> > */ >> >> >> >> However, what about the freeing path? >> >> Shouldn't we do the same with __slab_free() to prevent fast path messing up >> >> an interrupted slow path? >> > >> > Hmm right, but we have: >> > >> > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked() >> >> Yes, but like in the alloc case, this doesn't trigger in the >> !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right? >> >> But now I think I see another solution here. Since we're already under >> "if (!allow_spin)" we could stick a very ugly goto there to skip the >> fastpath if we don't defer_free()? >> (apparently declaration under a goto label is a C23 extension) >> >> diff --git a/mm/slub.c b/mm/slub.c >> index 6e858a6e397c..212c0e3e5007 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s, >> { >> /* cnt == 0 signals that it's called from kfree_nolock() */ >> bool allow_spin = cnt; >> + __maybe_unused unsigned long flags; >> struct kmem_cache_cpu *c; >> unsigned long tid; >> void **freelist; >> @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s, >> return; >> } >> cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */ >> + >> + /* prevent a fastpath interrupting a slowpath */ >> + goto no_lockless; > > I'm missing why this is needed. 
> > do_slab_free() does: > if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) && > local_lock_is_locked(&s->cpu_slab->lock)) { > defer_free(s, head); return; > > It's the same check as in kmalloc_nolock() to avoid invalid: > freelist ops -> nmi -> bpf -> __update_cpu_freelist_fast. > > The big comment in kmalloc_nolock() applies here too. But with nmi that's variant of #1 of that comment. Like for ___slab_alloc() we need to prevent #2 with no nmi? example on !RT: kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kfree_nolock() -> do_slab_free() in_nmi() || !USE_LOCKLESS_FAST_PATH() false || false, we proceed, no checking of local_lock_is_locked() if (USE_LOCKLESS_FAST_PATH()) { - true (!RT) -> __update_cpu_freelist_fast() Am I missing something? > I didn't copy paste it. > Maybe worth adding a reference like: > /* See comment in kmalloc_nolock() why fast path should be skipped > in_nmi() && lock_is_locked() */ > > So this patch with NOKPROBE_SYMBOL(___slab_alloc) > is enough, afaict. > Maybe expand a comment with the above details while folding? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 18:12 ` Vlastimil Babka @ 2025-09-16 18:46 ` Alexei Starovoitov 2025-09-16 19:06 ` Vlastimil Babka 0 siblings, 1 reply; 16+ messages in thread From: Alexei Starovoitov @ 2025-09-16 18:46 UTC (permalink / raw) To: Vlastimil Babka Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On Tue, Sep 16, 2025 at 11:12 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > On 9/16/25 18:18, Alexei Starovoitov wrote: > > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote: > >> > >> On 9/16/25 14:58, Harry Yoo wrote: > >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote: > >> >> On 9/16/25 04:21, Alexei Starovoitov wrote: > >> >> > From: Alexei Starovoitov <ast@kernel.org> > >> >> > > >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance: > >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > >> >> > kprobe -> bpf -> kmalloc_nolock(). > >> >> > > >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > >> >> > >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." > >> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); > >> >> > >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: > >> >> > >> >> > /* > >> >> > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock > >> >> > * can be acquired without a deadlock before invoking the function. > >> >> > * > >> >> > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is > >> >> > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), > >> >> > * and kmalloc() is not used in an unsupported context. > >> >> > * > >> >> > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). > >> >> > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but > >> >> > * lockdep_assert() will catch a bug in case: > >> >> > * #1 > >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() > >> >> > * or > >> >> > * #2 > >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() > >> >> > >> >> AFAICS see we now eliminated this possibility. > >> > > >> > Right. > >> > > >> >> > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt > >> >> > * disabled context. The lock will always be acquired and if needed it > >> >> > * block and sleep until the lock is available. > >> >> > * #1 is possible in !PREEMPT_RT only. > >> >> > >> >> Yes because this in kmalloc_nolock_noprof() > >> >> > >> >> if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) > >> >> /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ > >> >> return NULL; > >> >> > >> >> > >> >> > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: > >> >> > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> > >> >> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) > >> >> And this is no longer possible, so can we just remove these comments and drop > >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now? > >> > > >> > Makes sense and sounds good to me. 
> >> > > >> > Also in the commit mesage should be adjusted too: > >> >> kmalloc_nolock() can be called from any context and can re-enter > >> >> into ___slab_alloc(): > >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> > >> >> kmalloc_nolock() -> ___slab_alloc(cache_B) > >> >> or > >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> > >> >> kmalloc_nolock() -> ___slab_alloc(cache_B) > >> > > >> > The lattter path is not possible anymore, > >> > > >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu > >> >> rt_spin_lock is locked by current _task_. In this case re-entrance into > >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different > >> >> bucket that is most likely is not locked by the current task. > >> >> Though it may be locked by a different task it's safe to rt_spin_lock() and > >> >> sleep on it. > >> > > >> > and this paragraph is no longer valid either? > >> > >> Thanks for confirming! Let's see if Alexei agrees or we both missed > >> something. > > > > Not quite. > > This patch prevents > > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > > kprobe -> bpf > > > > to make sure kprobe cannot be inserted in the _middle_ of > > freelist operations. > > kprobe/tracepoint outside of freelist is not a concern, > > and > > malloc() -> ___slab_alloc() -> local_lock_irqsave() -> > > tracepoint -> bpf > > > > is still possible. Especially on RT. > > Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the > whole scope of ___slab_alloc() but that's not the case. Thanks for clearin > that up. hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function. It disallows kprobes anywhere within the body, but it doesn't make it 'notrace', so tracing the first nop5 is still ok. > > I thought about whether do_slab_free() should be marked as NOKPROBE, > > but that's not necessary. There is freelist manipulation > > there under local_lock_cpu_slab(), but it's RT only, > > and there is no fast path there. > > There's __update_cpu_freelist_fast() called from do_slab_free() for !RT? yes. do_slab_free() -> USE_LOCKLESS_FAST_PATH -> __update_cpu_freelist_fast. > >> > >> >> > * local_lock_is_locked() prevents the case kmem_cache_A == kmem_cache_B > >> >> > */ > >> >> > >> >> However, what about the freeing path? > >> >> Shouldn't we do the same with __slab_free() to prevent fast path messing up > >> >> an interrupted slow path? > >> > > >> > Hmm right, but we have: > >> > > >> > (in_nmi() || !USE_LOCKLESS_FAST_PATH()) && local_lock_is_locked() > >> > >> Yes, but like in the alloc case, this doesn't trigger in the > >> !in_nmi() && !PREEMPT_RT case, i.e. a kprobe handler on !PREEMPT_RT, right? > >> > >> But now I think I see another solution here. Since we're already under > >> "if (!allow_spin)" we could stick a very ugly goto there to skip the > >> fastpath if we don't defer_free()? 
> >> (apparently declaration under a goto label is a C23 extension) > >> > >> diff --git a/mm/slub.c b/mm/slub.c > >> index 6e858a6e397c..212c0e3e5007 100644 > >> --- a/mm/slub.c > >> +++ b/mm/slub.c > >> @@ -6450,6 +6450,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s, > >> { > >> /* cnt == 0 signals that it's called from kfree_nolock() */ > >> bool allow_spin = cnt; > >> + __maybe_unused unsigned long flags; > >> struct kmem_cache_cpu *c; > >> unsigned long tid; > >> void **freelist; > >> @@ -6489,6 +6490,9 @@ static __always_inline void do_slab_free(struct kmem_cache *s, > >> return; > >> } > >> cnt = 1; /* restore cnt. kfree_nolock() frees one object at a time */ > >> + > >> + /* prevent a fastpath interrupting a slowpath */ > >> + goto no_lockless; > > > > I'm missing why this is needed. > > > > do_slab_free() does: > > if ((in_nmi() || !USE_LOCKLESS_FAST_PATH()) && > > local_lock_is_locked(&s->cpu_slab->lock)) { > > defer_free(s, head); return; > > > > It's the same check as in kmalloc_nolock() to avoid invalid: > > freelist ops -> nmi -> bpf -> __update_cpu_freelist_fast. > > > > The big comment in kmalloc_nolock() applies here too. > > But with nmi that's variant of #1 of that comment. > > Like for ___slab_alloc() we need to prevent #2 with no nmi? > example on !RT: > > kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> > kfree_nolock() -> do_slab_free() > > in_nmi() || !USE_LOCKLESS_FAST_PATH() > false || false, we proceed, no checking of local_lock_is_locked() > > if (USE_LOCKLESS_FAST_PATH()) { - true (!RT) > -> __update_cpu_freelist_fast() > > Am I missing something? It's ok to call __update_cpu_freelist_fast(). It won't break anything. Because only nmi can make this cpu to be in the middle of freelist update. ^ permalink raw reply [flat|nested] 16+ messages in thread
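The NOKPROBE_SYMBOL-versus-notrace distinction drawn earlier in this message
can be illustrated with a small hypothetical pair of functions (neither exists
in mm/slub.c; this only shows what each annotation does):

/* kprobes are refused anywhere in the body (the symbol goes on the kprobe
 * blacklist), but the compiler still emits the fentry nop, so ftrace/fprobe
 * style tracing of the function entry keeps working: */
static void *slow_path_example(void)
{
	return NULL;
}
NOKPROBE_SYMBOL(slow_path_example);

/* notrace, by contrast, suppresses the fentry nop entirely, so even
 * function-entry tracing is gone -- a bigger hammer than needed here: */
static notrace void *other_example(void)
{
	return NULL;
}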
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 18:46 ` Alexei Starovoitov @ 2025-09-16 19:06 ` Vlastimil Babka 2025-09-16 20:26 ` Alexei Starovoitov 0 siblings, 1 reply; 16+ messages in thread From: Vlastimil Babka @ 2025-09-16 19:06 UTC (permalink / raw) To: Alexei Starovoitov Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On 9/16/25 20:46, Alexei Starovoitov wrote: > On Tue, Sep 16, 2025 at 11:12 AM Vlastimil Babka <vbabka@suse.cz> wrote: >> >> On 9/16/25 18:18, Alexei Starovoitov wrote: >> > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote: >> >> >> >> On 9/16/25 14:58, Harry Yoo wrote: >> >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote: >> >> >> On 9/16/25 04:21, Alexei Starovoitov wrote: >> >> >> > From: Alexei Starovoitov <ast@kernel.org> >> >> >> > >> >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance: >> >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> >> >> >> > kprobe -> bpf -> kmalloc_nolock(). >> >> >> > >> >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org> >> >> >> >> >> >> I wanted to fold this in "slab: Introduce kmalloc_nolock() and kfree_nolock()." >> >> >> and update comments to explain the NOKPROBE_SYMBOL(___slab_alloc); >> >> >> >> >> >> But now I'm not sure if we still need to invent the lockdep classes for PREEMPT_RT anymore: >> >> >> >> >> >> > /* >> >> >> > * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock >> >> >> > * can be acquired without a deadlock before invoking the function. >> >> >> > * >> >> >> > * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is >> >> >> > * using local_lock_is_locked() properly before calling local_lock_cpu_slab(), >> >> >> > * and kmalloc() is not used in an unsupported context. >> >> >> > * >> >> >> > * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave(). >> >> >> > * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but >> >> >> > * lockdep_assert() will catch a bug in case: >> >> >> > * #1 >> >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock() >> >> >> > * or >> >> >> > * #2 >> >> >> > * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() >> >> >> >> >> >> AFAICS see we now eliminated this possibility. >> >> > >> >> > Right. >> >> > >> >> >> > * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt >> >> >> > * disabled context. The lock will always be acquired and if needed it >> >> >> > * block and sleep until the lock is available. >> >> >> > * #1 is possible in !PREEMPT_RT only. >> >> >> >> >> >> Yes because this in kmalloc_nolock_noprof() >> >> >> >> >> >> if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) >> >> >> /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */ >> >> >> return NULL; >> >> >> >> >> >> >> >> >> > * #2 is possible in both with a twist that irqsave is replaced with rt_spinlock: >> >> >> > * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) -> >> >> >> > * tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B) >> >> >> And this is no longer possible, so can we just remove these comments and drop >> >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now? >> >> > >> >> > Makes sense and sounds good to me. 
>> >> > >> >> > Also in the commit mesage should be adjusted too: >> >> >> kmalloc_nolock() can be called from any context and can re-enter >> >> >> into ___slab_alloc(): >> >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> >> >> >> kmalloc_nolock() -> ___slab_alloc(cache_B) >> >> >> or >> >> >> kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> >> >> >> kmalloc_nolock() -> ___slab_alloc(cache_B) >> >> > >> >> > The lattter path is not possible anymore, >> >> > >> >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu >> >> >> rt_spin_lock is locked by current _task_. In this case re-entrance into >> >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different >> >> >> bucket that is most likely is not locked by the current task. >> >> >> Though it may be locked by a different task it's safe to rt_spin_lock() and >> >> >> sleep on it. >> >> > >> >> > and this paragraph is no longer valid either? >> >> >> >> Thanks for confirming! Let's see if Alexei agrees or we both missed >> >> something. >> > >> > Not quite. >> > This patch prevents >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> >> > kprobe -> bpf >> > >> > to make sure kprobe cannot be inserted in the _middle_ of >> > freelist operations. >> > kprobe/tracepoint outside of freelist is not a concern, >> > and >> > malloc() -> ___slab_alloc() -> local_lock_irqsave() -> >> > tracepoint -> bpf >> > >> > is still possible. Especially on RT. >> >> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the >> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin >> that up. > > hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function. > It disallows kprobes anywhere within the body, > but it doesn't make it 'notrace', so tracing the first nop5 > is still ok. Yeah by "scope" I meant also whatever that function calls, i.e. the spinlock operations you mentioned (local_lock_irqsave()). That's not part of the ___slab_alloc() body so you're right we have not eliminated it. >> >> But with nmi that's variant of #1 of that comment. >> >> Like for ___slab_alloc() we need to prevent #2 with no nmi? >> example on !RT: >> >> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> >> kfree_nolock() -> do_slab_free() >> >> in_nmi() || !USE_LOCKLESS_FAST_PATH() >> false || false, we proceed, no checking of local_lock_is_locked() >> >> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT) >> -> __update_cpu_freelist_fast() >> >> Am I missing something? > > It's ok to call __update_cpu_freelist_fast(). It won't break anything. > Because only nmi can make this cpu to be in the middle of freelist update. You're right, freeing uses the "slowpath" (local_lock protected instead of cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with a kprobe on !RT because it doesn't exist there at all. The only one is in ___slab_alloc() and that's covered. I do really hope we'll get to the point where sheaves will be able to replace the other percpu slab caching completely... it should be much simpler. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 19:06 ` Vlastimil Babka @ 2025-09-16 20:26 ` Alexei Starovoitov 2025-09-17 7:02 ` Harry Yoo 0 siblings, 1 reply; 16+ messages in thread From: Alexei Starovoitov @ 2025-09-16 20:26 UTC (permalink / raw) To: Vlastimil Babka Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote: > > >> > >> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the > >> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin > >> that up. > > > > hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function. > > It disallows kprobes anywhere within the body, > > but it doesn't make it 'notrace', so tracing the first nop5 > > is still ok. > > Yeah by "scope" I meant also whatever that function calls, i.e. the spinlock > operations you mentioned (local_lock_irqsave()). That's not part of the > ___slab_alloc() body so you're right we have not eliminated it. Ahh. Yes. All functions that ___slab_alloc() calls are not affected and it's ok. There are no calls in the middle freelist update. > >> > >> But with nmi that's variant of #1 of that comment. > >> > >> Like for ___slab_alloc() we need to prevent #2 with no nmi? > >> example on !RT: > >> > >> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> > >> kfree_nolock() -> do_slab_free() > >> > >> in_nmi() || !USE_LOCKLESS_FAST_PATH() > >> false || false, we proceed, no checking of local_lock_is_locked() > >> > >> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT) > >> -> __update_cpu_freelist_fast() > >> > >> Am I missing something? > > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything. > > Because only nmi can make this cpu to be in the middle of freelist update. > > You're right, freeing uses the "slowpath" (local_lock protected instead of > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with > a kprobe on !RT because it doesn't exist there at all. The only one is in > ___slab_alloc() and that's covered. yep. > I do really hope we'll get to the point where sheaves will be able to > replace the other percpu slab caching completely... it should be much simpler. +1. Since we're talking about long term plans... would be really cool to have per-numa allocators. Like per-cpu alloc, but per-numa. And corresponding this_numa_add() that will use atomic_add underneath. Regular atomic across all nodes are becoming quite slow in modern cpus, while per-cpu counters are too expensive from memory pov. per-numa could be such middle ground with fast enough operations and good memory usage. ^ permalink raw reply [flat|nested] 16+ messages in thread
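The per-node counter idea can be emulated today with existing primitives;
the sketch below is purely illustrative, and this_numa_add() does not exist
as a kernel API:

/* one cacheline-padded counter per NUMA node instead of per CPU */
struct numa_counter {
	atomic_long_t val ____cacheline_aligned_in_smp;
};

static struct numa_counter counters[MAX_NUMNODES];

static inline void this_numa_add(struct numa_counter *c, long delta)
{
	/* atomic, but contended only by CPUs on the same node */
	atomic_long_add(delta, &c[numa_node_id()].val);
}

static inline long numa_counter_sum(struct numa_counter *c)
{
	long sum = 0;
	int node;

	for_each_node(node)
		sum += atomic_long_read(&c[node].val);
	return sum;
}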
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 20:26 ` Alexei Starovoitov @ 2025-09-17 7:02 ` Harry Yoo 2025-09-17 7:06 ` Harry Yoo 0 siblings, 1 reply; 16+ messages in thread From: Harry Yoo @ 2025-09-17 7:02 UTC (permalink / raw) To: Alexei Starovoitov Cc: Vlastimil Babka, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote: > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote: > > > > >> > > >> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covers the > > >> whole scope of ___slab_alloc() but that's not the case. Thanks for clearin > > >> that up. > > > > > > hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function. > > > It disallows kprobes anywhere within the body, > > > but it doesn't make it 'notrace', so tracing the first nop5 > > > is still ok. > > > > Yeah by "scope" I meant also whatever that function calls, i.e. the spinlock > > operations you mentioned (local_lock_irqsave()). That's not part of the > > ___slab_alloc() body so you're right we have not eliminated it. > > Ahh. Yes. All functions that ___slab_alloc() calls > are not affected and it's ok. > There are no calls in the middle freelist update. Gotcha! I'm confused about this too :) > > >> But with nmi that's variant of #1 of that comment. > > >> > > >> Like for ___slab_alloc() we need to prevent #2 with no nmi? > > >> example on !RT: > > >> > > >> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> > > >> kfree_nolock() -> do_slab_free() > > >> > > >> in_nmi() || !USE_LOCKLESS_FAST_PATH() > > >> false || false, we proceed, no checking of local_lock_is_locked() > > >> > > >> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT) > > >> -> __update_cpu_freelist_fast() > > >> > > >> Am I missing something? > > > > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything. > > > Because only nmi can make this cpu to be in the middle of freelist update. > > > > You're right, freeing uses the "slowpath" (local_lock protected instead of > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with > > a kprobe on !RT because it doesn't exist there at all. Right. > > The only one is in ___slab_alloc() and that's covered. Right. and this is a question not relevant to reentrant kmalloc: On PREEMPT_RT, disabling fastpath in the alloc path makes sense because both paths updates c->freelist, but in the free path, by disabling the lockless fastpath, what are we protecting against? the free fastpath updates c->freelist but not slab->freelist, and the free slowpath updates slab->freelist but not c->freelist? I failed to imagine how things can go wrong if we enable the lockless fastpath in the free path. -- Cheers, Harry / Hyeonggon ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()

From: Harry Yoo @ 2025-09-17 7:06 UTC
To: Alexei Starovoitov
Cc: Vlastimil Babka, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior,
    Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra,
    Steven Rostedt, Johannes Weiner

On Wed, Sep 17, 2025 at 04:02:25PM +0900, Harry Yoo wrote:
> On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
> > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> > > > Because only nmi can make this cpu to be in the middle of freelist update.
> > >
> > > You're right, freeing uses the "slowpath" (local_lock protected instead of
> > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
> > > a kprobe on !RT because it doesn't exist there at all.
>
> Right.
>
> > > The only one is in ___slab_alloc() and that's covered.
>
> Right.
>
> and this is a question not relevant to reentrant kmalloc:
>
> On PREEMPT_RT, disabling fastpath in the alloc path makes sense because
> both paths updates c->freelist, but in the free path, by disabling the
> lockless fastpath, what are we protecting against?
>
> the free fastpath updates c->freelist but not slab->freelist, and
> the free slowpath updates slab->freelist but not c->freelist?
>
> I failed to imagine how things can go wrong if we enable the lockless
> fastpath in the free path.

Oops, sorry. It slipped my mind. Things can go wrong if the free fastpath
is executed while the alloc slowpath is executing and gets preempted.

-- 
Cheers,
Harry / Hyeonggon
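A compressed timeline of the hazard Harry describes, reconstructed here for
illustration (the call sequence is paraphrased, not copied from the code):

/*
 * PREEMPT_RT, same CPU and same cache:
 *
 *   task A: kmalloc()
 *     ___slab_alloc()
 *       local_lock_cpu_slab()      // rt_spin_lock, owned by task A
 *       reads c->freelist, starts installing a new one ...
 *       <preempted>                // allowed on RT: it is a sleeping lock
 *
 *   task B: kfree() on the same cache
 *     do_slab_free()
 *       hypothetical lockless fastpath:
 *         cmpxchg of c->freelist   // would race with task A's half-done
 *                                  // update -> freelist corruption
 *
 * Hence USE_LOCKLESS_FAST_PATH() is false on RT: task B takes the same
 * local_lock instead and waits until task A finishes.
 */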
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()

From: Alexei Starovoitov @ 2025-09-17 18:26 UTC
To: Harry Yoo
Cc: Vlastimil Babka, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior,
    Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra,
    Steven Rostedt, Johannes Weiner

On Wed, Sep 17, 2025 at 12:06 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Wed, Sep 17, 2025 at 04:02:25PM +0900, Harry Yoo wrote:
> > On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
> > > > > Because only nmi can make this cpu to be in the middle of freelist update.
> > > >
> > > > You're right, freeing uses the "slowpath" (local_lock protected instead of
> > > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
> > > > a kprobe on !RT because it doesn't exist there at all.
> >
> > Right.
> >
> > > > The only one is in ___slab_alloc() and that's covered.
> >
> > Right.
> >
> > and this is a question not relevant to reentrant kmalloc:
> >
> > On PREEMPT_RT, disabling fastpath in the alloc path makes sense because
> > both paths updates c->freelist, but in the free path, by disabling the
> > lockless fastpath, what are we protecting against?
> >
> > the free fastpath updates c->freelist but not slab->freelist, and
> > the free slowpath updates slab->freelist but not c->freelist?
> >
> > I failed to imagine how things can go wrong if we enable the lockless
> > fastpath in the free path.
>
> Oops, sorry. It slipped my mind. Things can go wrong if the free fastpath
> is executed while the alloc slowpath is executing and gets preempted.

Agree. All that is very tricky. Probably worth adding a comment.

Long term though I think RT folks need to reconsider the lack of
fastpath. My gut feel is that overall kernel performance is poor on RT
due to this. Sheaves might help though.
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()

From: Vlastimil Babka @ 2025-09-17 18:34 UTC
To: Alexei Starovoitov, Harry Yoo
Cc: bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko,
    Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt,
    Johannes Weiner

On 9/17/25 20:26, Alexei Starovoitov wrote:
> On Wed, Sep 17, 2025 at 12:06 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>>
>> On Wed, Sep 17, 2025 at 04:02:25PM +0900, Harry Yoo wrote:
>> > On Tue, Sep 16, 2025 at 01:26:53PM -0700, Alexei Starovoitov wrote:
>> > > On Tue, Sep 16, 2025 at 12:06 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>> > > > > It's ok to call __update_cpu_freelist_fast(). It won't break anything.
>> > > > > Because only nmi can make this cpu to be in the middle of freelist update.
>> > > >
>> > > > You're right, freeing uses the "slowpath" (local_lock protected instead of
>> > > > cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
>> > > > a kprobe on !RT because it doesn't exist there at all.
>> >
>> > Right.
>> >
>> > > > The only one is in ___slab_alloc() and that's covered.
>> >
>> > Right.
>> >
>> > and this is a question not relevant to reentrant kmalloc:
>> >
>> > On PREEMPT_RT, disabling fastpath in the alloc path makes sense because
>> > both paths updates c->freelist, but in the free path, by disabling the
>> > lockless fastpath, what are we protecting against?
>> >
>> > the free fastpath updates c->freelist but not slab->freelist, and
>> > the free slowpath updates slab->freelist but not c->freelist?
>> >
>> > I failed to imagine how things can go wrong if we enable the lockless
>> > fastpath in the free path.
>>
>> Oops, sorry. It slipped my mind. Things can go wrong if the free fastpath
>> is executed while the alloc slowpath is executing and gets preempted.
>
> Agree. All that is very tricky. Probably worth adding a comment.

Ah right, I did add one based on your commit log and this conversation:

+/*
+ * We disallow kprobes in ___slab_alloc() to prevent reentrance
+ *
+ * kmalloc() -> ___slab_alloc() -> local_lock_cpu_slab() protected part of
+ * ___slab_alloc() manipulating c->freelist -> kprobe -> bpf ->
+ * kmalloc_nolock() or kfree_nolock() -> __update_cpu_freelist_fast()
+ * manipulating c->freelist without lock.
+ *
+ * This does not prevent kprobe in functions called from ___slab_alloc() such as
+ * local_lock_irqsave() itself, and that is fine, we only need to protect the
+ * c->freelist manipulation in ___slab_alloc() itself.
+ */
+NOKPROBE_SYMBOL(___slab_alloc);

> Long term though I think RT folks need to reconsider the lack of
> fastpath. My gut feel is that overall kernel performance is poor on RT
> due to this. Sheaves might help though.

Oh they definitely should, thanks for the selling point :)
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-17 18:34 ` Vlastimil Babka @ 2025-09-17 18:40 ` Alexei Starovoitov 0 siblings, 0 replies; 16+ messages in thread From: Alexei Starovoitov @ 2025-09-17 18:40 UTC (permalink / raw) To: Vlastimil Babka Cc: Harry Yoo, bpf, linux-mm, Shakeel Butt, Michal Hocko, Sebastian Sewior, Andrii Nakryiko, Kumar Kartikeya Dwivedi, Andrew Morton, Peter Zijlstra, Steven Rostedt, Johannes Weiner On Wed, Sep 17, 2025 at 11:34 AM Vlastimil Babka <vbabka@suse.cz> wrote: > > > Ah right, I did add one based on your commit log and this conversation: > > +/* > + * We disallow kprobes in ___slab_alloc() to prevent reentrance > + * > + * kmalloc() -> ___slab_alloc() -> local_lock_cpu_slab() protected part of > + * ___slab_alloc() manipulating c->freelist -> kprobe -> bpf -> > + * kmalloc_nolock() or kfree_nolock() -> __update_cpu_freelist_fast() > + * manipulating c->freelist without lock. > + * > + * This does not prevent kprobe in functions called from ___slab_alloc() such as > + * local_lock_irqsave() itself, and that is fine, we only need to protect the > + * c->freelist manipulation in ___slab_alloc() itself. > + */ > +NOKPROBE_SYMBOL(___slab_alloc); Perfect. Short and to the point. Thanks! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc() 2025-09-16 2:21 [PATCH slab] slab: Disallow kprobes in ___slab_alloc() Alexei Starovoitov 2025-09-16 10:40 ` Vlastimil Babka @ 2025-09-16 10:59 ` Harry Yoo 2025-09-16 12:25 ` Vlastimil Babka 1 sibling, 1 reply; 16+ messages in thread From: Harry Yoo @ 2025-09-16 10:59 UTC (permalink / raw) To: Alexei Starovoitov Cc: bpf, linux-mm, vbabka, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm, peterz, rostedt, hannes On Mon, Sep 15, 2025 at 07:21:40PM -0700, Alexei Starovoitov wrote: > From: Alexei Starovoitov <ast@kernel.org> > > Disallow kprobes in ___slab_alloc() to prevent reentrance: > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() -> > kprobe -> bpf -> kmalloc_nolock(). > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- Maybe I'm a bit paranoid, but should we also add this to all functions that ___slab_alloc() calls? -- Cheers, Harry / Hyeonggon > mm/slub.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/slub.c b/mm/slub.c > index c995f3bec69d..922d47b10c2f 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -45,7 +45,7 @@ > #include <kunit/test-bug.h> > #include <linux/sort.h> > #include <linux/irq_work.h> > - > +#include <linux/kprobes.h> > #include <linux/debugfs.h> > #include <trace/events/kmem.h> > > @@ -4697,6 +4697,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, > > goto load_freelist; > } > +NOKPROBE_SYMBOL(___slab_alloc); > > /* > * A wrapper for ___slab_alloc() for contexts where preemption is not yet > -- > 2.47.3 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()

From: Vlastimil Babka @ 2025-09-16 12:25 UTC
To: Harry Yoo, Alexei Starovoitov
Cc: bpf, linux-mm, shakeel.butt, mhocko, bigeasy, andrii, memxor, akpm,
    peterz, rostedt, hannes

On 9/16/25 12:59, Harry Yoo wrote:
> On Mon, Sep 15, 2025 at 07:21:40PM -0700, Alexei Starovoitov wrote:
>> From: Alexei Starovoitov <ast@kernel.org>
>>
>> Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> kprobe -> bpf -> kmalloc_nolock().
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>
> Maybe I'm a bit paranoid, but should we also add this to
> all functions that ___slab_alloc() calls?

It should be enough to do that for the functions it calls under
local_lock_cpu_slab(). AFAICS these are all trivial and inline. Maybe
get_freelist() and the functions it calls are more complex. Should we mark
all that __always_inline and would we be able to rely on this? Or rather
add the NOKPROBE_SYMBOL()? (But it would be wrong if that caused the
function to become non-inlined.)
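The two options being weighed would look roughly like this (sketch only;
the get_freelist() signature is abbreviated, and whether it can stay inlined
in every config is exactly the open question raised above):

/* Option 1: force the helpers into the already-blacklisted caller */
static __always_inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
{
	/* ... unchanged body ... */
}

/*
 * Option 2: blacklist the helper itself. This covers an out-of-line copy
 * if the compiler emits one, but (per the concern above) referencing the
 * symbol could itself be what keeps a copy out of line.
 */
static void *get_freelist(struct kmem_cache *s, struct slab *slab)
{
	/* ... unchanged body ... */
}
NOKPROBE_SYMBOL(get_freelist);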