From: Vlastimil Babka <vbabka@suse.cz>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Harry Yoo <harry.yoo@oracle.com>, bpf <bpf@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Michal Hocko <mhocko@suse.com>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH slab] slab: Disallow kprobes in ___slab_alloc()
Date: Tue, 16 Sep 2025 21:06:46 +0200
Message-ID: <c370486e-cb8f-4201-b70e-2bdddab9e642@suse.cz>
In-Reply-To: <CAADnVQJbj3OqS9x6MOmnmHa=69ACVEKa=QX-KVAncyocjCn1AQ@mail.gmail.com>

On 9/16/25 20:46, Alexei Starovoitov wrote:
> On Tue, Sep 16, 2025 at 11:12 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 9/16/25 18:18, Alexei Starovoitov wrote:
>> > On Tue, Sep 16, 2025 at 6:13 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> >>
>> >> On 9/16/25 14:58, Harry Yoo wrote:
>> >> > On Tue, Sep 16, 2025 at 12:40:12PM +0200, Vlastimil Babka wrote:
>> >> >> On 9/16/25 04:21, Alexei Starovoitov wrote:
>> >> >> > From: Alexei Starovoitov <ast@kernel.org>
>> >> >> >
>> >> >> > Disallow kprobes in ___slab_alloc() to prevent reentrance:
>> >> >> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >> >> > kprobe -> bpf -> kmalloc_nolock().
>> >> >> >
>> >> >> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> >> >>
>> >> >> I wanted to fold this into "slab: Introduce kmalloc_nolock() and kfree_nolock()."
>> >> >> and update the comments to explain the NOKPROBE_SYMBOL(___slab_alloc);
>> >> >>
>> >> >> But now I'm not sure we still need to invent the lockdep classes for PREEMPT_RT:
>> >> >>
>> >> >> > /*
>> >> >> >  * ___slab_alloc()'s caller is supposed to check if kmem_cache::kmem_cache_cpu::lock
>> >> >> >  * can be acquired without a deadlock before invoking the function.
>> >> >> >  *
>> >> >> >  * Without LOCKDEP we trust the code to be correct. kmalloc_nolock() is
>> >> >> >  * using local_lock_is_locked() properly before calling local_lock_cpu_slab(),
>> >> >> >  * and kmalloc() is not used in an unsupported context.
>> >> >> >  *
>> >> >> >  * With LOCKDEP, on PREEMPT_RT lockdep does its checking in local_lock_irqsave().
>> >> >> >  * On !PREEMPT_RT we use trylock to avoid false positives in NMI, but
>> >> >> >  * lockdep_assert() will catch a bug in case:
>> >> >> >  * #1
>> >> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> NMI -> bpf -> kmalloc_nolock()
>> >> >> >  * or
>> >> >> >  * #2
>> >> >> >  * kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock()
>> >> >>
>> >> >> AFAICS we have now eliminated this possibility.
>> >> >
>> >> > Right.
>> >> >
>> >> >> >  * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
>> >> >> >  * disabled context. The lock will always be acquired, and if needed it
>> >> >> >  * will block and sleep until the lock is available.
>> >> >> >  * #1 is possible in !PREEMPT_RT only.
>> >> >>
>> >> >> Yes because this in kmalloc_nolock_noprof()
>> >> >>
>> >> >>         if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
>> >> >>                 /* kmalloc_nolock() in PREEMPT_RT is not supported from irq */
>> >> >>                 return NULL;
>> >> >>
>> >> >>
>> >> >> >  * #2 is possible in both, with the twist that irqsave is replaced with rt_spinlock:
>> >> >> >  * kmalloc() -> ___slab_alloc() -> rt_spin_lock(kmem_cache_A) ->
>> >> >> >  *    tracepoint/kprobe -> bpf -> kmalloc_nolock() -> rt_spin_lock(kmem_cache_B)
>> >> >> And this is no longer possible, so can we just remove these comments and drop
>> >> >> "slab: Make slub local_(try)lock more precise for LOCKDEP" now?
>> >> >
>> >> > Makes sense and sounds good to me.
>> >> >
>> >> > Also, the commit message should be adjusted:
>> >> >> kmalloc_nolock() can be called from any context and can re-enter
>> >> >> into ___slab_alloc():
>> >> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
>> >> >>     kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> >> or
>> >> >>  kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
>> >> >>    kmalloc_nolock() -> ___slab_alloc(cache_B)
>> >> >
>> >> > The latter path is not possible anymore,
>> >> >
>> >> >> Similarly, in PREEMPT_RT local_lock_is_locked() returns true when the per-cpu
>> >> >> rt_spin_lock is locked by the current _task_. In this case re-entrance into
>> >> >> the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different
>> >> >> bucket that is most likely not locked by the current task.
>> >> >> Though it may be locked by a different task, it's safe to rt_spin_lock() and
>> >> >> sleep on it.
>> >> >
>> >> > and this paragraph is no longer valid either?
>> >>
>> >> Thanks for confirming! Let's see if Alexei agrees or we both missed
>> >> something.
>> >
>> > Not quite.
>> > This patch prevents
>> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >     kprobe -> bpf
>> >
>> > to make sure a kprobe cannot be inserted in the _middle_ of
>> > freelist operations.
>> > A kprobe/tracepoint outside of the freelist manipulation is not a concern,
>> > and
>> > kmalloc() -> ___slab_alloc() -> local_lock_irqsave() ->
>> >    tracepoint -> bpf
>> >
>> > is still possible. Especially on RT.
>>
>> Hm I see. I wrongly reasoned as if NOKPROBE_SYMBOL(___slab_alloc) covered the
>> whole scope of ___slab_alloc(), but that's not the case. Thanks for clearing
>> that up.
> 
> hmm. NOKPROBE_SYMBOL(___slab_alloc) covers the whole function.
> It disallows kprobes anywhere within the body,
> but it doesn't make it 'notrace', so tracing the first nop5
> is still ok.

Yeah by "scope" I meant also whatever that function calls, i.e. the spinlock
operations you mentioned (local_lock_irqsave()). That's not part of the
___slab_alloc() body so you're right we have not eliminated it.
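
To make the coverage explicit (a minimal sketch with a made-up function
name, not the actual slub code): NOKPROBE_SYMBOL() blacklists only the
annotated function's own instructions, nothing it calls:

        #include <linux/kprobes.h>

        static noinline void *my_slow_alloc(void)
        {
                /* ... freelist manipulation that must not be probed ... */
                return NULL;
        }
        /*
         * Adds my_slow_alloc to the kprobe blacklist. Callees such as
         * local_lock_irqsave() are not covered and remain probeable.
         */
        NOKPROBE_SYMBOL(my_slow_alloc);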

>>
>> But with NMI that's a variant of #1 in that comment.
>>
>> Like for ___slab_alloc(), don't we need to prevent #2, just with no NMI?
>> An example on !RT:
>>
>> kmalloc() -> ___slab_alloc() -> irqsave -> tracepoint/kprobe -> bpf ->
>> kfree_nolock() -> do_slab_free()
>>
>> in_nmi() || !USE_LOCKLESS_FAST_PATH()
>> false || false, we proceed, no checking of local_lock_is_locked()
>>
>> if (USE_LOCKLESS_FAST_PATH()) { - true (!RT)
>> -> __update_cpu_freelist_fast()
>>
>> Am I missing something?
> 
> It's ok to call __update_cpu_freelist_fast(). It won't break anything,
> because only an NMI can interrupt this cpu in the middle of a freelist update.

You're right: freeing uses the "slowpath" (local_lock-protected instead of
cmpxchg16b) c->freelist manipulation only on RT. So we can't preempt it with
a kprobe on !RT because that path doesn't exist there at all. The only such
path is in ___slab_alloc(), and that's covered (see the sketch below).
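
To spell out the distinction (a simplified sketch of the free-path logic;
the argument names are illustrative, not the verbatim slub code):

        if (USE_LOCKLESS_FAST_PATH()) {
                /*
                 * !PREEMPT_RT: c->freelist/c->tid are updated with a
                 * lockless cmpxchg16b; only an NMI can land in the
                 * middle of it, and that case is already handled.
                 */
                __update_cpu_freelist_fast(s, freelist_old, freelist_new, tid);
        } else {
                /*
                 * PREEMPT_RT only: c->freelist is manipulated under the
                 * local_lock, i.e. the only path a kprobe could have
                 * preempted mid-update.
                 */
                local_lock(&s->cpu_slab->lock);
                /* ... freelist update ... */
                local_unlock(&s->cpu_slab->lock);
        }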

I really do hope we'll get to the point where sheaves can replace the
other percpu slab caching completely... it should be much simpler.


