linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Harry Yoo <harry.yoo@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>, Hao Li <hao.li@linux.dev>,
	linux-mm@kvack.org
Subject: Re: [PATCH 2/2] mm/slab: use prandom if !allow_spin
Date: Fri, 6 Feb 2026 19:27:38 +0100	[thread overview]
Message-ID: <52b408e8-ea29-439d-8e34-91c4cae8009a@suse.cz> (raw)
In-Reply-To: <20260206171348.35886-3-harry.yoo@oracle.com>

On 2/6/26 18:13, Harry Yoo wrote:
> When CONFIG_SLAB_FREELIST_RANDOM is enabled and get_random_u32()
> is called in an NMI context, lockdep complains because it acquires
> a local_lock:
> 
>   ================================
>   WARNING: inconsistent lock state
>   6.19.0-rc5-slab-for-next+ #325 Tainted: G                 N
>   --------------------------------
>   inconsistent {INITIAL USE} -> {IN-NMI} usage.
>   kunit_try_catch/8312 [HC2[2]:SC0[0]:HE0:SE1] takes:
>   ffff88a02ec49cc0 (batched_entropy_u32.lock){-.-.}-{3:3}, at: get_random_u32+0x7f/0x2e0
>   {INITIAL USE} state was registered at:
>     lock_acquire+0xd9/0x2f0
>     get_random_u32+0x93/0x2e0
>     __get_random_u32_below+0x17/0x70
>     cache_random_seq_create+0x121/0x1c0
>     init_cache_random_seq+0x5d/0x110
>     do_kmem_cache_create+0x1e0/0xa30
>     __kmem_cache_create_args+0x4ec/0x830
>     create_kmalloc_caches+0xe6/0x130
>     kmem_cache_init+0x1b1/0x660
>     mm_core_init+0x1d8/0x4b0
>     start_kernel+0x620/0xcd0
>     x86_64_start_reservations+0x18/0x30
>     x86_64_start_kernel+0xf3/0x140
>     common_startup_64+0x13e/0x148
>   irq event stamp: 76
>   hardirqs last  enabled at (75): [<ffffffff8298b77a>] exc_nmi+0x11a/0x240
>   hardirqs last disabled at (76): [<ffffffff8298b991>] sysvec_irq_work+0x11/0x110
>   softirqs last  enabled at (0): [<ffffffff813b2dda>] copy_process+0xc7a/0x2350
>   softirqs last disabled at (0): [<0000000000000000>] 0x0
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(batched_entropy_u32.lock);
>     <Interrupt>
>       lock(batched_entropy_u32.lock);
> 
>    *** DEADLOCK ***
> 
> Fix this by using pseudo-random number generator if !allow_spin.
> This means kmalloc_nolock() users won't get truly random numbers,
> but there is not much we can do about it.
> 
> Note that an NMI handler might interrupt prandom_u32_state() and
> change the random state, but that's safe.
> 
> Link: https://lore.kernel.org/all/0c33bdee-6de8-4d9f-92ca-4f72c1b6fb9f@suse.cz
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>  mm/slub.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index d46464654c15..4d76af84f018 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -43,6 +43,7 @@
>  #include <linux/prefetch.h>
>  #include <linux/memcontrol.h>
>  #include <linux/random.h>
> +#include <linux/prandom.h>
>  #include <kunit/test.h>
>  #include <kunit/test-bug.h>
>  #include <linux/sort.h>
> @@ -3308,8 +3309,11 @@ static void *next_freelist_entry(struct kmem_cache *s,
>  	return (char *)start + idx;
>  }
>  
> +static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
> +
>  /* Shuffle the single linked freelist based on a random pre-computed sequence */
> -static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> +static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
> +			     bool allow_spin)
>  {
>  	void *start;
>  	void *cur;
> @@ -3320,7 +3324,19 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
>  		return false;
>  
>  	freelist_count = oo_objects(s->oo);
> -	pos = get_random_u32_below(freelist_count);
> +	if (allow_spin) {
> +		pos = get_random_u32_below(freelist_count);
> +	} else {
> +		struct rnd_state *state;
> +
> +		/*
> +		 * kmalloc_nolock() called in an NMI context might interrupt
> +		 * and change the state in the middle.
> +		 */
> +		state = &get_cpu_var(slab_rnd_state);
> +		pos = prandom_u32_state(state) % freelist_count;
> +		put_cpu_var(slab_rnd_state);

I don't think this prevents the changing in the middle? We just stored the
pointer in a local variable state, but the prandom call will still access
the percpu variable through that?

So we might need to disable irq here, and have another percpu state that's
used when in_nmi()?

> +	}
>  
>  	page_limit = slab->objects * s->size;
>  	start = fixup_red_left(s, slab_address(slab));
> @@ -3347,7 +3363,8 @@ static inline int init_cache_random_seq(struct kmem_cache *s)
>  	return 0;
>  }
>  static inline void init_freelist_randomization(void) { }
> -static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
> +static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
> +				    bool allow_spin)
>  {
>  	return false;
>  }
> @@ -3438,7 +3455,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  	alloc_slab_obj_exts_early(s, slab);
>  	account_slab(slab, oo_order(oo), s, flags);
>  
> -	shuffle = shuffle_freelist(s, slab);
> +	shuffle = shuffle_freelist(s, slab, allow_spin);
>  
>  	if (!shuffle) {
>  		start = fixup_red_left(s, start);
> @@ -8337,6 +8354,9 @@ void __init kmem_cache_init_late(void)
>  {
>  	flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
>  	WARN_ON(!flushwq);
> +#ifdef CONFIG_SLAB_FREELIST_RANDOM
> +	prandom_init_once(&slab_rnd_state);
> +#endif
>  }
>  
>  int do_kmem_cache_create(struct kmem_cache *s, const char *name,



  reply	other threads:[~2026-02-06 18:27 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-06 17:13 [PATCH 0/2] mm/slab: fix lockdep warnings with kmalloc_nolock() Harry Yoo
2026-02-06 17:13 ` [PATCH 1/2] mm/slab: skip get_from_any_partial() if !allow_spin Harry Yoo
2026-02-06 18:10   ` Vlastimil Babka
2026-02-06 19:19     ` Alexei Starovoitov
2026-02-09  3:18       ` Harry Yoo
2026-02-09 19:03         ` Vlastimil Babka
2026-02-06 17:13 ` [PATCH 2/2] mm/slab: use prandom " Harry Yoo
2026-02-06 18:27   ` Vlastimil Babka [this message]
2026-02-06 19:22     ` Alexei Starovoitov
2026-02-07  1:25       ` Harry Yoo
2026-02-06 17:37 ` [PATCH 0/2] mm/slab: fix lockdep warnings with kmalloc_nolock() Harry Yoo
2026-02-09 19:03   ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52b408e8-ea29-439d-8e34-91c4cae8009a@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=cl@gentwo.org \
    --cc=hao.li@linux.dev \
    --cc=harry.yoo@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox