linux-mm.kvack.org archive mirror
* [PATCH V2 0/2] fix lockdep warnings with kmalloc_nolock()
From: Harry Yoo @ 2026-02-10  8:18 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
	Alexei Starovoitov, Hao Li, linux-mm

Hi, I've observed two lockdep warnings while testing
kmalloc_nolock() in NMI:

  1. Reading the current->mems_allowed_seq seqcount in NMI context
     isn't safe, and lockdep complains.

  2. With CONFIG_SLAB_FREELIST_RANDOM, get_random_u32() acquires
     a local_lock, which isn't safe in NMI context and could cause
     a deadlock.

Let's fix them.

Note: This is based on the latest slab/for-next. It wasn't clear to me
whether to base it on slab/for-next or slab/for-next-fixes: the merge
window has started, this series needs some exposure in -next, and the
patches in slab/for-next might land in mainline in the meantime.

Even if it should have been based on slab/for-next-fixes, the conflict
resolution should be trivial.

v1 -> v2:
  - Patch 1: per Vlastimil's suggestion, do not access
    current->mems_allowed_seq and avoid retry if !allow_spin,
    rather than returning NULL.

v1: https://lore.kernel.org/linux-mm/20260206171348.35886-1-harry.yoo@oracle.com

Harry Yoo (2):
  mm/slab: do not access current->mems_allowed_seq if !allow_spin
  mm/slab: use prandom if !allow_spin

 mm/slub.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)


base-commit: f6ed7e47c1fc78e78c9bfeb668b1ad9ba5c58120

-- 
2.43.0




* [PATCH V2 1/2] mm/slab: do not access current->mems_allowed_seq if !allow_spin
From: Harry Yoo @ 2026-02-10  8:18 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
	Alexei Starovoitov, Hao Li, linux-mm, stable

Lockdep complains when get_from_any_partial() is called in an NMI
context, because current->mems_allowed_seq is a seqcount_spinlock_t,
which is not NMI-safe:

  ================================
  WARNING: inconsistent lock state
  6.19.0-rc5-kfree-rcu+ #315 Tainted: G                 N
  --------------------------------
  inconsistent {INITIAL USE} -> {IN-NMI} usage.
  kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
  {INITIAL USE} state was registered at:
    lock_acquire+0x185/0x320
    kernel_init_freeable+0x391/0x1150
    kernel_init+0x1f/0x220
    ret_from_fork+0x736/0x8f0
    ret_from_fork_asm+0x1a/0x30
  irq event stamp: 56
  hardirqs last  enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
  hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
  softirqs last  enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
  softirqs last disabled at (0): [<0000000000000000>] 0x0

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&____s->seqcount#3);
    <Interrupt>
      lock(&____s->seqcount#3);

   *** DEADLOCK ***

According to Documentation/locking/seqlock.rst, seqcount_t is not
NMI-safe, and seqcount_latch_t should be used when the read path can
interrupt the write-side critical section. In this case, do not access
current->mems_allowed_seq, and skip the retry, when spinning is not
allowed (!allow_spin).

Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/slub.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 11a99bd06ac7..90f0e6667130 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3791,6 +3791,7 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 	struct zone *zone;
 	enum zone_type highest_zoneidx = gfp_zone(pc->flags);
 	unsigned int cpuset_mems_cookie;
+	bool allow_spin = gfpflags_allow_spinning(pc->flags);
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -3815,7 +3816,15 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 		return NULL;
 
 	do {
-		cpuset_mems_cookie = read_mems_allowed_begin();
+		/*
+		 * read_mems_allowed_begin() accesses current->mems_allowed_seq,
+		 * a seqcount_spinlock_t that is not NMI-safe. Do not access
+		 * current->mems_allowed_seq and avoid retry when GFP flags
+		 * indicate spinning is not allowed.
+		 */
+		if (allow_spin)
+			cpuset_mems_cookie = read_mems_allowed_begin();
+
 		zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
 		for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
 			struct kmem_cache_node *n;
@@ -3839,7 +3848,7 @@ static void *get_from_any_partial(struct kmem_cache *s, struct partial_context *
 				}
 			}
 		}
-	} while (read_mems_allowed_retry(cpuset_mems_cookie));
+	} while (allow_spin && read_mems_allowed_retry(cpuset_mems_cookie));
 #endif	/* CONFIG_NUMA */
 	return NULL;
 }
-- 
2.43.0




* [PATCH V2 2/2] mm/slab: use prandom if !allow_spin
From: Harry Yoo @ 2026-02-10  8:19 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin, Harry Yoo,
	Alexei Starovoitov, Hao Li, linux-mm, stable

When CONFIG_SLAB_FREELIST_RANDOM is enabled and get_random_u32() is
called in an NMI context, lockdep complains because get_random_u32()
acquires a local_lock:

  ================================
  WARNING: inconsistent lock state
  6.19.0-rc5-slab-for-next+ #325 Tainted: G                 N
  --------------------------------
  inconsistent {INITIAL USE} -> {IN-NMI} usage.
  kunit_try_catch/8312 [HC2[2]:SC0[0]:HE0:SE1] takes:
  ffff88a02ec49cc0 (batched_entropy_u32.lock){-.-.}-{3:3}, at: get_random_u32+0x7f/0x2e0
  {INITIAL USE} state was registered at:
    lock_acquire+0xd9/0x2f0
    get_random_u32+0x93/0x2e0
    __get_random_u32_below+0x17/0x70
    cache_random_seq_create+0x121/0x1c0
    init_cache_random_seq+0x5d/0x110
    do_kmem_cache_create+0x1e0/0xa30
    __kmem_cache_create_args+0x4ec/0x830
    create_kmalloc_caches+0xe6/0x130
    kmem_cache_init+0x1b1/0x660
    mm_core_init+0x1d8/0x4b0
    start_kernel+0x620/0xcd0
    x86_64_start_reservations+0x18/0x30
    x86_64_start_kernel+0xf3/0x140
    common_startup_64+0x13e/0x148
  irq event stamp: 76
  hardirqs last  enabled at (75): [<ffffffff8298b77a>] exc_nmi+0x11a/0x240
  hardirqs last disabled at (76): [<ffffffff8298b991>] sysvec_irq_work+0x11/0x110
  softirqs last  enabled at (0): [<ffffffff813b2dda>] copy_process+0xc7a/0x2350
  softirqs last disabled at (0): [<0000000000000000>] 0x0

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(batched_entropy_u32.lock);
    <Interrupt>
      lock(batched_entropy_u32.lock);

   *** DEADLOCK ***

Fix this by using a pseudo-random number generator if !allow_spin.
This means kmalloc_nolock() users won't get truly random numbers,
but there is not much we can do about it.

Note that an NMI handler might interrupt prandom_u32_state() and
change the random state, but that's safe: at worst it perturbs the
pseudo-random sequence.

Link: https://lore.kernel.org/all/0c33bdee-6de8-4d9f-92ca-4f72c1b6fb9f@suse.cz
Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
 mm/slub.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 90f0e6667130..591e41e5acc4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -43,6 +43,7 @@
 #include <linux/prefetch.h>
 #include <linux/memcontrol.h>
 #include <linux/random.h>
+#include <linux/prandom.h>
 #include <kunit/test.h>
 #include <kunit/test-bug.h>
 #include <linux/sort.h>
@@ -3311,8 +3312,11 @@ static void *next_freelist_entry(struct kmem_cache *s,
 	return (char *)start + idx;
 }
 
+static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
+
 /* Shuffle the single linked freelist based on a random pre-computed sequence */
-static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
+static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
+			     bool allow_spin)
 {
 	void *start;
 	void *cur;
@@ -3323,7 +3327,19 @@ static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
 		return false;
 
 	freelist_count = oo_objects(s->oo);
-	pos = get_random_u32_below(freelist_count);
+	if (allow_spin) {
+		pos = get_random_u32_below(freelist_count);
+	} else {
+		struct rnd_state *state;
+
+		/*
+		 * An interrupt or NMI handler might interrupt and change
+		 * the state in the middle, but that's safe.
+		 */
+		state = &get_cpu_var(slab_rnd_state);
+		pos = prandom_u32_state(state) % freelist_count;
+		put_cpu_var(slab_rnd_state);
+	}
 
 	page_limit = slab->objects * s->size;
 	start = fixup_red_left(s, slab_address(slab));
@@ -3350,7 +3366,8 @@ static inline int init_cache_random_seq(struct kmem_cache *s)
 	return 0;
 }
 static inline void init_freelist_randomization(void) { }
-static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab)
+static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
+				    bool allow_spin)
 {
 	return false;
 }
@@ -3441,7 +3458,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	alloc_slab_obj_exts_early(s, slab);
 	account_slab(slab, oo_order(oo), s, flags);
 
-	shuffle = shuffle_freelist(s, slab);
+	shuffle = shuffle_freelist(s, slab, allow_spin);
 
 	if (!shuffle) {
 		start = fixup_red_left(s, start);
@@ -8341,6 +8358,9 @@ void __init kmem_cache_init_late(void)
 {
 	flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!flushwq);
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+	prandom_init_once(&slab_rnd_state);
+#endif
 }
 
 int do_kmem_cache_create(struct kmem_cache *s, const char *name,
-- 
2.43.0




* Re: [PATCH V2 0/2] fix lockdep warnings with kmalloc_nolock()
From: Vlastimil Babka @ 2026-02-10  9:58 UTC (permalink / raw)
  To: Harry Yoo, Andrew Morton
  Cc: Christoph Lameter, David Rientjes, Roman Gushchin,
	Alexei Starovoitov, Hao Li, linux-mm

On 2/10/26 09:18, Harry Yoo wrote:
> Hi, I've observed two lockdep warnings while testing
> kmalloc_nolock() in NMI:
> 
>   1. Accessing current->mems_allowed_seq seqlock in NMI isn't safe
>      and lockdep complains.
> 
>   2. w/ CONFIG_SLAB_FREELIST_RANDOM, get_random_u32() acquires
>      a local_lock, which isn't safe in NMI and could cause a deadlock.
> 
> Let's fix them.
> 
> Note: This is based on the latest slab/for-next. It wasn't clear to me
> if I should base it on slab/for-next or slab/for-next-fixes,
> because the merge window has started, this series needs some exposure
> in -next, and the patches in slab/for-next might land in mainline
> in the meantime.

Right, I'm adding this to slab/for-next on top of the already submitted
PR, so I can send another PR next week. Thanks!




