linux-mm.kvack.org archive mirror
* [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
@ 2026-04-09 12:43 hu.shengming
  2026-04-13  3:45 ` Harry Yoo (Oracle)
  0 siblings, 1 reply; 5+ messages in thread
From: hu.shengming @ 2026-04-09 12:43 UTC (permalink / raw)
  To: vbabka, harry, akpm
  Cc: hao.li, cl, rientjes, roman.gushchin, linux-mm, linux-kernel,
	zhang.run, xu.xin16, yang.tao172, yang.yang29



From: Shengming Hu <hu.shengming@zte.com.cn>

Allocations from a fresh slab can consume all of its objects, and the
freelist built during slab allocation is discarded immediately as a result.

Instead of special-casing the whole-slab bulk refill case, defer freelist
construction until after objects are emitted from a fresh slab.
new_slab() now only allocates the slab and initializes its metadata.
refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
emit objects directly, building a freelist only for the objects left
unallocated; the same change is applied to alloc_single_from_new_slab().

To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
small iterator abstraction for walking free objects in allocation order.
The iterator is used both for filling the sheaf and for building the
freelist of the remaining objects.

Also mark setup_object() inline. After this optimization, the compiler no
longer consistently inlines this helper in the hot path, which can hurt
performance. Explicitly marking it inline restores the expected code
generation.

This reduces per-object overhead when allocating from a fresh slab.
The most direct benefit is in the paths that allocate objects first and
only build a freelist for the remainder afterward: bulk allocation from
a new slab in refill_objects(), single-object allocation from a new slab
in ___slab_alloc(), and the corresponding early-boot paths that now use
the same deferred-freelist scheme. Since refill_objects() is also used to
refill sheaves, the optimization is not limited to the small set of
kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
workloads may benefit as well when they refill from a fresh slab.

In slub_bulk_bench, the time per object drops by about 32% to 71% with
CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
cost removed by this change: each iteration allocates exactly
slab->objects from a fresh slab. That makes it a near best-case scenario
for deferred freelist construction, because the old path still built a
full freelist even when no objects remained, while the new path avoids
that work. Realistic workloads may see smaller end-to-end gains depending
on how often allocations reach this fresh-slab refill path.

Benchmark results (slub_bulk_bench):
Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
Kernel: Linux 7.0.0-rc7-next-20260407
Config: x86_64_defconfig
Cpu: 0
Rounds: 20
Total: 256MB

- CONFIG_SLAB_FREELIST_RANDOM=n -

obj_size=16, batch=256:
before: 4.73 +- 0.05 ns/object
after: 3.19 +- 0.03 ns/object
delta: -32.6%

obj_size=32, batch=128:
before: 6.75 +- 0.06 ns/object
after: 3.61 +- 0.04 ns/object
delta: -46.5%

obj_size=64, batch=64:
before: 10.47 +- 0.07 ns/object
after: 4.47 +- 0.06 ns/object
delta: -57.3%

obj_size=128, batch=32:
before: 18.38 +- 0.08 ns/object
after: 5.99 +- 0.11 ns/object
delta: -67.4%

obj_size=256, batch=32:
before: 22.14 +- 0.21 ns/object
after: 6.36 +- 0.12 ns/object
delta: -71.3%

obj_size=512, batch=32:
before: 19.86 +- 0.10 ns/object
after: 6.60 +- 0.18 ns/object
delta: -66.8%

- CONFIG_SLAB_FREELIST_RANDOM=y -

obj_size=16, batch=256:
before: 8.88 +- 0.06 ns/object
after: 4.18 +- 0.15 ns/object
delta: -52.9%

obj_size=32, batch=128:
before: 11.45 +- 0.10 ns/object
after: 4.72 +- 0.11 ns/object
delta: -58.8%

obj_size=64, batch=64:
before: 15.67 +- 0.25 ns/object
after: 5.63 +- 0.04 ns/object
delta: -64.1%

obj_size=128, batch=32:
before: 22.26 +- 0.18 ns/object
after: 7.62 +- 0.08 ns/object
delta: -65.8%

obj_size=256, batch=32:
before: 27.26 +- 0.37 ns/object
after: 8.12 +- 0.07 ns/object
delta: -70.2%

obj_size=512, batch=32:
before: 28.08 +- 0.55 ns/object
after: 8.26 +- 0.12 ns/object
delta: -70.6%

Link: https://github.com/HSM6236/slub_bulk_test.git
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
---
Changes in v2:
- Handle CONFIG_SLAB_FREELIST_RANDOM=y and add benchmark results.
- Update the QEMU benchmark setup to use -enable-kvm -cpu host so benchmark results better reflect native CPU performance.
- Link to v1: https://lore.kernel.org/all/20260328125538341lvTGRpS62UNdRiAAz2gH3@zte.com.cn/

Changes in v3:
- refactor fresh-slab allocation to use a shared slab_obj_iter
- defer freelist construction until after bulk allocation from a new slab
- build a freelist only for leftover objects when the slab is left partial
- add build_slab_freelist(), prepare_slab_alloc_flags() and next_slab_obj() helpers
- remove obsolete freelist construction helpers now replaced by the iterator-based path, including next_freelist_entry() and shuffle_freelist()
- Link to v2: https://lore.kernel.org/all/202604011257259669oAdDsdnKx6twdafNZsF5@zte.com.cn/

Changes in v4:
- remove slab_obj_iter::cur
- drop prepare_slab_alloc_flags() and restore the original flag handling in new_slab()
- new_slab() no longer builds the freelist. Instead, the freelist is constructed only for the objects left unallocated in
  alloc_single_from_new_slab(), alloc_from_new_slab(), and early_kmem_cache_node_alloc().
- remove maybe_wipe_obj_freeptr() when allocating objects directly without freelist built
- Link to v3: https://lore.kernel.org/all/202604062150182836ygUiyPoKcxtHjgF7rWXe@zte.com.cn/

Changes in v5:
- Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
- Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
- Refine the commit message with more precise test coverage descriptions.
- Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/

---
 mm/slab.h |  10 ++
 mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
 2 files changed, 147 insertions(+), 142 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index bf2f87acf5e3..ada3f9c3909f 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -91,6 +91,16 @@ struct slab {
 #endif
 };
 
+struct slab_obj_iter {
+	unsigned long pos;
+	void *start;
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+	unsigned long freelist_count;
+	unsigned long page_limit;
+	bool random;
+#endif
+};
+
 #define SLAB_MATCH(pg, sl)						\
 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
 SLAB_MATCH(flags, flags);
diff --git a/mm/slub.c b/mm/slub.c
index 4927407c9699..9ff8af8c2f73 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2733,7 +2733,7 @@ bool slab_free_freelist_hook(struct kmem_cache *s, void **head, void **tail,
 	return *head != NULL;
 }
 
-static void *setup_object(struct kmem_cache *s, void *object)
+static inline void *setup_object(struct kmem_cache *s, void *object)
 {
 	setup_object_debug(s, object);
 	object = kasan_init_slab_obj(s, object);
@@ -3329,87 +3329,14 @@ static void __init init_freelist_randomization(void)
 	mutex_unlock(&slab_mutex);
 }
 
-/* Get the next entry on the pre-computed freelist randomized */
-static void *next_freelist_entry(struct kmem_cache *s,
-				unsigned long *pos, void *start,
-				unsigned long page_limit,
-				unsigned long freelist_count)
-{
-	unsigned int idx;
-
-	/*
-	 * If the target page allocation failed, the number of objects on the
-	 * page might be smaller than the usual size defined by the cache.
-	 */
-	do {
-		idx = s->random_seq[*pos];
-		*pos += 1;
-		if (*pos >= freelist_count)
-			*pos = 0;
-	} while (unlikely(idx >= page_limit));
-
-	return (char *)start + idx;
-}
-
 static DEFINE_PER_CPU(struct rnd_state, slab_rnd_state);
 
-/* Shuffle the single linked freelist based on a random pre-computed sequence */
-static bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
-			     bool allow_spin)
-{
-	void *start;
-	void *cur;
-	void *next;
-	unsigned long idx, pos, page_limit, freelist_count;
-
-	if (slab->objects < 2 || !s->random_seq)
-		return false;
-
-	freelist_count = oo_objects(s->oo);
-	if (allow_spin) {
-		pos = get_random_u32_below(freelist_count);
-	} else {
-		struct rnd_state *state;
-
-		/*
-		 * An interrupt or NMI handler might interrupt and change
-		 * the state in the middle, but that's safe.
-		 */
-		state = &get_cpu_var(slab_rnd_state);
-		pos = prandom_u32_state(state) % freelist_count;
-		put_cpu_var(slab_rnd_state);
-	}
-
-	page_limit = slab->objects * s->size;
-	start = fixup_red_left(s, slab_address(slab));
-
-	/* First entry is used as the base of the freelist */
-	cur = next_freelist_entry(s, &pos, start, page_limit, freelist_count);
-	cur = setup_object(s, cur);
-	slab->freelist = cur;
-
-	for (idx = 1; idx < slab->objects; idx++) {
-		next = next_freelist_entry(s, &pos, start, page_limit,
-			freelist_count);
-		next = setup_object(s, next);
-		set_freepointer(s, cur, next);
-		cur = next;
-	}
-	set_freepointer(s, cur, NULL);
-
-	return true;
-}
 #else
 static inline int init_cache_random_seq(struct kmem_cache *s)
 {
 	return 0;
 }
 static inline void init_freelist_randomization(void) { }
-static inline bool shuffle_freelist(struct kmem_cache *s, struct slab *slab,
-				    bool allow_spin)
-{
-	return false;
-}
 #endif /* CONFIG_SLAB_FREELIST_RANDOM */
 
 static __always_inline void account_slab(struct slab *slab, int order,
@@ -3438,15 +3365,14 @@ static __always_inline void unaccount_slab(struct slab *slab, int order,
 			    -(PAGE_SIZE << order));
 }
 
+/* Allocate and initialize a slab without building its freelist. */
 static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	bool allow_spin = gfpflags_allow_spinning(flags);
 	struct slab *slab;
 	struct kmem_cache_order_objects oo = s->oo;
 	gfp_t alloc_gfp;
-	void *start, *p, *next;
-	int idx;
-	bool shuffle;
+	void *start;
 
 	flags &= gfp_allowed_mask;
 
@@ -3497,21 +3423,6 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	alloc_slab_obj_exts_early(s, slab);
 	account_slab(slab, oo_order(oo), s, flags);
 
-	shuffle = shuffle_freelist(s, slab, allow_spin);
-
-	if (!shuffle) {
-		start = fixup_red_left(s, start);
-		start = setup_object(s, start);
-		slab->freelist = start;
-		for (idx = 0, p = start; idx < slab->objects - 1; idx++) {
-			next = p + s->size;
-			next = setup_object(s, next);
-			set_freepointer(s, p, next);
-			p = next;
-		}
-		set_freepointer(s, p, NULL);
-	}
-
 	return slab;
 }
 
@@ -3665,30 +3576,109 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
 	return object;
 }
 
+/* Return the next free object in allocation order. */
+static inline void *next_slab_obj(struct kmem_cache *s,
+				  struct slab_obj_iter *iter)
+{
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+	if (iter->random) {
+		unsigned long idx;
+
+		/*
+		 * If the target page allocation failed, the number of objects on the
+		 * page might be smaller than the usual size defined by the cache.
+		 */
+		do {
+			idx = s->random_seq[iter->pos];
+			iter->pos++;
+			if (iter->pos >= iter->freelist_count)
+				iter->pos = 0;
+		} while (unlikely(idx >= iter->page_limit));
+
+		return setup_object(s, (char *)iter->start + idx);
+	}
+#endif
+	return setup_object(s, (char *)iter->start + iter->pos++ * s->size);
+}
+
+/* Build a freelist from the objects not yet allocated from a fresh slab. */
+static inline void build_slab_freelist(struct kmem_cache *s, struct slab *slab,
+				       struct slab_obj_iter *iter)
+{
+	unsigned int nr = slab->objects - slab->inuse;
+	unsigned int i;
+	void *cur, *next;
+
+	if (!nr) {
+		slab->freelist = NULL;
+		return;
+	}
+
+	cur = next_slab_obj(s, iter);
+	slab->freelist = cur;
+
+	for (i = 1; i < nr; i++) {
+		next = next_slab_obj(s, iter);
+		set_freepointer(s, cur, next);
+		cur = next;
+	}
+
+	set_freepointer(s, cur, NULL);
+}
+
+/* Initialize an iterator over free objects in allocation order. */
+static inline void init_slab_obj_iter(struct kmem_cache *s, struct slab *slab,
+				      struct slab_obj_iter *iter,
+				      bool allow_spin)
+{
+	iter->pos = 0;
+	iter->start = fixup_red_left(s, slab_address(slab));
+
+#ifdef CONFIG_SLAB_FREELIST_RANDOM
+	iter->random = (slab->objects >= 2 && s->random_seq);
+	if (!iter->random)
+		return;
+
+	iter->freelist_count = oo_objects(s->oo);
+	iter->page_limit = slab->objects * s->size;
+
+	if (allow_spin) {
+		iter->pos = get_random_u32_below(iter->freelist_count);
+	} else {
+		struct rnd_state *state;
+
+		/*
+		 * An interrupt or NMI handler might interrupt and change
+		 * the state in the middle, but that's safe.
+		 */
+		state = &get_cpu_var(slab_rnd_state);
+		iter->pos = prandom_u32_state(state) % iter->freelist_count;
+		put_cpu_var(slab_rnd_state);
+	}
+#endif
+}
+
 /*
  * Called only for kmem_cache_debug() caches to allocate from a freshly
  * allocated slab. Allocate a single object instead of whole freelist
  * and put the slab to the partial (or full) list.
  */
 static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
-					int orig_size, gfp_t gfpflags)
+					int orig_size, bool allow_spin)
 {
-	bool allow_spin = gfpflags_allow_spinning(gfpflags);
-	int nid = slab_nid(slab);
-	struct kmem_cache_node *n = get_node(s, nid);
+	struct kmem_cache_node *n;
+	struct slab_obj_iter iter;
+	bool needs_add_partial;
 	unsigned long flags;
 	void *object;
 
-	if (!allow_spin && !spin_trylock_irqsave(&n->list_lock, flags)) {
-		/* Unlucky, discard newly allocated slab. */
-		free_new_slab_nolock(s, slab);
-		return NULL;
-	}
-
-	object = slab->freelist;
-	slab->freelist = get_freepointer(s, object);
+	init_slab_obj_iter(s, slab, &iter, allow_spin);
+	object = next_slab_obj(s, &iter);
 	slab->inuse = 1;
 
+	needs_add_partial = (slab->objects > 1);
+	build_slab_freelist(s, slab, &iter);
+
 	if (!alloc_debug_processing(s, slab, object, orig_size)) {
 		/*
 		 * It's not really expected that this would fail on a
@@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
 		 * corruption in theory could cause that.
 		 * Leak memory of allocated slab.
 		 */
-		if (!allow_spin)
-			spin_unlock_irqrestore(&n->list_lock, flags);
 		return NULL;
 	}
 
-	if (allow_spin)
+	n = get_node(s, slab_nid(slab));
+	if (allow_spin) {
 		spin_lock_irqsave(&n->list_lock, flags);
+	} else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
+		/*
+		 * Unlucky, discard newly allocated slab.
+		 * The slab is not fully free, but it's fine as
+		 * objects are not allocated to users.
+		 */
+		free_new_slab_nolock(s, slab);
+		return NULL;
+	}
 
-	if (slab->inuse == slab->objects)
-		add_full(s, n, slab);
-	else
+	if (needs_add_partial)
 		add_partial(n, slab, ADD_TO_HEAD);
+	else
+		add_full(s, n, slab);
 
-	inc_slabs_node(s, nid, slab->objects);
 	spin_unlock_irqrestore(&n->list_lock, flags);
 
+	inc_slabs_node(s, slab_nid(slab), slab->objects);
 	return object;
 }
 
@@ -4349,9 +4347,10 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
 {
 	unsigned int allocated = 0;
 	struct kmem_cache_node *n;
+	struct slab_obj_iter iter;
 	bool needs_add_partial;
 	unsigned long flags;
-	void *object;
+	unsigned int target_inuse;
 
 	/*
 	 * Are we going to put the slab on the partial list?
@@ -4359,33 +4358,30 @@ static unsigned int alloc_from_new_slab(struct kmem_cache *s, struct slab *slab,
 	 */
 	needs_add_partial = (slab->objects > count);
 
-	if (!allow_spin && needs_add_partial) {
+	/* Target inuse count after allocating from this new slab. */
+	target_inuse = needs_add_partial ? count : slab->objects;
 
-		n = get_node(s, slab_nid(slab));
-
-		if (!spin_trylock_irqsave(&n->list_lock, flags)) {
-			/* Unlucky, discard newly allocated slab */
-			free_new_slab_nolock(s, slab);
-			return 0;
-		}
-	}
-
-	object = slab->freelist;
-	while (object && allocated < count) {
-		p[allocated] = object;
-		object = get_freepointer(s, object);
-		maybe_wipe_obj_freeptr(s, p[allocated]);
+	init_slab_obj_iter(s, slab, &iter, allow_spin);
 
-		slab->inuse++;
+	while (allocated < target_inuse) {
+		p[allocated] = next_slab_obj(s, &iter);
 		allocated++;
 	}
-	slab->freelist = object;
+	slab->inuse = target_inuse;
+	build_slab_freelist(s, slab, &iter);
 
 	if (needs_add_partial) {
-
+		n = get_node(s, slab_nid(slab));
 		if (allow_spin) {
-			n = get_node(s, slab_nid(slab));
 			spin_lock_irqsave(&n->list_lock, flags);
+		} else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
+			/*
+			 * Unlucky, discard newly allocated slab.
+			 * The slab is not fully free, but it's fine as
+			 * objects are not allocated to users.
+			 */
+			free_new_slab_nolock(s, slab);
+			return 0;
 		}
 		add_partial(n, slab, ADD_TO_HEAD);
 		spin_unlock_irqrestore(&n->list_lock, flags);
@@ -4456,15 +4452,13 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	stat(s, ALLOC_SLAB);
 
 	if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) {
-		object = alloc_single_from_new_slab(s, slab, orig_size, gfpflags);
+		object = alloc_single_from_new_slab(s, slab, orig_size, allow_spin);
 
 		if (likely(object))
 			goto success;
 	} else {
-		alloc_from_new_slab(s, slab, &object, 1, allow_spin);
-
 		/* we don't need to check SLAB_STORE_USER here */
-		if (likely(object))
+		if (alloc_from_new_slab(s, slab, &object, 1, allow_spin))
 			return object;
 	}
 
@@ -7251,10 +7245,6 @@ refill_objects(struct kmem_cache *s, void **p, gfp_t gfp, unsigned int min,
 
 	stat(s, ALLOC_SLAB);
 
-	/*
-	 * TODO: possible optimization - if we know we will consume the whole
-	 * slab we might skip creating the freelist?
-	 */
 	refilled += alloc_from_new_slab(s, slab, p + refilled, max - refilled,
 					/* allow_spin = */ true);
 
@@ -7585,6 +7575,7 @@ static void early_kmem_cache_node_alloc(int node)
 {
 	struct slab *slab;
 	struct kmem_cache_node *n;
+	struct slab_obj_iter iter;
 
 	BUG_ON(kmem_cache_node->size < sizeof(struct kmem_cache_node));
 
@@ -7596,14 +7587,18 @@ static void early_kmem_cache_node_alloc(int node)
 		pr_err("SLUB: Allocating a useless per node structure in order to be able to continue\n");
 	}
 
-	n = slab->freelist;
+	init_slab_obj_iter(kmem_cache_node, slab, &iter, true);
+
+	n = next_slab_obj(kmem_cache_node, &iter);
 	BUG_ON(!n);
+
+	slab->inuse = 1;
+	build_slab_freelist(kmem_cache_node, slab, &iter);
+
 #ifdef CONFIG_SLUB_DEBUG
 	init_object(kmem_cache_node, n, SLUB_RED_ACTIVE);
 #endif
 	n = kasan_slab_alloc(kmem_cache_node, n, GFP_KERNEL, false);
-	slab->freelist = get_freepointer(kmem_cache_node, n);
-	slab->inuse = 1;
 	kmem_cache_node->per_node[node].node = n;
 	init_kmem_cache_node(n);
 	inc_slabs_node(kmem_cache_node, node, slab->objects);
-- 
2.25.1



* Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
  2026-04-09 12:43 [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab hu.shengming
@ 2026-04-13  3:45 ` Harry Yoo (Oracle)
  2026-04-13  5:14   ` hu.shengming
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Yoo (Oracle) @ 2026-04-13  3:45 UTC (permalink / raw)
  To: hu.shengming
  Cc: vbabka, akpm, hao.li, cl, rientjes, roman.gushchin, linux-mm,
	linux-kernel, zhang.run, xu.xin16, yang.tao172, yang.yang29

On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> From: Shengming Hu <hu.shengming@zte.com.cn>
> 
> Allocations from a fresh slab can consume all of its objects, and the
> freelist built during slab allocation is discarded immediately as a result.
> 
> Instead of special-casing the whole-slab bulk refill case, defer freelist
> construction until after objects are emitted from a fresh slab.
> new_slab() now only allocates the slab and initializes its metadata.
> refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
> emit objects directly, building a freelist only for the objects left
> unallocated; the same change is applied to alloc_single_from_new_slab().
> 
> To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
> small iterator abstraction for walking free objects in allocation order.
> The iterator is used both for filling the sheaf and for building the
> freelist of the remaining objects.
> 
> Also mark setup_object() inline. After this optimization, the compiler no
> longer consistently inlines this helper in the hot path, which can hurt
> performance. Explicitly marking it inline restores the expected code
> generation.
> 
> This reduces per-object overhead when allocating from a fresh slab.
> The most direct benefit is in the paths that allocate objects first and
> only build a freelist for the remainder afterward: bulk allocation from
> a new slab in refill_objects(), single-object allocation from a new slab
> in ___slab_alloc(), and the corresponding early-boot paths that now use
> the same deferred-freelist scheme. Since refill_objects() is also used to
> refill sheaves, the optimization is not limited to the small set of
> kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
> workloads may benefit as well when they refill from a fresh slab.
> 
> In slub_bulk_bench, the time per object drops by about 32% to 71% with
> CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
> CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
> cost removed by this change: each iteration allocates exactly
> slab->objects from a fresh slab. That makes it a near best-case scenario
> for deferred freelist construction, because the old path still built a
> full freelist even when no objects remained, while the new path avoids
> that work. Realistic workloads may see smaller end-to-end gains depending
> on how often allocations reach this fresh-slab refill path.
> 
> Benchmark results (slub_bulk_bench):
> Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
> Kernel: Linux 7.0.0-rc7-next-20260407
> Config: x86_64_defconfig
> Cpu: 0
> Rounds: 20
> Total: 256MB

[...]

Hi Shengming, it's been great to see how this patch has been improved
since v1 to where it is now. Thanks for taking the feedback and steadily
improving things along the way.

I think this is getting pretty close to being ready for mainline,
with just one little thing to fix in the code.

Other reviewers/maintainers may also take a look and leave comments
when they get a chance.

> Link: https://github.com/HSM6236/slub_bulk_test.git
> Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
> ---

If you think it's appropriate, please feel free to add:
Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>

In case this was assisted by AI or other tools, please disclose that
according to the process document:

  https://docs.kernel.org/process/generated-content.html
  https://docs.kernel.org/process/coding-assistants.html

Not that I think this was assisted by AI, just mentioning because
sometimes people using tools to develop the kernel are not aware that
they need to disclose the fact. It wouldn't hurt to remind people :-)

> Changes in v5:
> - Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
> - Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
> - Refine the commit message with more precise test coverage descriptions.
> - Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/
> 
> ---
>  mm/slab.h |  10 ++
>  mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
>  2 files changed, 147 insertions(+), 142 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 4927407c9699..9ff8af8c2f73 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>  		 * corruption in theory could cause that.
>  		 * Leak memory of allocated slab.
>  		 */
> -		if (!allow_spin)
> -			spin_unlock_irqrestore(&n->list_lock, flags);
>  		return NULL;
>  	}
>  
> -	if (allow_spin)
> +	n = get_node(s, slab_nid(slab));
> +	if (allow_spin) {
>  		spin_lock_irqsave(&n->list_lock, flags);
> +	} else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> +		/*
> +		 * Unlucky, discard newly allocated slab.
> +		 * The slab is not fully free, but it's fine as
> +		 * objects are not allocated to users.
> +		 */
> +		free_new_slab_nolock(s, slab);
> +		return NULL;
> +	}
>  
> -	if (slab->inuse == slab->objects)
> -		add_full(s, n, slab);
> -	else
> +	if (needs_add_partial)
>  		add_partial(n, slab, ADD_TO_HEAD);
> +	else
> +		add_full(s, n, slab);
>  
> -	inc_slabs_node(s, nid, slab->objects);
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  
> +	inc_slabs_node(s, slab_nid(slab), slab->objects);

Ouch, I didn't catch this when it was added in v4. When slab debugging
feature is enabled for the cache, inc_slabs_node() should be done within
the spinlock to avoid race conditions with slab validation.

Perhaps it's worth adding a comment mentioning this :)

See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
caches and make it safe") for more details.

With this fixed, please feel free to add:
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
  2026-04-13  3:45 ` Harry Yoo (Oracle)
@ 2026-04-13  5:14   ` hu.shengming
  2026-04-13  5:27     ` Harry Yoo (Oracle)
  0 siblings, 1 reply; 5+ messages in thread
From: hu.shengming @ 2026-04-13  5:14 UTC (permalink / raw)
  To: harry
  Cc: vbabka, akpm, hao.li, cl, rientjes, roman.gushchin, linux-mm,
	linux-kernel, zhang.run, xu.xin16, yang.tao172, yang.yang29

Harry wrote:
> On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> > From: Shengming Hu <hu.shengming@zte.com.cn>
> > 
> > Allocations from a fresh slab can consume all of its objects, and the
> > freelist built during slab allocation is discarded immediately as a result.
> > 
> > Instead of special-casing the whole-slab bulk refill case, defer freelist
> > construction until after objects are emitted from a fresh slab.
> > new_slab() now only allocates the slab and initializes its metadata.
> > refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
> > emit objects directly, building a freelist only for the objects left
> > unallocated; the same change is applied to alloc_single_from_new_slab().
> > 
> > To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
> > small iterator abstraction for walking free objects in allocation order.
> > The iterator is used both for filling the sheaf and for building the
> > freelist of the remaining objects.
> > 
> > Also mark setup_object() inline. After this optimization, the compiler no
> > longer consistently inlines this helper in the hot path, which can hurt
> > performance. Explicitly marking it inline restores the expected code
> > generation.
> > 
> > This reduces per-object overhead when allocating from a fresh slab.
> > The most direct benefit is in the paths that allocate objects first and
> > only build a freelist for the remainder afterward: bulk allocation from
> > a new slab in refill_objects(), single-object allocation from a new slab
> > in ___slab_alloc(), and the corresponding early-boot paths that now use
> > the same deferred-freelist scheme. Since refill_objects() is also used to
> > refill sheaves, the optimization is not limited to the small set of
> > kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
> > workloads may benefit as well when they refill from a fresh slab.
> > 
> > In slub_bulk_bench, the time per object drops by about 32% to 71% with
> > CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
> > CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
> > cost removed by this change: each iteration allocates exactly
> > slab->objects from a fresh slab. That makes it a near best-case scenario
> > for deferred freelist construction, because the old path still built a
> > full freelist even when no objects remained, while the new path avoids
> > that work. Realistic workloads may see smaller end-to-end gains depending
> > on how often allocations reach this fresh-slab refill path.
> > 
> > Benchmark results (slub_bulk_bench):
> > Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
> > Kernel: Linux 7.0.0-rc7-next-20260407
> > Config: x86_64_defconfig
> > Cpu: 0
> > Rounds: 20
> > Total: 256MB
> 
> [...]
> 
> Hi Shengming, it's been great to see how this patch has been improved
> since v1 to where it is now. Thanks for taking the feedback and steadily
> improving things along the way.
> 

Hi Harry,

Thank you very much for your helpful reviews and suggestions from v1 through v5.
I really appreciate your patience and professionalism throughout the review process,
and I have learned a lot from your feedback.

> I think this is getting pretty close to being ready for mainline,
> with just one little thing to fix in the code.
> 
> Other reviewers/maintainers may also take a look and leave comments
> when they get a chance.
> 

I am also looking forward to any further comments or suggestions from
other reviewers and maintainers.

> > Link: https://github.com/HSM6236/slub_bulk_test.git
> > Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
> > ---
> 
> If you think it's appropriate, please feel free to add:
> Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>
> 

Sure, I will add:

  Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>

Thanks again for your continued review and guidance.

> In case this was assisted by AI or other tools, please disclose that
> according to the process document:
> 
>   https://docs.kernel.org/process/generated-content.html
>   https://docs.kernel.org/process/coding-assistants.html
> 
> Not that I think this was assisted by AI, just mentioning because
> sometimes people using tools to develop the kernel are not aware that
> they need to disclose the fact. It wouldn't hurt to remind people :-)
> 

Regarding AI disclosure: I only used an AI tool to polish the English wording
of the commit message, since I am not fully confident in my English writing. :-)

As I understand it, the documentation says that "spelling and grammar fix ups,
like rephrasing to imperative voice" are out of scope, so I believe an
Assisted-by tag is not needed in this case. Please let me know if you think otherwise.

> > Changes in v5:
> > - Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
> > - Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
> > - Refine the commit message with more precise test coverage descriptions.
> > - Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/
> > 
> > ---
> >  mm/slab.h |  10 ++
> >  mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
> >  2 files changed, 147 insertions(+), 142 deletions(-)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 4927407c9699..9ff8af8c2f73 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> >           * corruption in theory could cause that.
> >           * Leak memory of allocated slab.
> >           */
> > -        if (!allow_spin)
> > -            spin_unlock_irqrestore(&n->list_lock, flags);
> >          return NULL;
> >      }
> >  
> > -    if (allow_spin)
> > +    n = get_node(s, slab_nid(slab));
> > +    if (allow_spin) {
> >          spin_lock_irqsave(&n->list_lock, flags);
> > +    } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> > +        /*
> > +         * Unlucky, discard newly allocated slab.
> > +         * The slab is not fully free, but it's fine as
> > +         * objects are not allocated to users.
> > +         */
> > +        free_new_slab_nolock(s, slab);
> > +        return NULL;
> > +    }
> >  
> > -    if (slab->inuse == slab->objects)
> > -        add_full(s, n, slab);
> > -    else
> > +    if (needs_add_partial)
> >          add_partial(n, slab, ADD_TO_HEAD);
> > +    else
> > +        add_full(s, n, slab);
> >  
> > -    inc_slabs_node(s, nid, slab->objects);
> >      spin_unlock_irqrestore(&n->list_lock, flags);
> >  
> > +    inc_slabs_node(s, slab_nid(slab), slab->objects);
> 
> Ouch, I didn't catch this when it was added in v4. When slab debugging
> feature is enabled for the cache, inc_slabs_node() should be done within
> the spinlock to avoid race conditions with slab validation.
> 
> Perhaps it's worth adding a comment mentioning this :)
> 
> See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> caches and make it safe") for more details.
> 
> With this fixed, please feel free to add:
> Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> 

You are right about the inc_slabs_node() placement. I missed that change when
it was introduced in v4. Thank you very much for catching it.

After reading commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
caches and make it safe"), my understanding is that inc_slabs_node() should
remain under n->list_lock for debug caches, so that validation cannot observe
inconsistent state during list transitions. I will fix that in the next revision
and add a comment along these lines.

Would a comment like the following look good? :-)

/*
 * Debug caches require nr_slabs updates under n->list_lock so validation
 * cannot race with list transitions and observe inconsistent state.
 */

Thank you again for the careful review.

-- 
Cheers,
Shengming

> -- 
> Cheers,
> Harry / Hyeonggon


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
  2026-04-13  5:14   ` hu.shengming
@ 2026-04-13  5:27     ` Harry Yoo (Oracle)
  2026-04-13  6:20       ` hu.shengming
  0 siblings, 1 reply; 5+ messages in thread
From: Harry Yoo (Oracle) @ 2026-04-13  5:27 UTC (permalink / raw)
  To: hu.shengming
  Cc: vbabka, akpm, hao.li, cl, rientjes, roman.gushchin, linux-mm,
	linux-kernel, zhang.run, xu.xin16, yang.tao172, yang.yang29

On Mon, Apr 13, 2026 at 01:14:23PM +0800, hu.shengming@zte.com.cn wrote:
> Harry wrote:
> > On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> > > From: Shengming Hu <hu.shengming@zte.com.cn>
> > > 
> > > Allocations from a fresh slab can consume all of its objects, and the
> > > freelist built during slab allocation is discarded immediately as a result.
> > > 
> > > Instead of special-casing the whole-slab bulk refill case, defer freelist
> > > construction until after objects are emitted from a fresh slab.
> > > new_slab() now only allocates the slab and initializes its metadata.
> > > refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
> > > emit objects directly, building a freelist only for the objects left
> > > unallocated; the same change is applied to alloc_single_from_new_slab().
> > > 
> > > To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
> > > small iterator abstraction for walking free objects in allocation order.
> > > The iterator is used both for filling the sheaf and for building the
> > > freelist of the remaining objects.
> > > 
> > > Also mark setup_object() inline. After this optimization, the compiler no
> > > longer consistently inlines this helper in the hot path, which can hurt
> > > performance. Explicitly marking it inline restores the expected code
> > > generation.
> > > 
> > > This reduces per-object overhead when allocating from a fresh slab.
> > > The most direct benefit is in the paths that allocate objects first and
> > > only build a freelist for the remainder afterward: bulk allocation from
> > > a new slab in refill_objects(), single-object allocation from a new slab
> > > in ___slab_alloc(), and the corresponding early-boot paths that now use
> > > the same deferred-freelist scheme. Since refill_objects() is also used to
> > > refill sheaves, the optimization is not limited to the small set of
> > > kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
> > > workloads may benefit as well when they refill from a fresh slab.
> > > 
> > > In slub_bulk_bench, the time per object drops by about 32% to 71% with
> > > CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
> > > CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
> > > cost removed by this change: each iteration allocates exactly
> > > slab->objects from a fresh slab. That makes it a near best-case scenario
> > > for deferred freelist construction, because the old path still built a
> > > full freelist even when no objects remained, while the new path avoids
> > > that work. Realistic workloads may see smaller end-to-end gains depending
> > > on how often allocations reach this fresh-slab refill path.

> > > 
> > > Benchmark results (slub_bulk_bench):
> > > Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
> > > Kernel: Linux 7.0.0-rc7-next-20260407
> > > Config: x86_64_defconfig
> > > Cpu: 0
> > > Rounds: 20
> > > Total: 256MB
> > 
> > [...]
> > 
> > Hi Shengming, it's been great to see how this patch has been improved
> > since v1 to where it is now. Thanks for taking the feedback and steadily
> > improving things along the way.
> > 
> 
> Hi Harry,
> 
> Thank you very much for your helpful reviews and suggestions from v1 through v5.
> I really appreciate your patience and professionalism throughout the review process,
> and I have learned a lot from your feedback.

You're welcome!

> > I think this is getting pretty close to being ready for mainline,
> > with just one little thing to fix in the code.
> > 
> > Other reviewers/maintainers may also take a look and leave comments
> > when they get a chance.
> > 
> 
> I am also looking forward to any further comments or suggestions from
> other reviewers and maintainers.
> 
> > > Link: https://github.com/HSM6236/slub_bulk_test.git
> > > Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
> > > ---
> > 
> > If you think it's appropriate, please feel free to add:
> > Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>
> > 
> 
> Sure, I will add:
> 
>   Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>

Thanks, and

> Thanks again for your continued review and guidance.

no problem!

> > In case this was assisted by AI or other tools, please disclose that
> > according to the process document:
> > 
> >   https://docs.kernel.org/process/generated-content.html
> >   https://docs.kernel.org/process/coding-assistants.html
> > 
> > Not that I think this was assisted by AI, just mentioning because
> > sometimes people using tools to develop the kernel are not aware that
> > they need to disclose the fact. It wouldn't hurt to remind people :-)
> > 
> 
> Regarding AI disclosure: I only used an AI tool to polish the English wording
> of the commit message, since I am not fully confident in my English writing. :-)

Haha, that's fine and I'm not a native English speaker either :)

> As I understand it, the documentation says that "spelling and grammar fix ups,
> like rephrasing to imperative voice" are out of scope, so I believe an
> Assisted-by tag is not needed in this case.

You are right. In that case it's not necessary.
Thanks for clarifying!

> > > Changes in v5:
> > > - Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
> > > - Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
> > > - Refine the commit message with more precise test coverage descriptions.
> > > - Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/
> > > 
> > > ---
> > >  mm/slab.h |  10 ++
> > >  mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
> > >  2 files changed, 147 insertions(+), 142 deletions(-)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 4927407c9699..9ff8af8c2f73 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
> > >           * corruption in theory could cause that.
> > >           * Leak memory of allocated slab.
> > >           */
> > > -        if (!allow_spin)
> > > -            spin_unlock_irqrestore(&n->list_lock, flags);
> > >          return NULL;
> > >      }
> > >  
> > > -    if (allow_spin)
> > > +    n = get_node(s, slab_nid(slab));
> > > +    if (allow_spin) {
> > >          spin_lock_irqsave(&n->list_lock, flags);
> > > +    } else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> > > +        /*
> > > +         * Unlucky, discard newly allocated slab.
> > > +         * The slab is not fully free, but it's fine as
> > > +         * objects are not allocated to users.
> > > +         */
> > > +        free_new_slab_nolock(s, slab);
> > > +        return NULL;
> > > +    }
> > >  
> > > -    if (slab->inuse == slab->objects)
> > > -        add_full(s, n, slab);
> > > -    else
> > > +    if (needs_add_partial)
> > >          add_partial(n, slab, ADD_TO_HEAD);
> > > +    else
> > > +        add_full(s, n, slab);
> > >  
> > > -    inc_slabs_node(s, nid, slab->objects);
> > >      spin_unlock_irqrestore(&n->list_lock, flags);
> > >  
> > > +    inc_slabs_node(s, slab_nid(slab), slab->objects);
> > 
> > Ouch, I didn't catch this when it was added in v4. When slab debugging
> > feature is enabled for the cache, inc_slabs_node() should be done within
> > the spinlock to avoid race conditions with slab validation.
> > 
> > Perhaps it's worth adding a comment mentioning this :)
> > 
> > See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> > caches and make it safe") for more details.
> > 
> > With this fixed, please feel free to add:
> > Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> > 
> 
> You are right about the inc_slabs_node() placement. I missed that change when
> it was introduced in v4. Thank you very much for catching it.
> 
> After reading commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> caches and make it safe"), my understanding is that inc_slabs_node() should
> remain under n->list_lock for debug caches, so that validation cannot observe
> inconsistent state during list transitions. I will fix that in the next revision
> and add a comment along these lines.
> 
> Would a comment like the following look good? :-)
> 
> /*
>  * Debug caches require nr_slabs updates under n->list_lock so validation
>  * cannot race with list transitions and observe inconsistent state.
>  */

Mostly LGTM, but perhaps it's better to say
"validation cannot race with slab (de)allocations"
because it's not about transitions between lists,
but rather about slabs being added to or removed from the node while
validation code is iterating over partial and full lists.

Thanks!

-- 
Cheers,
 Harry / Hyeonggon



* Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
  2026-04-13  5:27     ` Harry Yoo (Oracle)
@ 2026-04-13  6:20       ` hu.shengming
  0 siblings, 0 replies; 5+ messages in thread
From: hu.shengming @ 2026-04-13  6:20 UTC (permalink / raw)
  To: harry
  Cc: vbabka, akpm, hao.li, cl, rientjes, roman.gushchin, linux-mm,
	linux-kernel, zhang.run, xu.xin16, yang.tao172, yang.yang29

Harry wrote:
> On Mon, Apr 13, 2026 at 01:14:23PM +0800, hu.shengming@zte.com.cn wrote:
> > Harry wrote:
> > > On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> > > > From: Shengming Hu <hu.shengming@zte.com.cn>
> > > > [...]
> > > > -    inc_slabs_node(s, nid, slab->objects);
> > > >      spin_unlock_irqrestore(&n->list_lock, flags);
> > > >  
> > > > +    inc_slabs_node(s, slab_nid(slab), slab->objects);
> > > 
> > > Ouch, I didn't catch this when it was added in v4. When slab debugging
> > > feature is enabled for the cache, inc_slabs_node() should be done within
> > > the spinlock to avoid race conditions with slab validation.
> > > 
> > > Perhaps it's worth adding a comment mentioning this :)
> > > 
> > > See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> > > caches and make it safe") for more details.
> > > 
> > > With this fixed, please feel free to add:
> > > Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
> > > 
> > 
> > You are right about the inc_slabs_node() placement. I missed that change when
> > it was introduced in v4. Thank you very much for catching it.
> > 
> > After reading commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
> > caches and make it safe"), my understanding is that inc_slabs_node() should
> > remain under n->list_lock for debug caches, so that validation cannot observe
> > inconsistent state during list transitions. I will fix that in the next revision
> > and add a comment along these lines.
> > 
> > Would a comment like the following look good? :-)
> > 
> > /*
> >  * Debug caches require nr_slabs updates under n->list_lock so validation
> >  * cannot race with list transitions and observe inconsistent state.
> >  */
> 
> Mostly LGTM, but perhaps it's better to say
> "validation cannot race with slab (de)allocations"
> because it's not about transitions between lists,
> but rather about slabs being added to or removed from the node while
> validation code is iterating over partial and full lists.
> 
> Thanks!
> 

Thanks, that makes sense.

You're right that the issue is better described in terms of slab
(de)allocations rather than list transitions.

I'll update the comment.

--
Cheers,
Shengming



end of thread, other threads:[~2026-04-13  6:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-09 12:43 [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab hu.shengming
2026-04-13  3:45 ` Harry Yoo (Oracle)
2026-04-13  5:14   ` hu.shengming
2026-04-13  5:27     ` Harry Yoo (Oracle)
2026-04-13  6:20       ` hu.shengming
