linux-mm.kvack.org archive mirror
From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: hu.shengming@zte.com.cn
Cc: vbabka@kernel.org, akpm@linux-foundation.org, hao.li@linux.dev,
	cl@gentwo.org, rientjes@google.com, roman.gushchin@linux.dev,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	zhang.run@zte.com.cn, xu.xin16@zte.com.cn,
	yang.tao172@zte.com.cn, yang.yang29@zte.com.cn
Subject: Re: [PATCH v5] mm/slub: defer freelist construction until after bulk allocation from a new slab
Date: Mon, 13 Apr 2026 12:45:41 +0900	[thread overview]
Message-ID: <adxm5az9EfHr2aYg@hyeyoo> (raw)
In-Reply-To: <20260409204352095kKWVYKtZImN59ybO6iRNj@zte.com.cn>

On Thu, Apr 09, 2026 at 08:43:52PM +0800, hu.shengming@zte.com.cn wrote:
> From: Shengming Hu <hu.shengming@zte.com.cn>
> 
> Allocations from a fresh slab can consume all of its objects, and the
> freelist built during slab allocation is discarded immediately as a result.
> 
> Instead of special-casing the whole-slab bulk refill case, defer freelist
> construction until after objects are emitted from a fresh slab.
> new_slab() now only allocates the slab and initializes its metadata.
> refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
> emit objects directly, building a freelist only for the objects left
> unallocated; the same change is applied to alloc_single_from_new_slab().
> 
> To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
> small iterator abstraction for walking free objects in allocation order.
> The iterator is used both for filling the sheaf and for building the
> freelist of the remaining objects.
> 
> Also mark setup_object() inline. After this optimization, the compiler no
> longer consistently inlines this helper in the hot path, which can hurt
> performance. Explicitly marking it inline restores the expected code
> generation.
> 
> This reduces per-object overhead when allocating from a fresh slab.
> The most direct benefit is in the paths that allocate objects first and
> only build a freelist for the remainder afterward: bulk allocation from
> a new slab in refill_objects(), single-object allocation from a new slab
> in ___slab_alloc(), and the corresponding early-boot paths that now use
> the same deferred-freelist scheme. Since refill_objects() is also used to
> refill sheaves, the optimization is not limited to the small set of
> kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
> workloads may benefit as well when they refill from a fresh slab.
> 
> In slub_bulk_bench, the time per object drops by about 32% to 71% with
> CONFIG_SLAB_FREELIST_RANDOM=n, and by about 52% to 70% with
> CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
> cost removed by this change: each iteration allocates exactly
> slab->objects from a fresh slab. That makes it a near best-case scenario
> for deferred freelist construction, because the old path still built a
> full freelist even when no objects remained, while the new path avoids
> that work. Realistic workloads may see smaller end-to-end gains depending
> on how often allocations reach this fresh-slab refill path.
> 
> Benchmark results (slub_bulk_bench):
> Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
> Kernel: Linux 7.0.0-rc7-next-20260407
> Config: x86_64_defconfig
> Cpu: 0
> Rounds: 20
> Total: 256MB

[...]

Hi Shengming, it's been great to see how this patch has evolved from v1
to where it is now. Thanks for taking the feedback and steadily
improving it along the way.

I think this is getting pretty close to being ready for mainline,
with just one little thing to fix in the code.

Other reviewers/maintainers may also take a look and leave comments
when they get a chance.

> Link: https://github.com/HSM6236/slub_bulk_test.git
> Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
> ---

If you think it's appropriate, please feel free to add:
Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>

In case this was assisted by AI or other tools, please disclose that
according to the process document:

  https://docs.kernel.org/process/generated-content.html
  https://docs.kernel.org/process/coding-assistants.html

Not that I think this was assisted by AI, just mentioning because
sometimes people using tools to develop the kernel are not aware that
they need to disclose the fact. It wouldn't hurt to remind people :-)

> Changes in v5:
> - Call build_slab_freelist() unconditionally, and remove the redundant "slab->freelist = NULL" initialization in allocate_slab().
> - Check the return value of alloc_from_new_slab() to prevent a potential use-after-free bug.
> - Refine the commit message with more precise test coverage descriptions.
> - Link to v4: https://lore.kernel.org/all/2026040823281824773ybHpC3kgUhR9OE1rGTl@zte.com.cn/
> 
> ---
>  mm/slab.h |  10 ++
>  mm/slub.c | 279 +++++++++++++++++++++++++++---------------------------
>  2 files changed, 147 insertions(+), 142 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 4927407c9699..9ff8af8c2f73 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3696,22 +3686,30 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s, struct slab *slab,
>  		 * corruption in theory could cause that.
>  		 * Leak memory of allocated slab.
>  		 */
> -		if (!allow_spin)
> -			spin_unlock_irqrestore(&n->list_lock, flags);
>  		return NULL;
>  	}
>  
> -	if (allow_spin)
> +	n = get_node(s, slab_nid(slab));
> +	if (allow_spin) {
>  		spin_lock_irqsave(&n->list_lock, flags);
> +	} else if (!spin_trylock_irqsave(&n->list_lock, flags)) {
> +		/*
> +		 * Unlucky, discard newly allocated slab.
> +		 * The slab is not fully free, but it's fine as
> +		 * objects are not allocated to users.
> +		 */
> +		free_new_slab_nolock(s, slab);
> +		return NULL;
> +	}
>  
> -	if (slab->inuse == slab->objects)
> -		add_full(s, n, slab);
> -	else
> +	if (needs_add_partial)
>  		add_partial(n, slab, ADD_TO_HEAD);
> +	else
> +		add_full(s, n, slab);
>  
> -	inc_slabs_node(s, nid, slab->objects);
>  	spin_unlock_irqrestore(&n->list_lock, flags);
>  
> +	inc_slabs_node(s, slab_nid(slab), slab->objects);

Ouch, I didn't catch this when it was added in v4. When the slab
debugging feature is enabled for the cache, inc_slabs_node() must be
called while still holding the spinlock, to avoid racing with slab
validation.

Perhaps it's worth adding a comment mentioning this :)

See commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug
caches and make it safe") for more details.

With this fixed, please feel free to add:
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>

-- 
Cheers,
Harry / Hyeonggon


Thread overview: 5+ messages
2026-04-09 12:43 hu.shengming
2026-04-13  3:45 ` Harry Yoo (Oracle) [this message]
2026-04-13  5:14   ` hu.shengming
2026-04-13  5:27     ` Harry Yoo (Oracle)
2026-04-13  6:20       ` hu.shengming
