From: Vlastimil Babka <vbabka@suse.cz>
To: Harry Yoo <harry.yoo@oracle.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Mateusz Guzik <mjguzik@gmail.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-kernel@vger.kernel.org,
"Christoph Lameter (Ampere)" <cl@gentwo.org>
Subject: Re: [RFC PATCH 01/15] static kmem_cache instances for core caches
Date: Thu, 15 Jan 2026 17:59:12 +0100
Message-ID: <19e0c58f-114c-4bbd-9bc0-25382d7d5cbb@suse.cz>
In-Reply-To: <aWdGEI6iQBl3Xibi@hyeyoo>

On 1/14/26 08:30, Harry Yoo wrote:
> On Sat, Jan 10, 2026 at 04:02:03AM +0000, Al Viro wrote:
>> kmem_cache_create() and friends create new instances of
>> struct kmem_cache and return pointers to those. Quite a few things in
>> core kernel are allocated from such caches; each allocation involves
>> dereferencing an assign-once pointer and for sufficiently hot ones that
>> dereferencing does show in profiles.
>>
>> There had been patches floating around switching some of those
>> to runtime_const infrastructure. Unfortunately, it's arch-specific
>> and most of the architectures lack it.
>>
>> There's an alternative approach applicable at least to the caches
>> that are never destroyed, which covers a lot of them. No matter what,
>> runtime_const for pointers is not going to be faster than plain &,
>> so if we had struct kmem_cache instances with static storage duration, we
>> would be at least no worse off than we are with runtime_const variants.
>>
>> There are obstacles to doing that, but they turn out to be easy
>> to deal with.
>>
>> 1) as it is, struct kmem_cache is opaque for anything outside of a few
>> files in mm/*; that avoids serious headache with header dependencies,
>> etc., and it's not something we want to lose. Solution: struct
>> kmem_cache_opaque, with the size and alignment identical to struct
>> kmem_cache. Calculation of size and alignment can be done via the same
>> mechanism we use for asm-offsets.h and rq-offsets.h, with build-time
>> check for mismatches. With that done, we get an opaque type defined in
>> linux/slab-static.h that can be used for declaring those caches.
>> In linux/slab.h we add a forward declaration of kmem_cache_opaque +
>> helper (to_kmem_cache()) converting a pointer to kmem_cache_opaque
>> into pointer to kmem_cache.
>>
>> 2) real constructor of kmem_cache needs to be taught to deal with
>> preallocated instances. That turns out to be easy - we already pass an
>> obscene amount of optional arguments via struct kmem_cache_args, so we
>> can stash the pointer to preallocated instance in there. Changes in
>> mm/slab_common.c are very minor - we should treat preallocated caches
>> as unmergable, use the instance passed to us instead of allocating a
>> new one and we should not free them. That's it.
>
> SLAB_NO_MERGE prevents both side of merging - when 1) creating the cache,
> and when 2) another cache tries to create an alias from it.
>
> Avoiding 1) makes sense, but is there a reason to prevent 2)?
>
> If it's fine for other caches to merge into a cache with static
> duration, then it's sufficient to update find_mergeable() to not attempt
> creating an alias during cache creation if args->preallocated is
> specified (instead of using SLAB_NO_MERGE).

The merging prevention is my biggest concern with the approach. We could
potentially solve it by moving the sharing to a different layer than today's
sharing of kmem_cache objects with refcount, and instead have separate
instances that point to the same underlying storage (mainly the per-node and
per-cpu slabs/sheaves). It's possible it would also simplify the suboptimal
sysfs handling of today as the aliases could know their cache name and own
their symlinks.

However, slabs and sheaves do have a parent kmem_cache pointer. That's how
e.g. kfree() works: it goes virt_to_slab(obj) -> kmem_cache and from there
behaves like kmem_cache_free().
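
To illustrate why that parent pointer matters, here is a userspace toy model
of the lookup (not kernel code). All toy_* names are made up for this sketch,
and the address masking is only a stand-in: the real virt_to_slab() goes
through the page/folio metadata, not through pointer arithmetic like this.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLAB_SIZE 4096u

/* Hypothetical stand-in for struct kmem_cache. */
struct toy_cache {
	const char *name;
	unsigned long nr_freed;
};

/* Hypothetical slab: a header with the parent pointer, then object space. */
struct toy_slab {
	alignas(TOY_SLAB_SIZE) struct toy_cache *cache;
	unsigned char objects[TOY_SLAB_SIZE - sizeof(struct toy_cache *)];
};

static_assert(sizeof(struct toy_slab) == TOY_SLAB_SIZE,
	      "toy slab must cover exactly one aligned block");

/* Assumption of this toy: slabs are size-aligned, so masking finds them. */
static struct toy_slab *toy_virt_to_slab(const void *obj)
{
	uintptr_t mask = ~(uintptr_t)(TOY_SLAB_SIZE - 1);

	return (struct toy_slab *)((uintptr_t)obj & mask);
}

/* The "explicit cache" free path; the toy only counts the call. */
static void toy_cache_free(struct toy_cache *cache, void *obj)
{
	(void)obj;
	cache->nr_freed++;
}

/* The "no cache given" free path: recover the cache from the slab. */
static void toy_kfree(void *obj)
{
	struct toy_slab *slab = toy_virt_to_slab(obj);

	assert(slab->cache != NULL);
	toy_cache_free(slab->cache, obj);
}
```

Whatever cache pointer gets stored in the slab at creation time is the one
every later toy_kfree() on its objects ends up using, which is why the
pointer stored there needs to stay valid for the slab's lifetime.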

So we could have a kmem_cache->primary_cache field, where the primary would
just point to itself and aliasing caches to the primary, and newly created
slabs and sheaves would read that ->primary_cache to assign their kmem_cache
pointer. This is not a fastpath operation so it shouldn't matter, and with
that there wouldn't be any mix of differing cache pointers, so the aliases
could be destroyed easily. And then the primary cache wouldn't be able to go
away as long as there are aliases, as it is today.
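
A minimal userspace sketch of that idea, under stated assumptions: nothing
here is existing kernel API, the toy_* names are hypothetical, and the
nr_aliases counter is just one way I could imagine pinning the primary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct toy_cache {
	const char *name;
	struct toy_cache *primary_cache;  /* points to itself for a primary */
	unsigned int nr_aliases;          /* assumption: only used on a primary */
};

/* Hypothetical slab; only the parent pointer matters for this sketch. */
struct toy_slab {
	struct toy_cache *cache;
};

static void toy_cache_init_primary(struct toy_cache *c, const char *name)
{
	c->name = name;
	c->primary_cache = c;
	c->nr_aliases = 0;
}

static void toy_cache_init_alias(struct toy_cache *c, const char *name,
				 struct toy_cache *primary)
{
	/* An alias may only hang off a real primary, never off an alias. */
	assert(primary->primary_cache == primary);
	c->name = name;
	c->primary_cache = primary;
	c->nr_aliases = 0;
	primary->nr_aliases++;
}

/* Slab creation (slowpath): always record the primary, not the alias. */
static void toy_slab_init(struct toy_slab *slab, struct toy_cache *allocating)
{
	slab->cache = allocating->primary_cache;
}

/* Returns 0 on success, -1 if a primary still has aliases. */
static int toy_cache_destroy(struct toy_cache *c)
{
	if (c->primary_cache == c)
		return c->nr_aliases ? -1 : 0;

	/* No slab ever points at an alias, so it can simply go away. */
	c->primary_cache->nr_aliases--;
	c->primary_cache = NULL;
	return 0;
}
```

The point of the sketch is that a slab created through an alias still ends
up with the primary in slab->cache, so destroying the alias never leaves a
dangling parent pointer behind.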

Thus only a dynamic cache or a non-module static cache could become a
primary, for module unload reasons.

For merging to work fully regardless of the order in which static and
dynamic aliases are created, there would however have to be a weird quirk for
static module caches: when such a cache is created and there's no
compatible primary for it to become an alias of, a dynamic, otherwise unused
primary would need to be created just to become the owner of the slabs and
sheaves. Otherwise, if a mergeable dynamic cache appeared later, it would not
be able to become a primary for the static module cache to become an alias
of, because the static module cache would already have existing slabs and
sheaves pointing to it.
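
Spelled out as a decision function, the creation-time rule might look like
the sketch below. This is only my reading of the scheme put into toy
userspace code; the enum names and toy_pick_role() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of where a cache's storage lives. */
enum toy_kind {
	TOY_DYNAMIC,          /* allocated by cache creation at runtime */
	TOY_STATIC_BUILTIN,   /* static storage in the core kernel image */
	TOY_STATIC_MODULE,    /* static storage inside a module */
};

/* Hypothetical outcome of creating a mergeable cache. */
enum toy_role {
	TOY_ALIAS_OF_EXISTING,      /* join a compatible primary */
	TOY_BECOME_PRIMARY,         /* own the slabs and sheaves itself */
	TOY_ALIAS_OF_NEW_DYNAMIC,   /* the quirk: create an unused dynamic
				     * primary and alias to that */
};

/* Module static storage vanishes on unload, so it must not own slabs. */
static bool toy_may_be_primary(enum toy_kind kind)
{
	return kind != TOY_STATIC_MODULE;
}

static enum toy_role toy_pick_role(enum toy_kind kind,
				   bool compatible_primary_exists)
{
	if (compatible_primary_exists)
		return TOY_ALIAS_OF_EXISTING;
	if (toy_may_be_primary(kind))
		return TOY_BECOME_PRIMARY;
	return TOY_ALIAS_OF_NEW_DYNAMIC;
}

/* Small self-check walking through the orderings discussed above. */
static void toy_demo(void)
{
	/* Static module cache created first: needs the extra primary. */
	assert(toy_pick_role(TOY_STATIC_MODULE, false) ==
	       TOY_ALIAS_OF_NEW_DYNAMIC);
	/* A later mergeable dynamic cache then just joins that primary. */
	assert(toy_pick_role(TOY_DYNAMIC, true) == TOY_ALIAS_OF_EXISTING);
}
```

With that rule the result doesn't depend on creation order: a static module
cache is always an alias, whether the primary existed before it or had to be
created on its behalf.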

And there might be other issues with this scheme I don't immediately see.
But maybe it's feasible.

Thread overview: 28+ messages
2026-01-10 4:02 [RFC PATCH 00/15] kmem_cache instances with static storage duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 01/15] static kmem_cache instances for core caches Al Viro
2026-01-10 5:40 ` Matthew Wilcox
2026-01-10 6:23 ` Al Viro
2026-01-14 7:30 ` Harry Yoo
2026-01-14 7:38 ` Al Viro
2026-01-15 16:59 ` Vlastimil Babka [this message]
2026-01-10 4:02 ` [RFC PATCH 02/15] allow static-duration kmem_cache in modules Al Viro
2026-01-10 4:02 ` [RFC PATCH 03/15] make mnt_cache static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 04/15] turn thread_cache static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 05/15] turn signal_cache static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 06/15] turn bh_cachep static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 07/15] turn dentry_cache static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 08/15] turn files_cachep static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 09/15] make filp and bfilp caches static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 10/15] turn sighand_cache static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 11/15] turn mm_cachep static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 12/15] turn task_struct_cachep static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 13/15] turn fs_cachep static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 14/15] turn inode_cachep static-duration Al Viro
2026-01-10 4:02 ` [RFC PATCH 15/15] turn ufs_inode_cache static-duration Al Viro
2026-01-10 5:33 ` [RFC PATCH 00/15] kmem_cache instances with static storage duration Linus Torvalds
2026-01-10 6:16 ` Al Viro
2026-01-14 7:12 ` Harry Yoo
2026-01-15 0:46 ` Christoph Lameter (Ampere)
2026-01-15 2:08 ` Al Viro
2026-01-15 19:10 ` Christoph Lameter (Ampere)
2026-01-15 19:44 ` Al Viro