From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 3 Dec 2025 23:24:06 +0000
From: Al Viro
To: Linus Torvalds
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH][RFC][slab] static kmem_cache instances for core caches
Message-ID: <20251203232406.GH1712166@ZenIV>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

kmem_cache_create() and friends create new instances of struct kmem_cache
and return pointers to those.
Quite a few things in the core kernel are allocated from such caches; each
allocation involves dereferencing an assign-once pointer, and for
sufficiently hot caches that dereference does show up in profiles.

There have been patches floating around switching some of those to the
runtime_constant infrastructure. Unfortunately, it's arch-specific and most
of the architectures lack it. There's an alternative approach applicable to
the caches that are never destroyed, which covers a lot of them. No matter
what, runtime_constant for pointers is not going to be faster than a plain
&, so if we had struct kmem_cache instances with static storage duration,
we would be at least no worse off than we are with the runtime_constant
variants.

There are obstacles to doing that, but they turn out to be easy to deal
with.

1) as it is, struct kmem_cache is opaque for anything outside of a few
files in mm/*; that avoids serious headaches with header dependencies,
etc., and it's not something we want to lose. Solution: struct
kmem_cache_opaque, with size and alignment identical to those of struct
kmem_cache. Calculation of the size and alignment can be done via the same
mechanism we use for asm-offsets.h and rq-offsets.h, with a build-time
check for mismatches. With that done, we get an opaque type defined in
linux/slab.h that can be used for declaring those caches.

2) the real constructor of kmem_cache needs to be taught to deal with
preallocated instances. That turns out to be easy - we already pass an
obscene number of optional arguments via struct kmem_cache_args, so we can
stash the pointer to the preallocated instance in there. The changes in
mm/slab_common.c are very minor - we should treat preallocated caches as
unmergeable, use the instance passed to us instead of allocating a new one,
and we should not free it on failure. That's it.

The patch below implements that and converts several caches (mnt_cache,
signal_cache and thread_stack_cache) to static allocation to demonstrate
the uses.
If we use that in mainline, these would obviously get split into separate
commits. It seems to work. There's only one real limitation at the moment -
we should never use kmem_cache_destroy() for such caches; unlike
runtime_constant, we *can* use those caches from modules - there's no
problem with that, as long as the cache itself is in the kernel proper.
The obstacle to using kmem_cache_destroy() might be possible to lift - the
only tricky part is the sysfs-related logic in kmem_cache_release(). I
haven't looked into that; there are plenty of never-destroyed core caches,
so this thing doesn't lack applications as it is.

Review and comments would be very welcome.

Signed-off-by: Al Viro
---
diff --git a/Kbuild b/Kbuild
index 13324b4bbe23..eb985a6614eb 100644
--- a/Kbuild
+++ b/Kbuild
@@ -45,13 +45,24 @@ kernel/sched/rq-offsets.s: $(offsets-file)
 $(rq-offsets-file): kernel/sched/rq-offsets.s FORCE
 	$(call filechk,offsets,__RQ_OFFSETS_H__)
 
+# generate kmem_cache_size.h
+
+kmem_cache_size-file := include/generated/kmem_cache_size.h
+
+targets += mm/kmem_cache_size.s
+
+mm/kmem_cache_size.s: $(rq-offsets-file)
+
+$(kmem_cache_size-file): mm/kmem_cache_size.s FORCE
+	$(call filechk,offsets,__KMEM_CACHE_SIZE_H__)
+
 # Check for missing system calls
 
 quiet_cmd_syscalls = CALL    $<
       cmd_syscalls = $(CONFIG_SHELL) $< $(CC) $(c_flags) $(missing_syscalls_flags)
 
 PHONY += missing-syscalls
-missing-syscalls: scripts/checksyscalls.sh $(rq-offsets-file)
+missing-syscalls: scripts/checksyscalls.sh $(kmem_cache_size-file)
 	$(call cmd,syscalls)
 
 # Check the manual modification of atomic headers
diff --git a/fs/namespace.c b/fs/namespace.c
index 4272349650b1..53610ebbe3f5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -85,7 +85,8 @@ static u64 mnt_id_ctr = MNT_UNIQUE_ID_OFFSET;
 static struct hlist_head *mount_hashtable __ro_after_init;
 static struct hlist_head *mountpoint_hashtable __ro_after_init;
-static struct kmem_cache *mnt_cache __ro_after_init;
+static struct kmem_cache_opaque __mnt_cache;
+#define mnt_cache to_kmem_cache(&__mnt_cache)
 static DECLARE_RWSEM(namespace_sem);
 static HLIST_HEAD(unmounted);	/* protected by namespace_sem */
 static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
@@ -6016,8 +6017,9 @@ void __init mnt_init(void)
 {
 	int err;
 
-	mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct mount),
-			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL);
+	kmem_cache_setup("mnt_cache", sizeof(struct mount),
+			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
+			NULL, mnt_cache);
 
 	mount_hashtable = alloc_large_system_hash("Mount-cache",
 				sizeof(struct hlist_head),
diff --git a/include/linux/slab.h b/include/linux/slab.h
index cf443f064a66..89de72bf2e99 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -22,6 +22,15 @@
 #include <linux/cleanup.h>
 #include <linux/hash.h>
 
+#ifndef COMPILE_OFFSETS
+#include <generated/kmem_cache_size.h>
+
+/* same size and alignment as struct kmem_cache: */
+struct kmem_cache_opaque {
+	unsigned char opaque[KMEM_CACHE_SIZE];
+} __aligned(KMEM_CACHE_ALIGN);
+#endif
+
 enum _slab_flag_bits {
 	_SLAB_CONSISTENCY_CHECKS,
 	_SLAB_RED_ZONE,
@@ -261,11 +270,17 @@ struct list_lru;
 struct mem_cgroup;
+struct kmem_cache_opaque;
 
 /*
  * struct kmem_cache related prototypes
  */
 bool slab_is_available(void);
 
+static inline struct kmem_cache *to_kmem_cache(struct kmem_cache_opaque *p)
+{
+	return (struct kmem_cache *)p;
+}
+
 /**
  * struct kmem_cache_args - Less common arguments for kmem_cache_create()
  *
@@ -366,6 +381,7 @@ struct kmem_cache_args {
 	 * %0 means no sheaves will be created.
 	 */
 	unsigned int sheaf_capacity;
+	struct kmem_cache *preallocated;
 };
 
 struct kmem_cache *__kmem_cache_create_args(const char *name,
@@ -493,6 +509,34 @@ int kmem_cache_shrink(struct kmem_cache *s);
 			.usersize = sizeof_field(struct __struct, __field),	\
 		}, (__flags))
 
+static inline int
+kmem_cache_setup_usercopy(const char *name, unsigned int size,
+			  unsigned int align, slab_flags_t flags,
+			  unsigned int useroffset, unsigned int usersize,
+			  void (*ctor)(void *), struct kmem_cache *s)
+{
+	struct kmem_cache *res;
+
+	res = __kmem_cache_create_args(name, size,
+			&(struct kmem_cache_args) {
+				.align = align,
+				.ctor = ctor,
+				.useroffset = useroffset,
+				.usersize = usersize,
+				.preallocated = s},
+			flags);
+	return PTR_ERR_OR_ZERO(res);
+}
+
+static inline int
+kmem_cache_setup(const char *name, unsigned int size,
+		 unsigned int align, slab_flags_t flags,
+		 void (*ctor)(void *), struct kmem_cache *s)
+{
+	return kmem_cache_setup_usercopy(name, size, align, flags,
+					 0, 0, ctor, s);
+}
+
 /*
  * Common kmalloc functions provided by all allocators
  */
diff --git a/kernel/fork.c b/kernel/fork.c
index 3da0f08615a9..e9fcbb55f2f6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -379,7 +379,8 @@ static void free_thread_stack(struct task_struct *tsk)
 
 #else /* !(THREAD_SIZE >= PAGE_SIZE) */
 
-static struct kmem_cache *thread_stack_cache;
+static struct kmem_cache_opaque __thread_stack_cache;
+#define thread_stack_cache to_kmem_cache(&__thread_stack_cache)
 
 static void thread_stack_free_rcu(struct rcu_head *rh)
 {
@@ -410,17 +411,17 @@ static void free_thread_stack(struct task_struct *tsk)
 
 void thread_stack_cache_init(void)
 {
-	thread_stack_cache = kmem_cache_create_usercopy("thread_stack",
-					THREAD_SIZE, THREAD_SIZE, 0, 0,
-					THREAD_SIZE, NULL);
-	BUG_ON(thread_stack_cache == NULL);
+	kmem_cache_setup_usercopy("thread_stack", THREAD_SIZE, THREAD_SIZE,
+				  SLAB_PANIC, 0, THREAD_SIZE, NULL,
+				  thread_stack_cache);
 }
 #endif /* THREAD_SIZE >= PAGE_SIZE */
 #endif /* CONFIG_VMAP_STACK */
 
 /* SLAB cache for signal_struct structures (tsk->signal) */
-static struct kmem_cache *signal_cachep;
+static struct kmem_cache_opaque signal_cache;
+#define signal_cachep to_kmem_cache(&signal_cache)
 
 /* SLAB cache for sighand_struct structures (tsk->sighand) */
 struct kmem_cache *sighand_cachep;
 
@@ -2980,10 +2981,10 @@ void __init proc_caches_init(void)
 			sizeof(struct sighand_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
 			SLAB_ACCOUNT, sighand_ctor);
-	signal_cachep = kmem_cache_create("signal_cache",
+	kmem_cache_setup("signal_cache",
 			sizeof(struct signal_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
-			NULL);
+			NULL, signal_cachep);
 	files_cachep = kmem_cache_create("files_cache",
 			sizeof(struct files_struct), 0,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
diff --git a/mm/kmem_cache_size.c b/mm/kmem_cache_size.c
new file mode 100644
index 000000000000..1ddbfa41a507
--- /dev/null
+++ b/mm/kmem_cache_size.c
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Generate definitions needed by the preprocessor.
+ * This code generates raw asm output which is post-processed
+ * to extract and format the required data.
+ */
+
+#define COMPILE_OFFSETS
+#include <linux/kbuild.h>
+#include "slab.h"
+
+int main(void)
+{
+	/* The constants to put into include/generated/kmem_cache_size.h */
+	DEFINE(KMEM_CACHE_SIZE, sizeof(struct kmem_cache));
+	DEFINE(KMEM_CACHE_ALIGN, __alignof(struct kmem_cache));
+	/* End of constants */
+
+	return 0;
+}
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 932d13ada36c..fc6c2864fe8a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -224,33 +224,30 @@ static struct kmem_cache *create_cache(const char *name,
 				       struct kmem_cache_args *args,
 				       slab_flags_t flags)
 {
-	struct kmem_cache *s;
+	struct kmem_cache *s = args->preallocated;
 	int err;
 
 	/* If a custom freelist pointer is requested make sure it's sane. */
-	err = -EINVAL;
 	if (args->use_freeptr_offset &&
 	    (args->freeptr_offset >= object_size ||
 	     !(flags & SLAB_TYPESAFE_BY_RCU) ||
 	     !IS_ALIGNED(args->freeptr_offset, __alignof__(freeptr_t))))
-		goto out;
+		return ERR_PTR(-EINVAL);
 
-	err = -ENOMEM;
-	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
-	if (!s)
-		goto out;
+	if (!s) {
+		s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
+		if (!s)
+			return ERR_PTR(-ENOMEM);
+	}
 
 	err = do_kmem_cache_create(s, name, object_size, args, flags);
-	if (err)
-		goto out_free_cache;
-
+	if (unlikely(err)) {
+		if (!args->preallocated)
+			kmem_cache_free(kmem_cache, s);
+		return ERR_PTR(err);
+	}
 	s->refcount = 1;
 	list_add(&s->list, &slab_caches);
 	return s;
-
-out_free_cache:
-	kmem_cache_free(kmem_cache, s);
-out:
-	return ERR_PTR(err);
 }
 
 /**
@@ -324,6 +321,9 @@ struct kmem_cache *__kmem_cache_create_args(const char *name,
 		     object_size - args->usersize < args->useroffset))
 		args->usersize = args->useroffset = 0;
 
+	if (args->preallocated)
+		flags |= SLAB_NO_MERGE;
+
 	if (!args->usersize && !args->sheaf_capacity)
 		s = __kmem_cache_alias(name, object_size, args->align,
 				       flags, args->ctor);
diff --git a/mm/slub.c b/mm/slub.c
index a0b905c2a557..faab13c6aaf9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -8465,6 +8465,12 @@ void __init kmem_cache_init(void)
 		boot_kmem_cache_node;
 	int node;
 
+	/* verify that kmem_cache_opaque is correct */
+	BUILD_BUG_ON(sizeof(struct kmem_cache) !=
+		     sizeof(struct kmem_cache_opaque));
+	BUILD_BUG_ON(__alignof(struct kmem_cache) !=
+		     __alignof(struct kmem_cache_opaque));
+
 	if (debug_guardpage_minorder())
 		slub_max_order = 0;