Date: Tue, 27 Aug 2024 13:42:30 +0300
From: Mike Rapoport <rppt@kernel.org>
To: Christian Brauner
Cc: Vlastimil Babka, Jens Axboe, "Paul E. McKenney", Roman Gushchin,
	Linus Torvalds, Jann Horn, linux-mm@kvack.org
Subject: Re: [PATCH] [RFC] mm: add kmem_cache_create_rcu()
In-Reply-To: <20240826-okkupieren-nachdenken-d88ac627e9bc@brauner>

On Mon, Aug 26, 2024 at 06:04:13PM +0200, Christian Brauner wrote:
> When a kmem cache is created with SLAB_TYPESAFE_BY_RCU the free pointer
> must be located outside of the object because we don't know what part of
> the memory can safely be overwritten as it may be needed to prevent
> object recycling.
> 
> That has the consequence that SLAB_TYPESAFE_BY_RCU may end up adding a
> new cacheline. This is the case for, e.g., struct file. After having it
> shrunk down by 40 bytes and having it fit in three cachelines we still
> have SLAB_TYPESAFE_BY_RCU adding a fourth cacheline because it needs to
> accommodate the free pointer and is hardware cacheline aligned.
> 
> I tried to find ways to rectify this as struct file is pretty much
> everywhere and having it use less memory is a good thing. So here's a
> proposal that might be totally the wrong API and broken, but I thought
> I'd give it a try.
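So, if I understand correctly, the usage pattern is that the caller donates
a slot inside the object that RCU-protected readers never dereference, and
the allocator stores its free pointer there rather than appending an extra
word past the object. Roughly like this -- a hypothetical cache user,
modeled on the filp conversion below; only kmem_cache_create_rcu() and the
offsetof() convention come from the patch, everything named foo is made up
for illustration:

	#include <linux/llist.h>
	#include <linux/slab.h>
	#include <linux/stddef.h>

	/* Hypothetical cache user, for illustration only. */
	struct foo {
		unsigned long state;
		union {
			/* only used once the object is on its way out */
			struct llist_node free_node;
			/* slab stores its free pointer here */
			void *__foo_slab_free_ptr;
		};
	};

	static struct kmem_cache *foo_cachep;

	static int __init foo_cache_init(void)
	{
		/*
		 * SLAB_TYPESAFE_BY_RCU is added internally by
		 * kmem_cache_create_rcu(), and no extra word gets appended
		 * to the object: the free pointer reuses the union slot
		 * above.
		 */
		foo_cachep = kmem_cache_create_rcu("foo", sizeof(struct foo),
				offsetof(struct foo, __foo_slab_free_ptr),
				SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, NULL);
		return foo_cachep ? 0 : -ENOMEM;
	}
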
> 
> Signed-off-by: Christian Brauner
> ---
>  fs/file_table.c      |  7 ++--
>  include/linux/fs.h   |  1 +
>  include/linux/slab.h |  4 +++
>  mm/slab.h            |  1 +
>  mm/slab_common.c     | 76 +++++++++++++++++++++++++++++++++++++-------
>  mm/slub.c            | 22 +++++++++----
>  6 files changed, 91 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 694199a1a966..a69b8a71eacb 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -514,9 +514,10 @@ EXPORT_SYMBOL(__fput_sync);
>  
>  void __init files_init(void)
>  {
> -	filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
> -				SLAB_TYPESAFE_BY_RCU | SLAB_HWCACHE_ALIGN |
> -				SLAB_PANIC | SLAB_ACCOUNT, NULL);
> +	filp_cachep = kmem_cache_create_rcu("filp", sizeof(struct file),
> +				offsetof(struct file, __f_slab_free_ptr),
> +				SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT,
> +				NULL);
>  	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
>  }
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 61097a9cf317..de509f5d1446 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1057,6 +1057,7 @@ struct file {
>  		struct callback_head	f_task_work;
>  		struct llist_node	f_llist;
>  		struct file_ra_state	f_ra;
> +		void			*__f_slab_free_ptr;
>  	};
>  	/* --- cacheline 3 boundary (192 bytes) --- */
>  } __randomize_layout
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index eb2bf4629157..fc3c3cc9f689 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -242,6 +242,10 @@ struct kmem_cache *kmem_cache_create_usercopy(const char *name,
>  			slab_flags_t flags,
>  			unsigned int useroffset, unsigned int usersize,
>  			void (*ctor)(void *));
> +struct kmem_cache *kmem_cache_create_rcu(const char *name, unsigned int size,
> +					 unsigned int offset,

Just 'offset' is too vague, no? Maybe freeptr_offset?

> +					 slab_flags_t flags,
> +					 void (*ctor)(void *));
>  void kmem_cache_destroy(struct kmem_cache *s);
>  int kmem_cache_shrink(struct kmem_cache *s);
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index dcdb56b8e7f5..122ca41fea34 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -261,6 +261,7 @@ struct kmem_cache {
>  	unsigned int object_size;	/* Object size without metadata */
>  	struct reciprocal_value reciprocal_size;
>  	unsigned int offset;		/* Free pointer offset */
> +	bool dedicated_offset;		/* Specific free pointer requested */

has_freeptr_offset?
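That is, something like this (same semantics, just the rename; illustrative
only):

	struct kmem_cache *kmem_cache_create_rcu(const char *name,
						 unsigned int size,
						 unsigned int freeptr_offset,
						 slab_flags_t flags,
						 void (*ctor)(void *));

and in struct kmem_cache:

	unsigned int offset;		/* Free pointer offset */
	bool has_freeptr_offset;	/* Caller supplied the free pointer offset */
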
> #ifdef CONFIG_SLUB_CPU_PARTIAL
>  	/* Number of per cpu partial objects to keep around */
>  	unsigned int cpu_partial;
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 40b582a014b8..b6ca63859b3a 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -202,10 +202,10 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
>  }
>  
>  static struct kmem_cache *create_cache(const char *name,
> -				       unsigned int object_size, unsigned int align,
> -				       slab_flags_t flags, unsigned int useroffset,
> -				       unsigned int usersize, void (*ctor)(void *),
> -				       struct kmem_cache *root_cache)
> +				       unsigned int object_size, unsigned int offset,
> +				       unsigned int align, slab_flags_t flags,
> +				       unsigned int useroffset, unsigned int usersize,
> +				       void (*ctor)(void *), struct kmem_cache *root_cache)
>  {
>  	struct kmem_cache *s;
>  	int err;
> @@ -213,6 +213,10 @@ static struct kmem_cache *create_cache(const char *name,
>  	if (WARN_ON(useroffset + usersize > object_size))
>  		useroffset = usersize = 0;
>  
> +	if (WARN_ON(offset >= object_size ||
> +		    (offset && !(flags & SLAB_TYPESAFE_BY_RCU))))
> +		offset = 0;
> +
>  	err = -ENOMEM;
>  	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
>  	if (!s)
> @@ -226,6 +230,10 @@ static struct kmem_cache *create_cache(const char *name,
>  	s->useroffset = useroffset;
>  	s->usersize = usersize;
>  #endif
> +	if (offset > 0) {
> +		s->offset = offset;
> +		s->dedicated_offset = true;
> +	}
>  
>  	err = __kmem_cache_create(s, flags);
>  	if (err)
> @@ -269,10 +277,10 @@ static struct kmem_cache *create_cache(const char *name,
>   *
>   * Return: a pointer to the cache on success, NULL on failure.
>   */
> -struct kmem_cache *
> -kmem_cache_create_usercopy(const char *name,
> -		unsigned int size, unsigned int align,
> -		slab_flags_t flags,
> +static struct kmem_cache *
> +do_kmem_cache_create_usercopy(const char *name,
> +		unsigned int size, unsigned int offset,
> +		unsigned int align, slab_flags_t flags,
>  		unsigned int useroffset, unsigned int usersize,
>  		void (*ctor)(void *))
>  {
> @@ -332,7 +340,7 @@ kmem_cache_create_usercopy(const char *name,
>  		goto out_unlock;
>  	}
>  
> -	s = create_cache(cache_name, size,
> +	s = create_cache(cache_name, size, offset,
>  			 calculate_alignment(flags, align, size),
>  			 flags, useroffset, usersize, ctor, NULL);
>  	if (IS_ERR(s)) {
> @@ -356,6 +364,16 @@ kmem_cache_create_usercopy(const char *name,
>  	}
>  	return s;
>  }
> +
> +struct kmem_cache *
> +kmem_cache_create_usercopy(const char *name, unsigned int size,
> +			   unsigned int align, slab_flags_t flags,
> +			   unsigned int useroffset, unsigned int usersize,
> +			   void (*ctor)(void *))
> +{
> +	return do_kmem_cache_create_usercopy(name, size, 0, align, flags,
> +					     useroffset, usersize, ctor);
> +}
>  EXPORT_SYMBOL(kmem_cache_create_usercopy);
>  
>  /**
> @@ -387,11 +405,47 @@ struct kmem_cache *
>  kmem_cache_create(const char *name, unsigned int size, unsigned int align,
>  		  slab_flags_t flags, void (*ctor)(void *))
>  {
> -	return kmem_cache_create_usercopy(name, size, align, flags, 0, 0,
> -					  ctor);
> +	return do_kmem_cache_create_usercopy(name, size, 0, align, flags, 0, 0,
> +					     ctor);
>  }
>  EXPORT_SYMBOL(kmem_cache_create);
>  
> +/**
> + * kmem_cache_create_rcu - Create a SLAB_TYPESAFE_BY_RCU cache.
> + * @name: A string which is used in /proc/slabinfo to identify this cache.
> + * @size: The size of objects to be created in this cache.
> + * @offset: The offset into the memory to the free pointer
> + * @flags: SLAB flags
> + * @ctor: A constructor for the objects.
> + *
> + * Cannot be called within an interrupt, but can be interrupted.
> + * The @ctor is run when new pages are allocated by the cache.
> + *
> + * The flags are
> + *
> + * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
> + * to catch references to uninitialised memory.
> + *
> + * %SLAB_RED_ZONE - Insert `Red` zones around the allocated memory to check
> + * for buffer overruns.
> + *
> + * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
> + * cacheline. This can be beneficial if you're counting cycles as closely
> + * as davem.
> + *
> + * Return: a pointer to the cache on success, NULL on failure.
> + */
> +struct kmem_cache *kmem_cache_create_rcu(const char *name, unsigned int size,
> +					 unsigned int offset,
> +					 slab_flags_t flags,
> +					 void (*ctor)(void *))
> +{
> +	return do_kmem_cache_create_usercopy(name, size, offset, 0,
> +					     flags | SLAB_TYPESAFE_BY_RCU, 0, 0,
> +					     ctor);
> +}
> +EXPORT_SYMBOL(kmem_cache_create_rcu);
> +
>  static struct kmem_cache *kmem_buckets_cache __ro_after_init;
>  
>  /**
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..34eac3f9a46e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3926,7 +3926,7 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
>  						   void *obj)
>  {
>  	if (unlikely(slab_want_init_on_free(s)) && obj &&
> -	    !freeptr_outside_object(s))
> +	    !freeptr_outside_object(s) && !s->dedicated_offset)
>  		memset((void *)((char *)kasan_reset_tag(obj) + s->offset),
>  		       0, sizeof(void *));
>  }
> @@ -5153,6 +5153,7 @@ static int calculate_sizes(struct kmem_cache *s)
>  	slab_flags_t flags = s->flags;
>  	unsigned int size = s->object_size;
>  	unsigned int order;
> +	bool must_use_freeptr_offset;
>  
>  	/*
>  	 * Round up object size to the next word boundary. We can only
> @@ -5189,9 +5190,12 @@ static int calculate_sizes(struct kmem_cache *s)
>  	 */
>  	s->inuse = size;
>  
> -	if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || s->ctor ||
> -	    ((flags & SLAB_RED_ZONE) &&
> -	     (s->object_size < sizeof(void *) || slub_debug_orig_size(s)))) {
> +	must_use_freeptr_offset =
> +		(flags & SLAB_POISON) || s->ctor ||
> +		((flags & SLAB_RED_ZONE) &&
> +		 (s->object_size < sizeof(void *) || slub_debug_orig_size(s)));
> +
> +	if ((flags & SLAB_TYPESAFE_BY_RCU) || must_use_freeptr_offset) {
>  		/*
>  		 * Relocate free pointer after the object if it is not
>  		 * permitted to overwrite the first word of the object on
> @@ -5208,8 +5212,13 @@ static int calculate_sizes(struct kmem_cache *s)
>  		 * freeptr_outside_object() function. If that is no
>  		 * longer true, the function needs to be modified.
>  		 */
> -		s->offset = size;
> -		size += sizeof(void *);
> +		if (!(flags & SLAB_TYPESAFE_BY_RCU) || must_use_freeptr_offset) {
> +			s->offset = size;
> +			size += sizeof(void *);
> +			s->dedicated_offset = false;
> +		} else {
> +			s->dedicated_offset = true;

Hmm, this seems to set s->dedicated_offset for any SLAB_TYPESAFE_BY_RCU
cache, even those that weren't created with kmem_cache_create_rcu().

Shouldn't we have

	must_use_freeptr_offset =
		((flags & SLAB_TYPESAFE_BY_RCU) && !s->dedicated_offset) ||
		(flags & SLAB_POISON) || s->ctor ||
		((flags & SLAB_RED_ZONE) &&
		 (s->object_size < sizeof(void *) || slub_debug_orig_size(s)));

	if (must_use_freeptr_offset) {
		...
	}

> +		}
> +	}
>  	} else {
>  		/*
>  		 * Store freelist pointer near middle of object to keep
> @@ -5301,6 +5310,7 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
>  	if (get_order(s->size) > get_order(s->object_size)) {
>  		s->flags &= ~DEBUG_METADATA_FLAGS;
>  		s->offset = 0;
> +		s->dedicated_offset = false;
>  		if (!calculate_sizes(s))
>  			goto error;
>  	}
> -- 
> 2.45.2

-- 
Sincerely yours,
Mike.