From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Vlastimil Babka <vbabka@suse.cz>,
netdev@vger.kernel.org, linux-mm@kvack.org
Cc: brouer@redhat.com, Christoph Lameter <cl@linux.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@techsingularity.net>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
penberg@kernel.org, Jakub Kicinski <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
edumazet@google.com, pabeni@redhat.com,
Matthew Wilcox <willy@infradead.org>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
David Sterba <dsterba@suse.com>
Subject: Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for SLAB_NEVER_MERGE
Date: Wed, 31 May 2023 15:59:26 +0200 [thread overview]
Message-ID: <fe64a059-f4c8-5983-3b08-f84552d1ce61@redhat.com> (raw)
In-Reply-To: <81597717-0fed-5fd0-37d0-857d976b9d40@suse.cz>
On 31/05/2023 14.03, Vlastimil Babka wrote:
> On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
>> Allow API users of kmem_cache_create to specify that they don't want
>> any slab merge or aliasing (with similar sized objects). Use this in
>> network stack and kfence_test.
>>
>> The SKB (sk_buff) kmem_cache slab is critical for network performance.
>> Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
>> performance by amortising the alloc/free cost.
>>
>> For the bulk API to perform efficiently the slub fragmentation need to
>> be low. Especially for the SLUB allocator, the efficiency of bulk free
>> API depend on objects belonging to the same slab (page).
>>
>> When running different network performance microbenchmarks, I started
>> to notice that performance was reduced (slightly) when machines had
>> longer uptimes. I believe the cause was 'skbuff_head_cache' got
>> aliased/merged into the general slub for 256 bytes sized objects (with
>> my kernel config, without CONFIG_HARDENED_USERCOPY).
>>
>> For SKB kmem_cache network stack have reasons for not merging, but it
>> varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
>> We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
>>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> Since this idea was revived by David [1], and neither patch worked as is,
> but yours was more complete and first, I have fixed it up as below. The
> skbuff part itself will be best submitted separately afterwards so we don't
> get conflicts between trees etc. Comments?
>
Thanks for following up on this! :-)
I like the adjustments, ACKed below.
I'm okay with submitting changes to net/core/skbuff.c separately.
> ----8<----
> From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Tue, 17 Jan 2023 14:40:00 +0100
> Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
>
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merge or aliasing (with similar sized objects). Use this in
> kfence_test.
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> For the bulk API to perform efficiently the slub fragmentation need to
> be low. Especially for the SLUB allocator, the efficiency of bulk free
> API depend on objects belonging to the same slab (page).
>
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was 'skbuff_head_cache' got
> aliased/merged into the general slub for 256 bytes sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> For SKB kmem_cache network stack have reasons for not merging, but it
> varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NO_MERGE for this kmem_cache.
>
> Another use case for the flag has been described by David Sterba [1]:
>
>> This can be used for more fine grained control over the caches or for
>> debugging builds where separate slabs can verify that no objects leak.
>
>> The slab_nomerge boot option is too coarse and would need to be
>> enabled on all testing hosts. There are some other ways how to disable
>> merging, e.g. a slab constructor but this disables poisoning besides
>> that it adds additional overhead. Other flags are internal and may
>> have other semantics.
>
>> A concrete example what motivates the flag. During 'btrfs balance'
>> slab top reported huge increase in caches like
>
>> 1330095 1330095 100% 0.10K 34105 39 136420K Acpi-ParseExt
>> 1734684 1734684 100% 0.14K 61953 28 247812K pid_namespace
>> 8244036 6873075 83% 0.11K 229001 36 916004K khugepaged_mm_slot
>
>> which was confusing and that it's because of slab merging was not the
>> first idea. After rebooting with slab_nomerge all the caches were
>> from btrfs_ namespace as expected.
>
> [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
>
> [ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the
> one proposed by David so it does not collide with internal SLAB/SLUB
> flags, write a comment for the flag, expand changelog, drop the skbuff
> part to be handled spearately ]
>
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com
> ---
> include/linux/slab.h | 12 ++++++++++++
> mm/kfence/kfence_test.c | 7 +++----
> mm/slab.h | 5 +++--
> mm/slab_common.c | 2 +-
> 4 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 6b3e155b70bf..72bc906d8bc7 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -106,6 +106,18 @@
> /* Avoid kmemleak tracing */
> #define SLAB_NOLEAKTRACE ((slab_flags_t __force)0x00800000U)
>
> +/*
> + * Prevent merging with compatible kmem caches. This flag should be used
> + * cautiously. Valid use cases:
> + *
> + * - caches created for self-tests (e.g. kunit)
> + * - general caches created and used by a subsystem, only when a
> + * (subsystem-specific) debug option is enabled
> + * - performance critical caches, should be very rare and consulted with slab
> + * maintainers, and not used together with CONFIG_SLUB_TINY
> + */
> +#define SLAB_NO_MERGE ((slab_flags_t __force)0x01000000U)
> +
> /* Fault injection mark */
> #ifdef CONFIG_FAILSLAB
> # define SLAB_FAILSLAB ((slab_flags_t __force)0x02000000U)
> diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
> index 6aee19a79236..9e008a336d9f 100644
> --- a/mm/kfence/kfence_test.c
> +++ b/mm/kfence/kfence_test.c
> @@ -191,11 +191,10 @@ static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t fla
> kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor);
>
> /*
> - * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any
> - * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to
> - * allocate via memcg, if enabled.
> + * Use SLAB_NO_MERGE to prevent merging with existing caches.
> + * Use SLAB_ACCOUNT to allocate via memcg, if enabled.
> */
> - flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT;
> + flags |= SLAB_NO_MERGE | SLAB_ACCOUNT;
> test_cache = kmem_cache_create("test", size, 1, flags, ctor);
> KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache");
>
> diff --git a/mm/slab.h b/mm/slab.h
> index f01ac256a8f5..9005ddc51cf8 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -294,11 +294,11 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
> #if defined(CONFIG_SLAB)
> #define SLAB_CACHE_FLAGS (SLAB_MEM_SPREAD | SLAB_NOLEAKTRACE | \
> SLAB_RECLAIM_ACCOUNT | SLAB_TEMPORARY | \
> - SLAB_ACCOUNT)
> + SLAB_ACCOUNT | SLAB_NO_MERGE)
> #elif defined(CONFIG_SLUB)
> #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
> SLAB_TEMPORARY | SLAB_ACCOUNT | \
> - SLAB_NO_USER_FLAGS | SLAB_KMALLOC)
> + SLAB_NO_USER_FLAGS | SLAB_KMALLOC | SLAB_NO_MERGE)
> #else
> #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
> #endif
> @@ -319,6 +319,7 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
> SLAB_TEMPORARY | \
> SLAB_ACCOUNT | \
> SLAB_KMALLOC | \
> + SLAB_NO_MERGE | \
> SLAB_NO_USER_FLAGS)
>
> bool __kmem_cache_empty(struct kmem_cache *);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 607249785c07..0e0a617eae7d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -47,7 +47,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> */
> #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
> SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> - SLAB_FAILSLAB | kasan_never_merge())
> + SLAB_FAILSLAB | SLAB_NO_MERGE | kasan_never_merge())
>
> #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
> SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
prev parent reply other threads:[~2023-05-31 13:59 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-17 13:40 Jesper Dangaard Brouer
2023-01-17 14:54 ` Christoph Lameter
2023-01-18 5:17 ` Matthew Wilcox
2023-01-19 18:08 ` Jesper Dangaard Brouer
2023-01-24 16:06 ` Hyeonggon Yoo
2023-01-18 7:36 ` Vlastimil Babka
2023-01-23 16:14 ` Jesper Dangaard Brouer
2023-05-31 12:03 ` Vlastimil Babka
2023-05-31 13:59 ` Jesper Dangaard Brouer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fe64a059-f4c8-5983-3b08-f84552d1ce61@redhat.com \
--to=jbrouer@redhat.com \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=brouer@redhat.com \
--cc=cl@linux.com \
--cc=davem@davemloft.net \
--cc=dsterba@suse.com \
--cc=edumazet@google.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=kuba@kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=penberg@kernel.org \
--cc=roman.gushchin@linux.dev \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox