Message-ID: <81597717-0fed-5fd0-37d0-857d976b9d40@suse.cz>
Date: Wed, 31 May 2023 14:03:05 +0200
Subject: Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for
 SLAB_NEVER_MERGE
From: Vlastimil Babka <vbabka@suse.cz>
To: Jesper Dangaard Brouer, netdev@vger.kernel.org, linux-mm@kvack.org
Cc: Christoph Lameter, Andrew Morton, Mel Gorman, Joonsoo Kim,
 penberg@kernel.org, Jakub Kicinski, "David S. Miller", edumazet@google.com,
 pabeni@redhat.com, Matthew Wilcox, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 Roman Gushchin, David Sterba
In-Reply-To: <167396280045.539803.7540459812377220500.stgit@firesoul>
References: <167396280045.539803.7540459812377220500.stgit@firesoul>

On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merge or aliasing (with similar sized objects). Use this in
> the network stack and kfence_test.
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> For the bulk API to perform efficiently, slab fragmentation needs to
> be low. Especially for the SLUB allocator, the efficiency of the bulk
> free API depends on objects belonging to the same slab (page).
>
> When running different network performance microbenchmarks, I started
> to notice that performance was (slightly) reduced when machines had
> longer uptimes. I believe the cause was that 'skbuff_head_cache' got
> aliased/merged into the general slab cache for 256-byte objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> The network stack has reasons for not merging the SKB kmem_cache, but
> they vary depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
>
> Signed-off-by: Jesper Dangaard Brouer

Since this idea was revived by David [1], and neither patch worked
as-is, I have fixed up yours, as it was more complete and came first.
The skbuff part itself is best submitted separately afterwards so we
don't get conflicts between trees etc. Comments?

----8<----
From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
From: Jesper Dangaard Brouer
Date: Tue, 17 Jan 2023 14:40:00 +0100
Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE

Allow API users of kmem_cache_create to specify that they don't want
any slab merge or aliasing (with similar sized objects). Use this in
kfence_test.

The SKB (sk_buff) kmem_cache slab is critical for network performance.
The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
performance by amortising the alloc/free cost.

For the bulk API to perform efficiently, slab fragmentation needs to
be low. Especially for the SLUB allocator, the efficiency of the bulk
free API depends on objects belonging to the same slab (page).

When running different network performance microbenchmarks, I started
to notice that performance was (slightly) reduced when machines had
longer uptimes. I believe the cause was that 'skbuff_head_cache' got
aliased/merged into the general slab cache for 256-byte objects (with
my kernel config, without CONFIG_HARDENED_USERCOPY).

The network stack has reasons for not merging the SKB kmem_cache, but
they vary depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
We want to explicitly set SLAB_NO_MERGE for this kmem_cache.
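For the skbuff part itself (dropped here, to be submitted separately),
the change would amount to roughly the following. A sketch only: it
mirrors the current skbuff_head_cache creation, and the exact flag set
varies by kernel config:

	/* sketch: create the SKB cache with merging explicitly disabled */
	skbuff_head_cache = kmem_cache_create_usercopy("skbuff_head_cache",
				sizeof(struct sk_buff), 0,
				SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_NO_MERGE,
				offsetof(struct sk_buff, cb),
				sizeof_field(struct sk_buff, cb),
				NULL);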
Another use case for the flag has been described by David Sterba [1]:

> This can be used for more fine grained control over the caches or for
> debugging builds where separate slabs can verify that no objects leak.
>
> The slab_nomerge boot option is too coarse and would need to be
> enabled on all testing hosts. There are some other ways to disable
> merging, e.g. a slab constructor, but that disables poisoning besides
> adding additional overhead. Other flags are internal and may have
> other semantics.
>
> A concrete example of what motivates the flag: during 'btrfs balance',
> slabtop reported a huge increase in caches like
>
>      OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>   1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
>   1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
>   8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
>
> which was confusing, and slab merging was not the first suspected
> cause. After rebooting with slab_nomerge, all the caches were from
> the btrfs_ namespace, as expected.

[1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/

[ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the
  one proposed by David so it does not collide with internal SLAB/SLUB
  flags, write a comment for the flag, expand changelog, drop the skbuff
  part to be handled separately ]

Reported-by: David Sterba
Signed-off-by: Jesper Dangaard Brouer
Signed-off-by: Vlastimil Babka
---
 include/linux/slab.h    | 12 ++++++++++++
 mm/kfence/kfence_test.c |  7 +++----
 mm/slab.h               |  5 +++--
 mm/slab_common.c        |  2 +-
 4 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 6b3e155b70bf..72bc906d8bc7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -106,6 +106,18 @@
 /* Avoid kmemleak tracing */
 #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
 
+/*
+ * Prevent merging with compatible kmem caches. This flag should be used
+ * cautiously. Valid use cases:
+ *
+ * - caches created for self-tests (e.g. kunit)
+ * - general caches created and used by a subsystem, only when a
+ *   (subsystem-specific) debug option is enabled
+ * - performance critical caches, should be very rare and consulted with slab
+ *   maintainers, and not used together with CONFIG_SLUB_TINY
+ */
+#define SLAB_NO_MERGE		((slab_flags_t __force)0x01000000U)
+
 /* Fault injection mark */
 #ifdef CONFIG_FAILSLAB
 # define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 6aee19a79236..9e008a336d9f 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -191,11 +191,10 @@ static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t fla
 	kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor);
 
 	/*
-	 * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any
-	 * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to
-	 * allocate via memcg, if enabled.
+	 * Use SLAB_NO_MERGE to prevent merging with existing caches.
+	 * Use SLAB_ACCOUNT to allocate via memcg, if enabled.
 	 */
-	flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT;
+	flags |= SLAB_NO_MERGE | SLAB_ACCOUNT;
 	test_cache = kmem_cache_create("test", size, 1, flags, ctor);
 	KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache");
diff --git a/mm/slab.h b/mm/slab.h
index f01ac256a8f5..9005ddc51cf8 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -294,11 +294,11 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
 #if defined(CONFIG_SLAB)
 #define SLAB_CACHE_FLAGS (SLAB_MEM_SPREAD | SLAB_NOLEAKTRACE | \
 			  SLAB_RECLAIM_ACCOUNT | SLAB_TEMPORARY | \
-			  SLAB_ACCOUNT)
+			  SLAB_ACCOUNT | SLAB_NO_MERGE)
 #elif defined(CONFIG_SLUB)
 #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
 			  SLAB_TEMPORARY | SLAB_ACCOUNT | \
-			  SLAB_NO_USER_FLAGS | SLAB_KMALLOC)
+			  SLAB_NO_USER_FLAGS | SLAB_KMALLOC | SLAB_NO_MERGE)
 #else
 #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
 #endif
@@ -319,6 +319,7 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
 			      SLAB_TEMPORARY | \
 			      SLAB_ACCOUNT | \
 			      SLAB_KMALLOC | \
+			      SLAB_NO_MERGE | \
 			      SLAB_NO_USER_FLAGS)
 
 bool __kmem_cache_empty(struct kmem_cache *);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 607249785c07..0e0a617eae7d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -47,7 +47,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
  */
 #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
 		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
-		SLAB_FAILSLAB | kasan_never_merge())
+		SLAB_FAILSLAB | SLAB_NO_MERGE | kasan_never_merge())
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
 		SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
-- 
2.40.1
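PS: to illustrate the debugging use case from David's report, a
subsystem could gate the new flag on its own debug option. A minimal
sketch; the subsystem names (CONFIG_MY_SUBSYS_DEBUG, struct my_obj,
my_cache) are hypothetical, only the slab API and SLAB_NO_MERGE are
real:

	#include <linux/init.h>
	#include <linux/slab.h>

	struct my_obj { unsigned long id; };	/* hypothetical object */

	static struct kmem_cache *my_cache;

	static int __init my_subsys_init(void)
	{
		slab_flags_t flags = 0;

		/*
		 * In debug builds, keep objects in their own, identifiable
		 * cache so growth shows up under the right name in slabtop.
		 */
		if (IS_ENABLED(CONFIG_MY_SUBSYS_DEBUG))
			flags |= SLAB_NO_MERGE;

		my_cache = kmem_cache_create("my_subsys_cache",
					     sizeof(struct my_obj), 0, flags, NULL);
		return my_cache ? 0 : -ENOMEM;
	}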