From: Jesper Dangaard Brouer
Date: Wed, 31 May 2023 15:59:26 +0200
Subject: Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for SLAB_NEVER_MERGE
To: Vlastimil Babka , netdev@vger.kernel.org, linux-mm@kvack.org
Cc: brouer@redhat.com, Christoph Lameter , Andrew Morton , Mel Gorman , Joonsoo Kim , penberg@kernel.org, Jakub Kicinski , "David S. Miller" , edumazet@google.com, pabeni@redhat.com, Matthew Wilcox , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin , David Sterba
References: <167396280045.539803.7540459812377220500.stgit@firesoul> <81597717-0fed-5fd0-37d0-857d976b9d40@suse.cz>
In-Reply-To: <81597717-0fed-5fd0-37d0-857d976b9d40@suse.cz>
On 31/05/2023 14.03, Vlastimil Babka wrote:
> On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
>> Allow API users of kmem_cache_create to specify that they don't want
>> any slab merging or aliasing (with similar sized objects). Use this in
>> the network stack and kfence_test.
>>
>> The SKB (sk_buff) kmem_cache slab is critical for network performance.
>> The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
>> performance by amortising the alloc/free cost.
>>
>> For the bulk API to perform efficiently, slab fragmentation needs to
>> be low. Especially for the SLUB allocator, the efficiency of the bulk
>> free API depends on objects belonging to the same slab (page).
>>
>> When running different network performance microbenchmarks, I started
>> to notice that performance was reduced (slightly) when machines had
>> longer uptimes. I believe the cause was that 'skbuff_head_cache' got
>> aliased/merged into the general slub cache for 256 byte sized objects
>> (with my kernel config, without CONFIG_HARDENED_USERCOPY).
>>
>> For the SKB kmem_cache the network stack has reasons for not merging,
>> but it varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
>> We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
>>
>> Signed-off-by: Jesper Dangaard Brouer
>
> Since this idea was revived by David [1], and neither patch worked as is,
> but yours was more complete and first, I have fixed it up as below. The
> skbuff part itself will be best submitted separately afterwards so we don't
> get conflicts between trees etc. Comments?
>

Thanks for following up on this! :-)
I like the adjustments, ACKed below. I'm okay with submitting the
changes to net/core/skbuff.c separately.

> ----8<----
> From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
> From: Jesper Dangaard Brouer
> Date: Tue, 17 Jan 2023 14:40:00 +0100
> Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
>
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merging or aliasing (with similar sized objects). Use this in
> kfence_test.
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> For the bulk API to perform efficiently, slab fragmentation needs to
> be low. Especially for the SLUB allocator, the efficiency of the bulk
> free API depends on objects belonging to the same slab (page).
>
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was that 'skbuff_head_cache' got
> aliased/merged into the general slub cache for 256 byte sized objects
> (with my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> For the SKB kmem_cache the network stack has reasons for not merging,
> but it varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NO_MERGE for this kmem_cache.
>
> Another use case for the flag has been described by David Sterba [1]:
>
>> This can be used for more fine grained control over the caches or for
>> debugging builds where separate slabs can verify that no objects leak.
>>
>> The slab_nomerge boot option is too coarse and would need to be
>> enabled on all testing hosts. There are some other ways to disable
>> merging, e.g. a slab constructor, but this disables poisoning besides
>> adding additional overhead. Other flags are internal and may
>> have other semantics.
>>
>> A concrete example of what motivates the flag: during 'btrfs balance',
>> slabtop reported a huge increase in caches like
>>
>>   1330095 1330095 100%    0.10K  34105    39    136420K Acpi-ParseExt
>>   1734684 1734684 100%    0.14K  61953    28    247812K pid_namespace
>>   8244036 6873075  83%    0.11K 229001    36    916004K khugepaged_mm_slot
>>
>> which was confusing, and that it's because of slab merging was not the
>> first idea. After rebooting with slab_nomerge, all the caches were
>> from the btrfs_ namespace as expected.
>
> [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
>
> [ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the
>   one proposed by David so it does not collide with internal SLAB/SLUB
>   flags, write a comment for the flag, expand changelog, drop the skbuff
>   part to be handled separately ]
>
> Reported-by: David Sterba
> Signed-off-by: Jesper Dangaard Brouer
> Signed-off-by: Vlastimil Babka

Acked-by: Jesper Dangaard Brouer

> ---
>  include/linux/slab.h    | 12 ++++++++++++
>  mm/kfence/kfence_test.c |  7 +++----
>  mm/slab.h               |  5 +++--
>  mm/slab_common.c        |  2 +-
>  4 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 6b3e155b70bf..72bc906d8bc7 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -106,6 +106,18 @@
>  /* Avoid kmemleak tracing */
>  #define SLAB_NOLEAKTRACE	((slab_flags_t __force)0x00800000U)
>  
> +/*
> + * Prevent merging with compatible kmem caches. This flag should be used
> + * cautiously. Valid use cases:
> + *
> + * - caches created for self-tests (e.g. kunit)
> + * - general caches created and used by a subsystem, only when a
> + *   (subsystem-specific) debug option is enabled
> + * - performance critical caches, should be very rare and consulted with slab
> + *   maintainers, and not used together with CONFIG_SLUB_TINY
> + */
> +#define SLAB_NO_MERGE		((slab_flags_t __force)0x01000000U)
> +
>  /* Fault injection mark */
>  #ifdef CONFIG_FAILSLAB
>  # define SLAB_FAILSLAB		((slab_flags_t __force)0x02000000U)
> diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
> index 6aee19a79236..9e008a336d9f 100644
> --- a/mm/kfence/kfence_test.c
> +++ b/mm/kfence/kfence_test.c
> @@ -191,11 +191,10 @@ static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t fla
>  	kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor);
>  
>  	/*
> -	 * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any
> -	 * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to
> -	 * allocate via memcg, if enabled.
> +	 * Use SLAB_NO_MERGE to prevent merging with existing caches.
> +	 * Use SLAB_ACCOUNT to allocate via memcg, if enabled.
>  	 */
> -	flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT;
> +	flags |= SLAB_NO_MERGE | SLAB_ACCOUNT;
>  	test_cache = kmem_cache_create("test", size, 1, flags, ctor);
>  	KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache");
>  
> diff --git a/mm/slab.h b/mm/slab.h
> index f01ac256a8f5..9005ddc51cf8 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -294,11 +294,11 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
>  #if defined(CONFIG_SLAB)
>  #define SLAB_CACHE_FLAGS (SLAB_MEM_SPREAD | SLAB_NOLEAKTRACE | \
>  			  SLAB_RECLAIM_ACCOUNT | SLAB_TEMPORARY | \
> -			  SLAB_ACCOUNT)
> +			  SLAB_ACCOUNT | SLAB_NO_MERGE)
>  #elif defined(CONFIG_SLUB)
>  #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
>  			  SLAB_TEMPORARY | SLAB_ACCOUNT | \
> -			  SLAB_NO_USER_FLAGS | SLAB_KMALLOC)
> +			  SLAB_NO_USER_FLAGS | SLAB_KMALLOC | SLAB_NO_MERGE)
>  #else
>  #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
>  #endif
> @@ -319,6 +319,7 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
>  			      SLAB_TEMPORARY | \
>  			      SLAB_ACCOUNT | \
>  			      SLAB_KMALLOC | \
> +			      SLAB_NO_MERGE | \
>  			      SLAB_NO_USER_FLAGS)
>  
>  bool __kmem_cache_empty(struct kmem_cache *);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 607249785c07..0e0a617eae7d 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -47,7 +47,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
>   */
>  #define SLAB_NEVER_MERGE (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER | \
>  		SLAB_TRACE | SLAB_TYPESAFE_BY_RCU | SLAB_NOLEAKTRACE | \
> -		SLAB_FAILSLAB | kasan_never_merge())
> +		SLAB_FAILSLAB | SLAB_NO_MERGE | kasan_never_merge())
>  
>  #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
>  		SLAB_CACHE_DMA32 | SLAB_ACCOUNT)