References: <20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz>
 <20241112-slub-percpu-caches-v1-6-ddc0bdc27e05@suse.cz>
In-Reply-To: <20241112-slub-percpu-caches-v1-6-ddc0bdc27e05@suse.cz>
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Mon, 18 Nov 2024 22:13:49 +0900
Subject: Re: [PATCH RFC 6/6] mm, slub: sheaf prefilling for guaranteed allocations
To: Vlastimil Babka
Cc: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter,
 David Rientjes, Pekka Enberg, Joonsoo Kim, Roman Gushchin,
 "Paul E. McKenney",
 Lorenzo Stoakes, Matthew Wilcox, Boqun Feng, Uladzislau Rezki,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 maple-tree@lists.infradead.org

On Wed, Nov 13, 2024 at 1:39 AM Vlastimil Babka wrote:
>
> Add three functions for efficient guaranteed allocations in a critical
> section (that cannot sleep) when the exact number of allocations is not
> known beforehand, but an upper limit can be calculated.
>
> kmem_cache_prefill_sheaf() returns a sheaf containing at least the given
> number of objects.
>
> kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
> and is guaranteed not to fail until it is depleted.
>
> kmem_cache_return_sheaf() is for giving the sheaf back to the slab
> allocator after the critical section. This will also attempt to refill
> the sheaf to the cache's sheaf capacity for more efficient sheaf
> handling, but it is not strictly necessary that this succeeds.
>
> TODO: the current implementation is limited to the cache's sheaf_capacity
>
> Signed-off-by: Vlastimil Babka
> ---
>  include/linux/slab.h |  11 ++++
>  mm/slub.c            | 149 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 160 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 23904321992ad2eeb9389d0883cf4d5d5d71d896..a87dc3c6392fe235de2eabe1792df86d40c3bbf9 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -820,6 +820,17 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
>  					int node) __assume_slab_alignment __malloc;
>  #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
>
> +struct slab_sheaf *
> +kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int count);
> +
> +void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
> +			     struct slab_sheaf *sheaf);
> +
> +void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t gfp,
> +			struct slab_sheaf *sheaf) __assume_slab_alignment __malloc;
> +#define kmem_cache_alloc_from_sheaf(...)	\
> +			alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
> +
>  /*
>   * These macros allow declaring a kmem_buckets * parameter alongside size, which
>   * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
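
Just to check my understanding of the intended calling pattern, a caller
would do roughly the following (my own sketch, not code from this series;
the cache, the lock and the upper bound of 3 are made up):

#include <linux/slab.h>
#include <linux/spinlock.h>

/* sketch: guaranteed allocations inside a critical section */
static int example_insert(struct kmem_cache *cache, spinlock_t *lock)
{
	struct slab_sheaf *sheaf;
	void *obj;

	/* sleepable context: prefill with the calculated upper bound */
	sheaf = kmem_cache_prefill_sheaf(cache, GFP_KERNEL, 3);
	if (!sheaf)
		return -ENOMEM;

	spin_lock(lock);	/* critical section, must not sleep */

	/* cannot fail until the three prefilled objects are used up */
	obj = kmem_cache_alloc_from_sheaf(cache, __GFP_ZERO, sheaf);
	/* ... use obj; up to two more allocations are still guaranteed ... */

	spin_unlock(lock);

	/* hand the sheaf back; refilling it to capacity may harmlessly fail */
	kmem_cache_return_sheaf(cache, GFP_KERNEL, sheaf);
	return 0;
}

Is that right?
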
> diff --git a/mm/slub.c b/mm/slub.c
> index 1900afa6153ca6d88f9df7db3ce84d98629489e7..a0e2cb7dfb5173f39f36bea1eb9760c3c1b99dd7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -444,6 +444,7 @@ struct slab_sheaf {
>  	union {
>  		struct rcu_head rcu_head;
>  		struct list_head barn_list;
> +		bool oversize;
>  	};
>  	struct kmem_cache *cache;
>  	unsigned int size;
> @@ -2819,6 +2820,30 @@ static int barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf,
>  	return ret;
>  }
>
> +static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
> +{
> +	struct slab_sheaf *sheaf = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&barn->lock, flags);
> +
> +	if (barn->nr_empty) {
> +		sheaf = list_first_entry(&barn->sheaves_empty,
> +					 struct slab_sheaf, barn_list);
> +		list_del(&sheaf->barn_list);
> +		barn->nr_empty--;
> +	} else if (barn->nr_full) {
> +		sheaf = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
> +					 barn_list);
> +		list_del(&sheaf->barn_list);
> +		barn->nr_full--;
> +	}
> +
> +	spin_unlock_irqrestore(&barn->lock, flags);
> +
> +	return sheaf;
> +}
> +
>  /*
>   * If a full sheaf is available, return it and put the supplied empty one to
>   * barn. We ignore the limit on empty sheaves as the number of sheaves doesn't
> @@ -4893,6 +4918,130 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);
>
> +
> +/*
> + * returns a sheaf that has at least the given count of objects
> + * when prefilling is needed, do so with the given gfp flags
> + *
> + * returns NULL if prefilling failed, or when the requested count is
> + * above the cache's sheaf_capacity (TODO: lift this limitation)
> + */
> +struct slab_sheaf *
> +kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int count)
> +{
> +	struct slub_percpu_sheaves *pcs;
> +	struct slab_sheaf *sheaf = NULL;
> +
> +	//TODO: handle via oversize sheaf
> +	if (count > s->sheaf_capacity)
> +		return NULL;
> +
> +	pcs = cpu_sheaves_lock(s->cpu_sheaves);
> +
> +	if (pcs->spare && pcs->spare->size > 0) {
> +		sheaf = pcs->spare;
> +		pcs->spare = NULL;
> +	}
> +
> +	if (!sheaf)
> +		sheaf = barn_get_full_or_empty_sheaf(pcs->barn);
> +
> +	cpu_sheaves_unlock(s->cpu_sheaves);
> +
> +	if (!sheaf)
> +		sheaf = alloc_empty_sheaf(s, gfp);
> +
> +	if (sheaf && sheaf->size < count) {
> +		if (refill_sheaf(s, sheaf, gfp)) {
> +			sheaf_flush(s, sheaf);
> +			free_empty_sheaf(s, sheaf);
> +			sheaf = NULL;
> +		}
> +	}
> +
> +	return sheaf;
> +}
> +
> +/*
> + * Use this to return a sheaf obtained by kmem_cache_prefill_sheaf().
> + * It tries to refill the sheaf back to the cache's sheaf_capacity
> + * to avoid handling partially full sheaves.
> + *
> + * If the refill fails because gfp is e.g. GFP_NOWAIT, the sheaf is
> + * instead dissolved.
> + */
> +void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
> +			     struct slab_sheaf *sheaf)
> +{
> +	struct slub_percpu_sheaves *pcs;
> +	bool refill = false;
> +	struct node_barn *barn;
> +
> +	//TODO: handle oversize sheaf
> +
> +	pcs = cpu_sheaves_lock(s->cpu_sheaves);
> +
> +	if (!pcs->spare) {
> +		pcs->spare = sheaf;
> +		sheaf = NULL;
> +	}
> +
> +	/* racy check */
> +	if (!sheaf && pcs->barn->nr_full >= MAX_FULL_SHEAVES) {
> +		barn = pcs->barn;
> +		refill = true;
> +	}
> +
> +	cpu_sheaves_unlock(s->cpu_sheaves);
> +
> +	if (!sheaf)
> +		return;
> +
> +	/*
> +	 * if the barn is full of full sheaves or we fail to refill the sheaf,
> +	 * simply flush and free it
> +	 */
> +	if (!refill || refill_sheaf(s, sheaf, gfp)) {
> +		sheaf_flush(s, sheaf);
> +		free_empty_sheaf(s, sheaf);
> +		return;
> +	}
> +
> +	/* we racily determined the sheaf would fit, so now force it */
> +	barn_put_full_sheaf(barn, sheaf, true);
> +}
> +
> +/*
> + * Allocate from a sheaf obtained by kmem_cache_prefill_sheaf().
> + *
> + * Guaranteed not to fail for as many allocations as the requested count.
> + * After the sheaf is emptied, it fails - no fallback to the slab cache itself.
> + *
> + * The gfp parameter is meant only to specify __GFP_ZERO or __GFP_ACCOUNT;
> + * memcg charging is forced over the limit if necessary, to avoid failure.
> + */
> +void *
> +kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
> +				   struct slab_sheaf *sheaf)
> +{
> +	void *ret = NULL;
> +	bool init;
> +
> +	if (sheaf->size == 0)
> +		goto out;
> +
> +	ret = sheaf->objects[--sheaf->size];
> +
> +	init = slab_want_init_on_alloc(gfp, s);
> +
> +	/* add __GFP_NOFAIL to force successful memcg charging */
> +	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);

Maybe I'm missing something, but how can this be used from non-sleepable
contexts if __GFP_NOFAIL is used? I think we have to charge the objects
when the sheaf is obtained via kmem_cache_prefill_sheaf(), just like
users of bulk alloc/free do (a rough sketch of what I mean follows below
the quoted patch)?

Best,
Hyeonggon

> +out:
> +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
> +
> +	return ret;
> +}
> +
>  /*
>   * To avoid unnecessary overhead, we pass through large allocation requests
>   * directly to the page allocator. We use __GFP_COMP, because we will need to
>
> --
> 2.47.0
>
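
P.S. the charging-at-prefill sketch I mentioned, for illustration only
(entirely my own, untested; memcg_charge_sheaf_objects() is a made-up
helper standing in for whatever the bulk-alloc charging path would
provide, and real code would also have to uncharge objects that are
returned unused):

/* sketch: charge all prefilled objects up front, in sleepable context */
static struct slab_sheaf *
prefill_sheaf_charged(struct kmem_cache *s, gfp_t gfp, unsigned int count)
{
	struct slab_sheaf *sheaf = kmem_cache_prefill_sheaf(s, gfp, count);

	if (!sheaf)
		return NULL;

	/* made-up helper: charge sheaf->size objects to memcg, may sleep */
	if ((gfp & __GFP_ACCOUNT) &&
	    !memcg_charge_sheaf_objects(s, gfp, sheaf->size, sheaf->objects)) {
		kmem_cache_return_sheaf(s, gfp, sheaf);
		return NULL;
	}

	/* alloc_from_sheaf could then avoid __GFP_NOFAIL charging entirely */
	return sheaf;
}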