References: <20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz>
 <20241112-slub-percpu-caches-v1-6-ddc0bdc27e05@suse.cz>
In-Reply-To: <20241112-slub-percpu-caches-v1-6-ddc0bdc27e05@suse.cz>
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Mon, 18 Nov 2024 22:13:49 +0900
Subject: Re: [PATCH RFC 6/6] mm, slub: sheaf prefilling for guaranteed allocations
To: Vlastimil Babka
Cc: Suren Baghdasaryan, "Liam R. Howlett", Christoph Lameter,
 David Rientjes, Pekka Enberg, Joonsoo Kim, Roman Gushchin,
 "Paul E. McKenney",
 Lorenzo Stoakes, Matthew Wilcox, Boqun Feng, Uladzislau Rezki,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 maple-tree@lists.infradead.org

On Wed, Nov 13, 2024 at 1:39 AM Vlastimil Babka wrote:
>
> Add three functions for efficient guaranteed allocations in a critical
> section (that cannot sleep) when the exact number of allocations is not
> known beforehand, but an upper limit can be calculated.
>
> kmem_cache_prefill_sheaf() returns a sheaf containing at least the given
> number of objects.
>
> kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
> and is guaranteed not to fail until it is depleted.
>
> kmem_cache_return_sheaf() is for giving the sheaf back to the slab
> allocator after the critical section. This will also attempt to refill
> the sheaf to the cache's sheaf capacity for more efficient sheaf
> handling, but it is not strictly necessary that this succeeds.
>
> TODO: the current implementation is limited to the cache's sheaf_capacity
>
> Signed-off-by: Vlastimil Babka
> ---
>  include/linux/slab.h |  11 ++++
>  mm/slub.c            | 149 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 160 insertions(+)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 23904321992ad2eeb9389d0883cf4d5d5d71d896..a87dc3c6392fe235de2eabe1792df86d40c3bbf9 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -820,6 +820,17 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
>  					int node) __assume_slab_alignment __malloc;
>  #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
>
> +struct slab_sheaf *
> +kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int count);
> +
> +void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
> +			     struct slab_sheaf *sheaf);
> +
> +void *kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *cachep, gfp_t gfp,
> +			struct slab_sheaf *sheaf) __assume_slab_alignment __malloc;
> +#define kmem_cache_alloc_from_sheaf(...)	\
> +			alloc_hooks(kmem_cache_alloc_from_sheaf_noprof(__VA_ARGS__))
> +
>  /*
>   * These macros allow declaring a kmem_buckets * parameter alongside size, which
>   * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
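
Just to check my understanding of the intended calling pattern, a caller
would do roughly the following (my own sketch, not code from this series;
the cache, the lock and the upper bound of 3 are made up):

#include <linux/slab.h>
#include <linux/spinlock.h>

/* sketch: guaranteed allocations inside a critical section */
static int example_insert(struct kmem_cache *cache, spinlock_t *lock)
{
	struct slab_sheaf *sheaf;
	void *obj;

	/* sleepable context: prefill with the calculated upper bound */
	sheaf = kmem_cache_prefill_sheaf(cache, GFP_KERNEL, 3);
	if (!sheaf)
		return -ENOMEM;

	spin_lock(lock);	/* critical section, must not sleep */

	/* cannot fail until the three prefilled objects are used up */
	obj = kmem_cache_alloc_from_sheaf(cache, __GFP_ZERO, sheaf);
	/* ... use obj; up to two more allocations are still guaranteed ... */

	spin_unlock(lock);

	/* hand the sheaf back; refilling it to capacity may harmlessly fail */
	kmem_cache_return_sheaf(cache, GFP_KERNEL, sheaf);
	return 0;
}

Is that right?
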
> diff --git a/mm/slub.c b/mm/slub.c
> index 1900afa6153ca6d88f9df7db3ce84d98629489e7..a0e2cb7dfb5173f39f36bea1eb9760c3c1b99dd7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -444,6 +444,7 @@ struct slab_sheaf {
>  	union {
>  		struct rcu_head rcu_head;
>  		struct list_head barn_list;
> +		bool oversize;
>  	};
>  	struct kmem_cache *cache;
>  	unsigned int size;
> @@ -2819,6 +2820,30 @@ static int barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf,
>  	return ret;
>  }
>
> +static struct slab_sheaf *barn_get_full_or_empty_sheaf(struct node_barn *barn)
> +{
> +	struct slab_sheaf *sheaf = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&barn->lock, flags);
> +
> +	if (barn->nr_empty) {
> +		sheaf = list_first_entry(&barn->sheaves_empty,
> +					 struct slab_sheaf, barn_list);
> +		list_del(&sheaf->barn_list);
> +		barn->nr_empty--;
> +	} else if (barn->nr_full) {
> +		sheaf = list_first_entry(&barn->sheaves_full, struct slab_sheaf,
> +					 barn_list);
> +		list_del(&sheaf->barn_list);
> +		barn->nr_full--;
> +	}
> +
> +	spin_unlock_irqrestore(&barn->lock, flags);
> +
> +	return sheaf;
> +}
> +
>  /*
>   * If a full sheaf is available, return it and put the supplied empty one to
>   * barn. We ignore the limit on empty sheaves as the number of sheaves doesn't
> @@ -4893,6 +4918,130 @@ void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t gfpflags, int node)
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_node_noprof);
>
> +
> +/*
> + * returns a sheaf that has at least the given count of objects
> + * when prefilling is needed, do so with the given gfp flags
> + *
> + * returns NULL if prefilling failed, or when the requested count is
> + * above the cache's sheaf_capacity (TODO: lift this limitation)
> + */
> +struct slab_sheaf *
> +kmem_cache_prefill_sheaf(struct kmem_cache *s, gfp_t gfp, unsigned int count)
> +{
> +	struct slub_percpu_sheaves *pcs;
> +	struct slab_sheaf *sheaf = NULL;
> +
> +	//TODO: handle via oversize sheaf
> +	if (count > s->sheaf_capacity)
> +		return NULL;
> +
> +	pcs = cpu_sheaves_lock(s->cpu_sheaves);
> +
> +	if (pcs->spare && pcs->spare->size > 0) {
> +		sheaf = pcs->spare;
> +		pcs->spare = NULL;
> +	}
> +
> +	if (!sheaf)
> +		sheaf = barn_get_full_or_empty_sheaf(pcs->barn);
> +
> +	cpu_sheaves_unlock(s->cpu_sheaves);
> +
> +	if (!sheaf)
> +		sheaf = alloc_empty_sheaf(s, gfp);
> +
> +	if (sheaf && sheaf->size < count) {
> +		if (refill_sheaf(s, sheaf, gfp)) {
> +			sheaf_flush(s, sheaf);
> +			free_empty_sheaf(s, sheaf);
> +			sheaf = NULL;
> +		}
> +	}
> +
> +	return sheaf;
> +}
> +
> +/*
> + * Use this to return a sheaf obtained by kmem_cache_prefill_sheaf().
> + * It tries to refill the sheaf back to the cache's sheaf_capacity
> + * to avoid handling partially full sheaves.
> + *
> + * If the refill fails because gfp is e.g. GFP_NOWAIT, the sheaf is
> + * instead dissolved.
> + */
> +void kmem_cache_return_sheaf(struct kmem_cache *s, gfp_t gfp,
> +			     struct slab_sheaf *sheaf)
> +{
> +	struct slub_percpu_sheaves *pcs;
> +	bool refill = false;
> +	struct node_barn *barn;
> +
> +	//TODO: handle oversize sheaf
> +
> +	pcs = cpu_sheaves_lock(s->cpu_sheaves);
> +
> +	if (!pcs->spare) {
> +		pcs->spare = sheaf;
> +		sheaf = NULL;
> +	}
> +
> +	/* racy check */
> +	if (!sheaf && pcs->barn->nr_full >= MAX_FULL_SHEAVES) {
> +		barn = pcs->barn;
> +		refill = true;
> +	}
> +
> +	cpu_sheaves_unlock(s->cpu_sheaves);
> +
> +	if (!sheaf)
> +		return;
> +
> +	/*
> +	 * if the barn is full of full sheaves or we fail to refill the sheaf,
> +	 * simply flush and free it
> +	 */
> +	if (!refill || refill_sheaf(s, sheaf, gfp)) {
> +		sheaf_flush(s, sheaf);
> +		free_empty_sheaf(s, sheaf);
> +		return;
> +	}
> +
> +	/* we racily determined the sheaf would fit, so now force it */
> +	barn_put_full_sheaf(barn, sheaf, true);
> +}
> +
> +/*
> + * Allocate from a sheaf obtained by kmem_cache_prefill_sheaf().
> + *
> + * Guaranteed not to fail for as many allocations as the requested count.
> + * After the sheaf is emptied, it fails - no fallback to the slab cache itself.
> + *
> + * The gfp parameter is meant only to specify __GFP_ZERO or __GFP_ACCOUNT;
> + * memcg charging is forced over the limit if necessary, to avoid failure.
> + */
> +void *
> +kmem_cache_alloc_from_sheaf_noprof(struct kmem_cache *s, gfp_t gfp,
> +				   struct slab_sheaf *sheaf)
> +{
> +	void *ret = NULL;
> +	bool init;
> +
> +	if (sheaf->size == 0)
> +		goto out;
> +
> +	ret = sheaf->objects[--sheaf->size];
> +
> +	init = slab_want_init_on_alloc(gfp, s);
> +
> +	/* add __GFP_NOFAIL to force successful memcg charging */
> +	slab_post_alloc_hook(s, NULL, gfp | __GFP_NOFAIL, 1, &ret, init, s->object_size);

Maybe I'm missing something, but how can this be used from non-sleepable
contexts if __GFP_NOFAIL is used? I think we have to charge the objects
when the sheaf is obtained via kmem_cache_prefill_sheaf(), just like
users of bulk alloc/free do (a rough sketch of what I mean follows below
the quoted patch)?

Best,
Hyeonggon

> +out:
> +	trace_kmem_cache_alloc(_RET_IP_, ret, s, gfp, NUMA_NO_NODE);
> +
> +	return ret;
> +}
> +
>  /*
>   * To avoid unnecessary overhead, we pass through large allocation requests
>   * directly to the page allocator. We use __GFP_COMP, because we will need to
>
> --
> 2.47.0
>
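
P.S. the charging-at-prefill sketch I mentioned, for illustration only
(entirely my own, untested; memcg_charge_sheaf_objects() is a made-up
helper standing in for whatever the bulk-alloc charging path would
provide, and real code would also have to uncharge objects that are
returned unused):

/* sketch: charge all prefilled objects up front, in sleepable context */
static struct slab_sheaf *
prefill_sheaf_charged(struct kmem_cache *s, gfp_t gfp, unsigned int count)
{
	struct slab_sheaf *sheaf = kmem_cache_prefill_sheaf(s, gfp, count);

	if (!sheaf)
		return NULL;

	/* made-up helper: charge sheaf->size objects to memcg, may sleep */
	if ((gfp & __GFP_ACCOUNT) &&
	    !memcg_charge_sheaf_objects(s, gfp, sheaf->size, sheaf->objects)) {
		kmem_cache_return_sheaf(s, gfp, sheaf);
		return NULL;
	}

	/* alloc_from_sheaf could then avoid __GFP_NOFAIL charging entirely */
	return sheaf;
}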