From: Suren Baghdasaryan <surenb@google.com>
Date: Wed, 12 Mar 2025 08:14:19 -0700
Subject: Re: [PATCH RFC v2 01/10] slab: add opt-in caching layer of percpu sheaves
References: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz>
 <20250214-slub-percpu-caches-v2-1-88592ee0966a@suse.cz>
To: Vlastimil Babka
Howlett" , Christoph Lameter , David Rientjes , Roman Gushchin , Hyeonggon Yoo <42.hyeyoo@gmail.com>, Uladzislau Rezki , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: ECA2F14000D X-Stat-Signature: gwhi9h84ndyubgs7xb918n3uetxi6e9u X-HE-Tag: 1741792473-438330 X-HE-Meta: U2FsdGVkX19IEyhG8y+VpmAq0WFeVMVv5a1yNcf0ozOIRpZXPExiVsAuFegfWl85tgqcTmVe9ZIcB3eLLKL5rz8LsIOTlu7bQ78knlPXUPVxjAF+8KGB06jieqbrn0DoccXS/KmJNM7np3ccFsk99FaYmTXK/dSQ2Foaiwu7ibw4zjTsKdwVSqkLvZBV42pJjh4NsGfFIoW09LtvzSva+/MhrZVNqoWp1eiR37ODcnIHpcN/Cwrwusy6Svnvr+J9tEr2nK3Flgjf4J4+M5SLH3QHQuf72WatR2h7nnm0MWyMhV6u805komvRROf/0ojz4plJEFvadY5EfRBz/w2gGu91yjulNjPEjr2dh+j/d7OMwqF4+l33EcBh/hlqpmDWKs+2ULbdmkwdHPmv0IyD8jyChknu3yKqwyrbKX4mbZgo5uZAWukU/FNmwDHr9yAQlw05Gv5P9TudkddDIouUiGDjXGspRQ/+lcEyr62nNLlKIm7XkW/yMMqeu9Rn9FKb0vvxuLoatJVxGUNgCkqk8ccZx0Gr9BjbSRRDWFn/sCfpGMMqA2CisEnjVfuWhtmiayLnj85WyOF2/wqndY6R0g6i6AnEMjH0pKW5viZjnCv7Ehy9JcTEnccDdJPS9FJ1s91mqADazgbesLXQ7jsi6O40dx7VTyeeXcxZO0/7wcyKWnRxVNb8IlShQfwKoFpz+DYw0SqpDQT4mMExbgdP68+zHWFLCCSQVvBg0f+ZGjJMW/y+yBDQcTrZLaUmcI6n0bcojVbMNjCrhYFe5YRPnsHplC2z8wfg5YUzkRmS6MeyP1IGDC1ixjcrNPVS8TV0Yty0wDAoR/VtBT72S/eVrMVLHLCJ+rlYAlhEO+30/v89vVH8K8vynax8Pbk6qPlKKqJS2weVCY0Z6EgBhCHfBLErM7uW6g9T0Ij87pZEmUhzV5ldcwdv+sRW+o5qgpaPim1txje3uzruObqIIUc bseebnQe EGdiVV0mh/fye4rvOEBx5Ltds+Brpn5Hlv3lf/n0eGo05SdjvZRztgpwJH2DzyP9gwz6HiQXswX3E8iBf+dh93XOdkkVsUnI/AkUhmf9wbf5TmYZMH59+1+gd8EYGe3WfEaQKHVRBOm9vWlS5Af87DDow+ocaqrtx+UUJB5Zx1kJggX2bkQxZzp28WYxxQpQQIFHfGAYdLF1B7+6EWYxZZ77jnRvb744ApnvdXyKDFjUmDQuqnWKt7F3c3SmpcESJCTMVF7KdaLJXJZM= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Mar 12, 2025 at 7:58=E2=80=AFAM Vlastimil Babka wr= ote: > > On 2/22/25 23:46, Suren Baghdasaryan wrote: > > On Fri, Feb 14, 2025 at 8:27=E2=80=AFAM Vlastimil Babka wrote: > >> > >> Specifying a non-zero value for a new struct kmem_cache_args field > >> sheaf_capacity will setup a caching layer of percpu arrays called > >> sheaves of given capacity for the created cache. > >> > >> Allocations from the cache will allocate via the percpu sheaves (main = or > >> spare) as long as they have no NUMA node preference. Frees will also > >> refill one of the sheaves. > >> > >> When both percpu sheaves are found empty during an allocation, an empt= y > >> sheaf may be replaced with a full one from the per-node barn. If none > >> are available and the allocation is allowed to block, an empty sheaf i= s > >> refilled from slab(s) by an internal bulk alloc operation. When both > >> percpu sheaves are full during freeing, the barn can replace a full on= e > >> with an empty one, unless over a full sheaves limit. In that case a > >> sheaf is flushed to slab(s) by an internal bulk free operation. Flushi= ng > >> sheaves and barns is also wired to the existing cpu flushing and cache > >> shrinking operations. > >> > >> The sheaves do not distinguish NUMA locality of the cached objects. If > >> an allocation is requested with kmem_cache_alloc_node() with a specifi= c > >> node (not NUMA_NO_NODE), sheaves are bypassed. > >> > >> The bulk operations exposed to slab users also try to utilize the > >> sheaves as long as the necessary (full or empty) sheaves are available > >> on the cpu or in the barn. 
> >>
> >> The bulk operations exposed to slab users also try to utilize the
> >> sheaves as long as the necessary (full or empty) sheaves are available
> >> on the cpu or in the barn. Once depleted, they will fall back to bulk
> >> alloc/free to slabs directly to avoid double copying.
> >>
> >> Sysfs stat counters alloc_cpu_sheaf and free_cpu_sheaf count objects
> >> allocated or freed using the sheaves. Counters sheaf_refill,
> >> sheaf_flush_main and sheaf_flush_other count objects filled or flushed
> >> from or to slab pages, and can be used to assess how effective the
> >> caching is. The refill and flush operations will also count towards the
> >> usual alloc_fastpath/slowpath, free_fastpath/slowpath and other
> >> counters.
> >>
> >> Access to the percpu sheaves is protected by local_lock_irqsave()
> >> operations; each per-NUMA-node barn has a spin_lock.
> >>
> >> A current limitation is that when slub_debug is enabled for a cache with
> >> percpu sheaves, the objects in the array are considered as allocated from
> >> the slub_debug perspective, and the alloc/free debugging hooks occur
> >> when moving the objects between the array and slab pages. This means
> >> that e.g. a use-after-free that occurs for an object cached in the
> >> array is undetected. Collected alloc/free stacktraces might also be less
> >> useful. This limitation could be changed in the future.
> >>
> >> On the other hand, KASAN, kmemcg and other hooks are executed on actual
> >> allocations and frees by kmem_cache users even if those use the array,
> >> so their debugging or accounting accuracy should be unaffected.
> >>
> >> Signed-off-by: Vlastimil Babka
> >
> > Only one possible issue in __pcs_flush_all_cpu(), all other comments
> > are nits and suggestions.
>
> Thanks.
>
> >> + * Limitations: when slub_debug is enabled for the cache, all relevant
> >> + * actions (i.e. poisoning, obtaining stacktraces) and checks happen
> >> + * when objects move between sheaves and slab pages, which may result in
> >> + * e.g. not detecting a use-after-free while the object is in the array
> >> + * cache, and the stacktraces may be less useful.
> >
> > I would also love to see a short comparison of sheaves (when objects
> > are freed using kfree_rcu()) vs SLAB_TYPESAFE_BY_RCU. I think both
> > mechanisms rcu-free objects in bulk but sheaves would not reuse an
> > object before RCU grace period is passed. Is that right?
>
> I don't think that's right. SLAB_TYPESAFE_BY_RCU doesn't rcu-free objects in
> bulk, the objects are freed immediately. It only rcu-delays freeing the slab
> folio once all objects are freed.

Yes, you are right.

>
> >> +struct slub_percpu_sheaves {
> >> +       local_lock_t lock;
> >> +       struct slab_sheaf *main; /* never NULL when unlocked */
> >> +       struct slab_sheaf *spare; /* empty or full, may be NULL */
> >> +       struct slab_sheaf *rcu_free;
> >
> > Would be nice to have a short comment for rcu_free as well. I could
> > guess what main and spare are but for rcu_free had to look further.
>
> Added.
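
Thanks. FWIW, something as short as the line below would have been enough
for me - my wording, and just my guess of what rcu_free holds (objects
freed via kfree_rcu() waiting for a grace period):

        struct slab_sheaf *rcu_free; /* objects from kfree_rcu(), flushed after a grace period */
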
>
> >> +static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
> >> +                                   size_t size, void **p);
> >> +
> >> +
> >> +static int refill_sheaf(struct kmem_cache *s, struct slab_sheaf *sheaf,
> >> +                        gfp_t gfp)
> >> +{
> >> +       int to_fill = s->sheaf_capacity - sheaf->size;
> >> +       int filled;
> >> +
> >> +       if (!to_fill)
> >> +               return 0;
> >> +
> >> +       filled = __kmem_cache_alloc_bulk(s, gfp, to_fill,
> >> +                                        &sheaf->objects[sheaf->size]);
> >> +
> >> +       if (!filled)
> >> +               return -ENOMEM;
> >> +
> >> +       sheaf->size = s->sheaf_capacity;
> >
> > nit: __kmem_cache_alloc_bulk() either allocates the requested number of
> > objects or returns 0, so the current code is fine, but if at some point
> > the implementation changes so that it can return a smaller number of
> > objects than requested (filled < to_fill) then the above assignment
> > will become invalid. I think a safer thing here would be to just:
> >
> > sheaf->size += filled;
> >
> > which also makes logical sense. Alternatively you could add
> > VM_BUG_ON(filled != to_fill) but the increment I think would be
> > better.
>
> It's useful to indicate the refill was not successful, for patch 6. So I'm
> changing this to:
>
>         sheaf->size += filled;
>
>         stat_add(s, SHEAF_REFILL, filled);
>
>         if (filled < to_fill)
>                 return -ENOMEM;
>
>         return 0;

That looks good to me.

>
> >> +
> >> +       stat_add(s, SHEAF_REFILL, filled);
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +
> >> +static struct slab_sheaf *alloc_full_sheaf(struct kmem_cache *s, gfp_t gfp)
> >> +{
> >> +       struct slab_sheaf *sheaf = alloc_empty_sheaf(s, gfp);
> >> +
> >> +       if (!sheaf)
> >> +               return NULL;
> >> +
> >> +       if (refill_sheaf(s, sheaf, gfp)) {
> >> +               free_empty_sheaf(s, sheaf);
> >> +               return NULL;
> >> +       }
> >> +
> >> +       return sheaf;
> >> +}
> >> +
> >> +/*
> >> + * Maximum number of objects freed during a single flush of main pcs sheaf.
> >> + * Translates directly to an on-stack array size.
> >> + */
> >> +#define PCS_BATCH_MAX 32U
> >> +
> >> +static void __kmem_cache_free_bulk(struct kmem_cache *s, size_t size,
> >> +                                  void **p);
> >> +
> >
> > A comment clarifying why you are freeing in PCS_BATCH_MAX batches here
> > would be helpful. My understanding is that you do that to free objects
> > outside of the cpu_sheaves->lock, so you isolate a batch, release the
> > lock and then free the batch.
>
> OK.
>
> >> +static void sheaf_flush_main(struct kmem_cache *s)
> >> +{
> >> +       struct slub_percpu_sheaves *pcs;
> >> +       unsigned int batch, remaining;
> >> +       void *objects[PCS_BATCH_MAX];
> >> +       struct slab_sheaf *sheaf;
> >> +       unsigned long flags;
> >> +
> >> +next_batch:
> >> +       local_lock_irqsave(&s->cpu_sheaves->lock, flags);
> >> +       pcs = this_cpu_ptr(s->cpu_sheaves);
> >> +       sheaf = pcs->main;
> >> +
> >> +       batch = min(PCS_BATCH_MAX, sheaf->size);
> >> +
> >> +       sheaf->size -= batch;
> >> +       memcpy(objects, sheaf->objects + sheaf->size, batch * sizeof(void *));
> >> +
> >> +       remaining = sheaf->size;
> >> +
> >> +       local_unlock_irqrestore(&s->cpu_sheaves->lock, flags);
> >> +
> >> +       __kmem_cache_free_bulk(s, batch, &objects[0]);
> >> +
> >> +       stat_add(s, SHEAF_FLUSH_MAIN, batch);
> >> +
> >> +       if (remaining)
> >> +               goto next_batch;
> >> +}
> >> +
> >
> > This function seems to be used against either isolated sheaves or in
> > the slub_cpu_dead() --> __pcs_flush_all_cpu() path where we hold
> > slab_mutex and I think that guarantees that the sheaf is unused. Maybe
> > a short comment clarifying this requirement or rename the function to
> > reflect that?
> > Something like flush_unused_sheaf()?
>
> It's not slab_mutex, but the fact slub_cpu_dead() is executed in a hotplug
> phase when the given cpu is already not executing anymore and thus cannot be
> manipulating its percpu sheaves, so we are the only one that does.
> So I will clarify and rename to sheaf_flush_unused().

I see. Thanks for explaining.

>
> >> +
> >> +static void __pcs_flush_all_cpu(struct kmem_cache *s, unsigned int cpu)
> >> +{
> >> +       struct slub_percpu_sheaves *pcs;
> >> +
> >> +       pcs = per_cpu_ptr(s->cpu_sheaves, cpu);
> >> +
> >> +       if (pcs->spare) {
> >> +               sheaf_flush(s, pcs->spare);
> >> +               free_empty_sheaf(s, pcs->spare);
> >> +               pcs->spare = NULL;
> >> +       }
> >> +
> >> +       // TODO: handle rcu_free
> >> +       BUG_ON(pcs->rcu_free);
> >> +
> >> +       sheaf_flush_main(s);
> >
> > Hmm. sheaf_flush_main() always flushes for this_cpu only, so IIUC this
> > call will not necessarily flush the main sheaf for the cpu passed to
> > __pcs_flush_all_cpu().
>
> Thanks, yes I need to call sheaf_flush_unused(pcs->main). It's ok to do
> given my reply above.
>
> >> +/*
> >> + * Free an object to the percpu sheaves.
> >> + * The object is expected to have passed slab_free_hook() already.
> >> + */
> >> +static __fastpath_inline
> >> +void free_to_pcs(struct kmem_cache *s, void *object)
> >> +{
> >> +       struct slub_percpu_sheaves *pcs;
> >> +       unsigned long flags;
> >> +
> >> +restart:
> >> +       local_lock_irqsave(&s->cpu_sheaves->lock, flags);
> >> +       pcs = this_cpu_ptr(s->cpu_sheaves);
> >> +
> >> +       if (unlikely(pcs->main->size == s->sheaf_capacity)) {
> >> +
> >> +               struct slab_sheaf *empty;
> >> +
> >> +               if (!pcs->spare) {
> >> +                       empty = barn_get_empty_sheaf(pcs->barn);
> >> +                       if (empty) {
> >> +                               pcs->spare = pcs->main;
> >> +                               pcs->main = empty;
> >> +                               goto do_free;
> >> +                       }
> >> +                       goto alloc_empty;
> >> +               }
> >> +
> >> +               if (pcs->spare->size < s->sheaf_capacity) {
> >> +                       stat(s, SHEAF_SWAP);
> >> +                       swap(pcs->main, pcs->spare);
> >> +                       goto do_free;
> >> +               }
> >> +
> >> +               empty = barn_replace_full_sheaf(pcs->barn, pcs->main);
> >> +
> >> +               if (!IS_ERR(empty)) {
> >> +                       pcs->main = empty;
> >> +                       goto do_free;
> >> +               }
> >> +
> >> +               if (PTR_ERR(empty) == -E2BIG) {
> >> +                       /* Since we got here, spare exists and is full */
> >> +                       struct slab_sheaf *to_flush = pcs->spare;
> >> +
> >> +                       pcs->spare = NULL;
> >> +                       local_unlock_irqrestore(&s->cpu_sheaves->lock, flags);
> >> +
> >> +                       sheaf_flush(s, to_flush);
> >> +                       empty = to_flush;
> >> +                       goto got_empty;
> >> +               }
> >> +
> >> +alloc_empty:
> >> +               local_unlock_irqrestore(&s->cpu_sheaves->lock, flags);
> >> +
> >> +               empty = alloc_empty_sheaf(s, GFP_NOWAIT);
> >> +
> >> +               if (!empty) {
> >> +                       sheaf_flush_main(s);
> >> +                       goto restart;
> >> +               }
> >> +
> >> +got_empty:
> >> +               local_lock_irqsave(&s->cpu_sheaves->lock, flags);
> >> +               pcs = this_cpu_ptr(s->cpu_sheaves);
> >> +
> >> +               /*
> >> +                * if we put any sheaf to barn here, it's because we raced or
> >> +                * have been migrated to a different cpu, which should be rare
> >> +                * enough so just ignore the barn's limits to simplify
> >> +                */
> >> +               if (unlikely(pcs->main->size < s->sheaf_capacity)) {
> >> +                       if (!pcs->spare)
> >> +                               pcs->spare = empty;
> >> +                       else
> >> +                               barn_put_empty_sheaf(pcs->barn, empty, true);
> >> +                       goto do_free;
> >> +               }
> >> +
> >> +               if (!pcs->spare) {
> >> +                       pcs->spare = pcs->main;
> >> +                       pcs->main = empty;
> >> +                       goto do_free;
> >> +               }
> >> +
> >> +               barn_put_full_sheaf(pcs->barn, pcs->main, true);
> >> +               pcs->main = empty;
> > I find the program flow in this function quite complex and hard to
> > follow. I think refactoring the above block starting from "pcs =
> > this_cpu_ptr(s->cpu_sheaves)" would somewhat simplify it. That
> > eliminates the need for the "got_empty" label and makes the
> > locking/unlocking sequence of s->cpu_sheaves->lock a bit more clear.
>
> I'm a bit lost, refactoring how exactly?

I thought moving the code above, starting from
"pcs = this_cpu_ptr(s->cpu_sheaves)", into its own function would
simplify the flow. But as I said, it's a nit. If you try and don't
like that feel free to ignore this suggestion.

>
> >> +               }
> >> +
> >> +do_free:
> >> +       pcs->main->objects[pcs->main->size++] = object;
> >> +
> >> +       local_unlock_irqrestore(&s->cpu_sheaves->lock, flags);
> >> +
> >> +       stat(s, FREE_PCS);
> >> +}
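
FWIW, roughly the shape I was thinking of - completely untested, the helper
name is made up, and it is only meant to illustrate how the got_empty label
could go away and the lock/unlock pairing become easier to follow:

/*
 * Install the empty sheaf we obtained (from flushing the full spare or from
 * a fresh allocation) as main or spare, retiring the current main to the
 * barn if needed, then free the object to the now non-full main sheaf.
 */
static void __pcs_install_empty_and_free(struct kmem_cache *s,
                                         struct slab_sheaf *empty,
                                         void *object)
{
        struct slub_percpu_sheaves *pcs;
        unsigned long flags;

        local_lock_irqsave(&s->cpu_sheaves->lock, flags);
        pcs = this_cpu_ptr(s->cpu_sheaves);

        /*
         * If we put any sheaf to the barn here, it's because we raced or
         * have been migrated to a different cpu, which should be rare
         * enough, so just ignore the barn's limits to simplify.
         */
        if (unlikely(pcs->main->size < s->sheaf_capacity)) {
                /* main has room again, just park the empty sheaf */
                if (!pcs->spare)
                        pcs->spare = empty;
                else
                        barn_put_empty_sheaf(pcs->barn, empty, true);
        } else if (!pcs->spare) {
                pcs->spare = pcs->main;
                pcs->main = empty;
        } else {
                barn_put_full_sheaf(pcs->barn, pcs->main, true);
                pcs->main = empty;
        }

        pcs->main->objects[pcs->main->size++] = object;

        local_unlock_irqrestore(&s->cpu_sheaves->lock, flags);

        stat(s, FREE_PCS);
}

free_to_pcs() would then call this after flushing the spare in the -E2BIG
case and after a successful alloc_empty_sheaf(), and return, instead of
jumping to got_empty/do_free.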