From: Vlastimil Babka <vbabka@suse.cz>
To: Suren Baghdasaryan <surenb@google.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Christoph Lameter <cl@linux.com>,
David Rientjes <rientjes@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
rcu@vger.kernel.org, maple-tree@lists.infradead.org,
Vlastimil Babka <vbabka@suse.cz>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Alexei Starovoitov <ast@kernel.org>,
"Liam R. Howlett" <Liam.Howlett@Oracle.com>
Subject: [PATCH RFC v2 00/10] SLUB percpu sheaves
Date: Fri, 14 Feb 2025 17:27:36 +0100
Message-ID: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz>
Hi,
This is the v2 RFC to add an opt-in percpu array-based caching layer to
SLUB. The name "sheaf" was invented by Matthew so we don't call it
magazine like the original Bonwick paper. The per-NUMA-node cache of
sheaves is thus called "barn".
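
As a rough illustration of the terms (not code from the series), a sheaf can be pictured as a small array of object pointers used as a stack, and a barn as a shared store of full and empty sheaves. The userspace sketch below assumes a single CPU, uses malloc()/free() as a stand-in for the slab slow paths, and invents every name in it (`struct cpu_cache`, `cache_alloc()`, `SHEAF_CAPACITY`, and so on). The policy of trading an empty sheaf for a full one, and the reverse on free, is an assumption borrowed from the magazine design rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SHEAF_CAPACITY 4	/* hypothetical; chosen small for the demo */
#define BARN_SLOTS     8	/* hypothetical limit on sheaves per barn list */

/* A sheaf: a fixed-size array of object pointers, used as a stack. */
struct sheaf {
	size_t size;
	void *objects[SHEAF_CAPACITY];
};

/* A barn: shared storage for full and empty sheaves (one per NUMA node). */
struct barn {
	size_t nr_full, nr_empty;
	struct sheaf *full[BARN_SLOTS];
	struct sheaf *empty[BARN_SLOTS];
};

/* One CPU's state: the sheaf it allocates from and frees to. */
struct cpu_cache {
	struct sheaf *main;
	struct barn *barn;
	size_t obj_size;
	unsigned long slow_allocs;	/* fallbacks to malloc() */
};

/* Start with an empty main sheaf and nr_spare empty sheaves in the barn. */
static int cache_init(struct cpu_cache *c, struct barn *b, size_t obj_size,
		      size_t nr_spare)
{
	assert(nr_spare <= BARN_SLOTS);
	b->nr_full = b->nr_empty = 0;
	while (b->nr_empty < nr_spare) {
		struct sheaf *s = calloc(1, sizeof(*s));

		if (!s)
			return -1;
		b->empty[b->nr_empty++] = s;
	}
	c->main = calloc(1, sizeof(*c->main));
	c->barn = b;
	c->obj_size = obj_size;
	c->slow_allocs = 0;
	return c->main ? 0 : -1;
}

static void *cache_alloc(struct cpu_cache *c)
{
	struct barn *b = c->barn;

	if (c->main->size == 0 && b->nr_full > 0 && b->nr_empty < BARN_SLOTS) {
		/* Trade the empty sheaf for a full one from the barn. */
		b->empty[b->nr_empty++] = c->main;
		c->main = b->full[--b->nr_full];
	}
	if (c->main->size > 0)
		return c->main->objects[--c->main->size];	/* fast path */

	c->slow_allocs++;		/* stand-in for the slab slow path */
	return malloc(c->obj_size);
}

static void cache_free(struct cpu_cache *c, void *obj)
{
	struct barn *b = c->barn;

	if (c->main->size == SHEAF_CAPACITY && b->nr_empty > 0 &&
	    b->nr_full < BARN_SLOTS) {
		/* Trade the full sheaf for an empty one from the barn. */
		b->full[b->nr_full++] = c->main;
		c->main = b->empty[--b->nr_empty];
	}
	if (c->main->size < SHEAF_CAPACITY) {
		c->main->objects[c->main->size++] = obj;	/* fast path */
		return;
	}
	free(obj);			/* stand-in for freeing to the slab */
}
```

With one spare sheaf in the barn, allocating six objects, freeing them, and allocating six again takes the malloc() fallback only for the first six; the second round is served entirely from the two sheaves.
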
This may seem similar to the arrays in SLAB, but the main differences
are:
- opt-in, not used for every cache
- does not distinguish NUMA locality, thus no "alien" arrays that would
need periodical flushing
- improves kfree_rcu() handling
- API for obtaining a preallocated sheaf that can be used for guaranteed
and efficient allocations in a restricted context, when the upper
bound for needed objects is known but rarely reached
The motivation comes mainly from the ongoing work related to VMA
scalability and the related maple tree operations. This is why maple
tree nodes are sheaf-enabled in the RFC, but it's not a full conversion
that would take advantage of the improved preallocation API. The VMA part
is currently left out as it's expected that Suren will land the VMA
TYPESAFE_BY_RCU conversion [3] soon and there would be a conflict with that.
With both series applied it means just adding a line to kmem_cache_args
in proc_caches_init().
Some performance benefits were measured by Suren and Liam in previous
versions. I hope to have those numbers posted publicly as both this work
and the VMA and maple tree changes stabilize.
A sheaf-enabled cache has the following expected advantages:
- Cheaper fast paths. For allocations, instead of local double cmpxchg,
after Patch 5 it's preempt_disable() and no atomic operations. Same for
freeing, which is normally a local double cmpxchg only for short-term
allocations (so the same slab is still active on the same cpu when
freeing the object) and a more costly locked double cmpxchg otherwise.
The downside is the lack of NUMA locality guarantees for the allocated
objects.
- kfree_rcu() batching and recycling. kfree_rcu() will put objects into a
separate percpu sheaf and only submit the whole sheaf to call_rcu()
when full. After the grace period, the sheaf can be used for
allocations, which is more efficient than freeing and reallocating
individual slab objects (even with the batching done by kfree_rcu()
implementation itself). In case only some cpus are allowed to handle rcu
callbacks, the sheaf can still be made available to other cpus on the
same node via the shared barn. The maple_node cache uses kfree_rcu() and
thus can benefit from this.
- Preallocation support. A prefilled sheaf can be privately borrowed for
a short term operation that is not allowed to block in the middle and
may need to allocate some objects. If an upper bound (worst case) for
the number of allocations is known, but far fewer allocations are
actually needed on average, borrowing and returning a sheaf is much more
efficient than a bulk allocation for the worst case followed by a bulk
free of the many unused objects. Maple tree write operations should
benefit from this.
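
To make the preallocation argument concrete, here is a minimal userspace sketch, assuming a worst case of 8 objects per operation. All names (`struct obj_cache`, `sheaf_borrow()`, `sheaf_alloc()`, `sheaf_return()`, `WORST_CASE`) are hypothetical and are not the API added by the series; malloc() stands in for the slab allocator. It is only meant to show why returning a mostly unused sheaf is cheaper than a bulk free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define WORST_CASE 8	/* hypothetical upper bound on objects per operation */

struct prefilled_sheaf {
	size_t size;
	void *objects[WORST_CASE];
};

struct obj_cache {
	size_t obj_size;
	struct prefilled_sheaf *spare;	/* sheaf kept between operations */
	unsigned long backend_allocs;	/* objects obtained from malloc() */
};

/* Borrow a sheaf holding WORST_CASE objects. This step may block or fail. */
static struct prefilled_sheaf *sheaf_borrow(struct obj_cache *c)
{
	struct prefilled_sheaf *s = c->spare;

	c->spare = NULL;
	if (!s) {
		s = calloc(1, sizeof(*s));
		if (!s)
			return NULL;
	}
	/* Top up only the slots consumed since the previous borrow. */
	while (s->size < WORST_CASE) {
		void *obj = malloc(c->obj_size);

		if (!obj) {
			c->spare = s;	/* keep what we have for a later attempt */
			return NULL;
		}
		s->objects[s->size++] = obj;
		c->backend_allocs++;
	}
	return s;
}

/* Allocate from a borrowed sheaf: no blocking, no failure while objects remain. */
static void *sheaf_alloc(struct prefilled_sheaf *s)
{
	assert(s->size > 0);
	return s->objects[--s->size];
}

/* Return the sheaf; unused objects stay in it for the next borrower. */
static void sheaf_return(struct obj_cache *c, struct prefilled_sheaf *s)
{
	assert(c->spare == NULL);
	c->spare = s;
}
```

In this model, an operation that borrows a sheaf, uses two objects, and returns it causes only two backend allocations on the next borrow. A bulk allocation for the worst case would instead allocate eight objects each time and free the six unused ones afterwards.
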
Patch 1 implements the basic sheaf functionality, using
local_lock_irqsave() for percpu sheaf locking.
Patch 2 adds the kfree_rcu() support.
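
The batching idea can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the patch itself: the grace period is simulated by an explicit function call, call_rcu() is reduced to a counter, and all names (`struct rcu_batcher`, `batcher_free_rcu()`, `batcher_grace_period_end()`, `batcher_alloc()`) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define SHEAF_CAPACITY 4	/* hypothetical; chosen small for the demo */
#define NR_SHEAVES     3	/* hypothetical number of sheaves in the model */

enum sheaf_state { SHEAF_EMPTY, SHEAF_FILLING, SHEAF_WAITING, SHEAF_READY };

struct sheaf {
	enum sheaf_state state;
	size_t size;
	void *objects[SHEAF_CAPACITY];
};

struct rcu_batcher {
	struct sheaf sheaves[NR_SHEAVES];
	unsigned long callbacks;	/* simulated call_rcu() submissions */
};

static struct sheaf *find_sheaf(struct rcu_batcher *b, enum sheaf_state state)
{
	for (size_t i = 0; i < NR_SHEAVES; i++)
		if (b->sheaves[i].state == state)
			return &b->sheaves[i];
	return NULL;
}

/* Model of kfree_rcu(): 0 if batched, -1 if the caller must fall back. */
static int batcher_free_rcu(struct rcu_batcher *b, void *obj)
{
	struct sheaf *s = find_sheaf(b, SHEAF_FILLING);

	if (!s) {
		s = find_sheaf(b, SHEAF_EMPTY);
		if (!s)
			return -1;
		s->state = SHEAF_FILLING;
	}
	assert(s->size < SHEAF_CAPACITY);
	s->objects[s->size++] = obj;
	if (s->size == SHEAF_CAPACITY) {
		/* Only a full sheaf is submitted: one callback for all objects. */
		s->state = SHEAF_WAITING;
		b->callbacks++;
	}
	return 0;
}

/* Model of a grace period ending: waiting sheaves become reusable. */
static void batcher_grace_period_end(struct rcu_batcher *b)
{
	for (size_t i = 0; i < NR_SHEAVES; i++)
		if (b->sheaves[i].state == SHEAF_WAITING)
			b->sheaves[i].state = SHEAF_READY;
}

/* Allocate by recycling an object whose grace period has passed. */
static void *batcher_alloc(struct rcu_batcher *b)
{
	struct sheaf *s = find_sheaf(b, SHEAF_READY);
	void *obj;

	if (!s)
		return NULL;	/* caller would use the normal allocation path */
	obj = s->objects[--s->size];
	if (s->size == 0)
		s->state = SHEAF_EMPTY;
	return obj;
}
```

Starting from a zero-initialized `struct rcu_batcher`, four calls to `batcher_free_rcu()` produce a single simulated callback, and the objects only become available to `batcher_alloc()` after `batcher_grace_period_end()` has run.
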
Patch 3 is copied from the series "bpf, mm: Introduce try_alloc_pages()"
[2] to introduce a variant of local_lock that has a trylock operation.
Patch 4 adds a variant of the trylock without _irqsave. Patch 5 converts
percpu sheaves locking to the new variant of the lock.
Patch 6 implements borrowing prefilled sheaves, with maple tree being the
anticipated user.
Patch 7 seeks to reduce barn spinlock contention. It is kept separate
so that it can be evaluated on its own.
Patches 8 and 9 by Liam add testing stubs that maple tree will use in
its userspace tests.
Patch 10 enables sheaves for the maple tree node cache, but does not
take advantage of prefilling yet.
(RFC) LIMITATIONS:
- with slub_debug enabled, objects in sheaves are considered allocated
so allocation/free stacktraces may become imprecise and checking of
e.g. redzone violations may be delayed
GIT TREES:
this series: https://git.kernel.org/vbabka/l/slub-percpu-sheaves-v2
To avoid conflicts, the series requires (and the branch above is based
on) the kfree_rcu() code refactoring scheduled for 6.15:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/kfree_rcu_tiny
To facilitate testing/benchmarking, there's also a branch with Liam's
maple tree changes from [4] adapted to the current code:
https://git.kernel.org/vbabka/l/slub-percpu-sheaves-v2-maple
There are also two optimization patches for sheaves by Liam for
evaluation as I suspect they might not be a universal win.
Vlastimil
[1] https://lore.kernel.org/all/20241111205506.3404479-1-surenb@google.com/
[2] https://lore.kernel.org/all/20250213033556.9534-4-alexei.starovoitov@gmail.com/
[3] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/
[4] https://www.infradead.org/git/?p=users/jedix/linux-maple.git;a=shortlog;h=refs/heads/slub-percpu-sheaves-v2
---
Changes in v2:
- Removed kfree_rcu() destructors support as VMAs will not need it
anymore after [3] is merged.
- Changed to localtry_lock_t borrowed from [2] instead of an own
implementation of the same idea.
- Many fixes and improvements thanks to Liam's adoption for maple tree
nodes.
- Userspace testing stubs by Liam.
- Reduced limitations/todos - hooking to kfree_rcu() is complete,
prefilled sheaves can exceed cache's sheaf_capacity.
- Link to v1: https://lore.kernel.org/r/20241112-slub-percpu-caches-v1-0-ddc0bdc27e05@suse.cz
---
Liam R. Howlett (2):
tools: Add testing support for changes to rcu and slab for sheaves
tools: Add sheafs support to testing infrastructure
Sebastian Andrzej Siewior (1):
locking/local_lock: Introduce localtry_lock_t
Vlastimil Babka (7):
slab: add opt-in caching layer of percpu sheaves
slab: add sheaf support for batching kfree_rcu() operations
locking/local_lock: add localtry_trylock()
slab: switch percpu sheaves locking to localtry_lock
slab: sheaf prefilling for guaranteed allocations
slab: determine barn status racily outside of lock
maple_tree: use percpu sheaves for maple_node_cache
include/linux/local_lock.h | 70 ++
include/linux/local_lock_internal.h | 146 ++++
include/linux/slab.h | 50 ++
lib/maple_tree.c | 11 +-
mm/slab.h | 4 +
mm/slab_common.c | 26 +-
mm/slub.c | 1403 +++++++++++++++++++++++++++++++--
tools/include/linux/slab.h | 65 +-
tools/testing/shared/linux.c | 108 ++-
tools/testing/shared/linux/rcupdate.h | 22 +
10 files changed, 1840 insertions(+), 65 deletions(-)
---
base-commit: 379487e17ca406b47392e7ab6cf35d1c3bacb371
change-id: 20231128-slub-percpu-caches-9441892011d7
prerequisite-message-id: 20250203-slub-tiny-kfree_rcu-v1-0-d4428bf9a8a1@suse.cz
prerequisite-patch-id: 1a4af92b5eb1b8bfc86bac8d7fc1ef0963e7d9d6
prerequisite-patch-id: f24a39c38103b7e09fbf2e6f84e6108499ab7980
prerequisite-patch-id: 23e90b23482f4775c95295821dd779ba4e3712e9
prerequisite-patch-id: 5c53a619477acdce07071abec0f40e79501ea40b
Best regards,
--
Vlastimil Babka <vbabka@suse.cz>