* [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy()
@ 2024-07-15 20:29 Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable Vlastimil Babka
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
First RFC, feel free to ignore for now if too busy with merge window.
Also in git:
https://git.kernel.org/vbabka/l/slab-kfree_rcu-destroy-v1r0
Based on slab/for-next:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-next
Since SLOB was removed, we have allowed kfree_rcu() for objects
allocated from any kmem_cache in addition to kmalloc().
Recently we have attempted to replace existing call_rcu() usage with
kfree_rcu() where the callback is a plain kmem_cache_free(), in a series
by Julia Lawall [1].
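For illustration, the kind of conversion in question looks roughly like
this (a sketch with hypothetical struct, cache and function names, not
code taken from the series):

#include <linux/slab.h>
#include <linux/rcupdate.h>

struct foo {
        int data;
        struct rcu_head rcu;
};

static struct kmem_cache *foo_cache;    /* assumed to be created elsewhere */

/* Before: a call_rcu() callback that does nothing but kmem_cache_free() */
static void foo_free_rcu(struct rcu_head *head)
{
        kmem_cache_free(foo_cache, container_of(head, struct foo, rcu));
}

static void foo_release_old(struct foo *f)
{
        call_rcu(&f->rcu, foo_free_rcu);
}

/* After the conversion: the callback goes away entirely */
static void foo_release_new(struct foo *f)
{
        kfree_rcu(f, rcu);
}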
Jakub Kicinski pointed out [2] that this was already tried in batman-adv
but had to be reverted because kmem_cache_destroy() would fail due to
objects remaining in the cache, despite rcu_barrier() being used.
Jason Donenfeld identified the culprit [3] as commit a35d16905efc ("rcu:
Add basic support for kfree_rcu() batching"), which makes rcu_barrier()
insufficient for kfree_rcu() in flight.
This was never a problem for kfree_rcu() usage on kmalloc() objects as
the kmalloc caches are never destroyed, but arbitrary caches can be,
e.g. due to module unload.
Out of the possible solutions collected by Paul McKenney [4], the most
appealing to me is "kmem_cache_destroy() lingers for kfree_rcu()", as it
adds no new concerns for kfree_rcu() users. There is also a precedent in
parts of the kmem_cache cleanup already being done asynchronously for
SLAB_TYPESAFE_BY_RCU caches.
This series implements the necessary changes to the slab allocator:
if objects remain in the cache when kmem_cache_destroy() is called, it
is assumed to be due to pending kfree_rcu(), and an asynchronous work
item is scheduled that performs the necessary barrier and then attempts
the shutdown again. If objects still remain after the barrier, the usual
warning is reported and the cache is left undestroyed.
Notably, the proper barrier doesn't exist yet, so hopefully Paul or the
RCU team can help here :) It should be fine for the barrier to be
pessimistic, as it won't hold up anything but the work item.
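Condensed, the resulting flow is roughly the following (a simplified
sketch of what patches 3/6 and 5/6 implement; locking, list handling and
the SLAB_TYPESAFE_BY_RCU path are omitted, and rcu_barrier() stands in
for the yet-to-be-written barrier):

        /* in kmem_cache_destroy(): */
        err = __kmem_cache_shutdown(s, /* warn_inuse */ false);
        sysfs_slab_unlink(s);
        debugfs_slab_release(s);
        if (err) {
                /* assume the remaining objects are pending kfree_rcu() */
                schedule_work(&s->async_destroy_work);
                return;
        }
        /* ... otherwise release as before ... */

static void kmem_cache_kfree_rcu_destroy_workfn(struct work_struct *work)
{
        struct kmem_cache *s = container_of(work, struct kmem_cache,
                                            async_destroy_work);

        rcu_barrier();  /* placeholder for the proper kfree_rcu() barrier */

        if (__kmem_cache_shutdown(s, /* warn_inuse */ true)) {
                /* genuinely leaked objects - warn and leave the cache alone */
                return;
        }
        kmem_cache_release(s);
}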
This approach has some downsides for debugging, but they should be small
enough:
- we can no longer report the stack trace leading to a premature
kmem_cache_destroy(), but arguably that's less interesting than the
allocation traces and other details about the remaining objects, which
are still reported, just a bit later.
- objects that are freed after kmem_cache_destroy() but before the work
item proceeds with the destruction after the barrier are technically
bugs, but we won't be able to catch them unless we add some checks into
the freeing hotpaths, and that's not worth it, IMHO.
There are also a number of preliminary steps. The potentially visible one
is that sysfs and debugfs directories are now removed immediately during
kmem_cache_destroy() - previously this would be delayed for
SLAB_TYPESAFE_BY_RCU caches, or the directories would be left around
forever if remaining objects were detected. The extra delay from the
asynchronous destroy due to kfree_rcu() could mean that a module
unload/load cycle creates a new instance of the cache which would then
fail to create these directories - a concern raised by Paul. The
immediate removal is the simplest solution (compared to e.g. renaming the
directories) and should not make debugging harder - while it won't be
possible to check debugfs for allocation traces of leaked objects, they
are listed in more detail in dmesg anyway.
[1] https://lore.kernel.org/all/20240609082726.32742-1-Julia.Lawall@inria.fr/
[2] https://lore.kernel.org/all/20240612143305.451abf58@kernel.org/
[3] https://lore.kernel.org/all/Zmo9-YGraiCj5-MI@zx2c4.com/
[4] https://docs.google.com/document/d/1v0rcZLvvjVGejT3523W0rDy_sLFu2LWc_NR3fQItZaA/edit
To: Paul E. McKenney <paulmck@kernel.org>
To: Joel Fernandes <joel@joelfernandes.org>
To: Josh Triplett <josh@joshtriplett.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
To: Christoph Lameter <cl@linux.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: rcu@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Vlastimil Babka (6):
mm, slab: make caches with refcount of 0 unmergeable
mm, slab: always maintain per-node slab and object count
mm, slab: unlink sysfs and debugfs immediately
mm, slab: simplify kmem_cache_release()
mm, slab: asynchronously destroy caches with outstanding objects
kunit, slub: add test_kfree_rcu()
lib/slub_kunit.c | 22 ++++++++++
mm/slab.h | 4 +-
mm/slab_common.c | 121 +++++++++++++++++++++++++++++++++++--------------------
mm/slub.c | 58 ++++++++++++--------------
4 files changed, 129 insertions(+), 76 deletions(-)
---
base-commit: 436381eaf2a423e60fc8340399f7d2458091b383
change-id: 20240715-b4-slab-kfree_rcu-destroy-85dd2b2ded92
Best regards,
--
Vlastimil Babka <vbabka@suse.cz>
* [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
@ 2024-07-15 20:29 ` Vlastimil Babka
2024-07-21 2:36 ` David Rientjes
2024-07-15 20:29 ` [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count Vlastimil Babka
` (5 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
Slab caches with refcount 0 are in the process of being destroyed so
it's undesirable for new caches to attempt merging with them. A
synchronous destruction happens under slab_mutex thus excluding
concurrent cache creation and merging. Full destruction of
SLAB_TYPESAFE_BY_RCU caches might be delayed, but the cache is still
taken off the slab_caches list immediately and is thus unreachable by
cache creation.
However a cache where __kmem_cache_shutdown() fails because it contains
objects that were not freed (due to a bug in the cache user) will be
left on the slab_caches list and might be considered for merging.
Also the following patches will introduce the possibility of a cache with
refcount 0 being temporarily reachable on the slab_caches list even when
there are no bugs, due to kfree_rcu() in flight.
For these reasons, prevent merging with caches that have zero refcount.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slab_common.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 70943a4c1c4b..3ba205bda95d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -150,9 +150,11 @@ int slab_unmergeable(struct kmem_cache *s)
#endif
/*
- * We may have set a slab to be unmergeable during bootstrap.
+ * We may have set a cache to be unmergeable (-1) during bootstrap.
+ * 0 is for cache being destroyed asynchronously, or cache that failed
+ * to destroy due to outstanding objects.
*/
- if (s->refcount < 0)
+ if (s->refcount <= 0)
return 1;
return 0;
--
2.45.2
* [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable Vlastimil Babka
@ 2024-07-15 20:29 ` Vlastimil Babka
2024-07-21 2:37 ` David Rientjes
2024-07-22 14:16 ` Xiongwei Song
2024-07-15 20:29 ` [PATCH RFC 3/6] mm, slab: unlink sysfs and debugfs immediately Vlastimil Babka
` (4 subsequent siblings)
6 siblings, 2 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
Currently SLUB counts per-node slabs and total objects only with
CONFIG_SLUB_DEBUG, in order to minimize overhead. However, the detection
in __kmem_cache_shutdown() of whether there are any outstanding objects
relies on the per-node slab count (node_nr_slabs()), so it may be
unreliable without CONFIG_SLUB_DEBUG. Thus we might fail to warn about
such situations and instead destroy a cache while leaving its slab(s)
around (a scenario caused by a buggy slab user, not normal operation).
We will also need node_nr_slabs() to be reliable in the following work
to gracefully handle kmem_cache_destroy() with kfree_rcu() objects in
flight. Thus make the counting of per-node slabs and objects
unconditional.
Note that CONFIG_SLUB_DEBUG is the default anyway, and the counting is
done only when allocating or freeing a slab page, so even in
!CONFIG_SLUB_DEBUG configs the overhead should be negligible.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slub.c | 49 +++++++++++++++++++++----------------------------
1 file changed, 21 insertions(+), 28 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 829a1f08e8a2..aa4d80109c49 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -426,9 +426,9 @@ struct kmem_cache_node {
spinlock_t list_lock;
unsigned long nr_partial;
struct list_head partial;
-#ifdef CONFIG_SLUB_DEBUG
atomic_long_t nr_slabs;
atomic_long_t total_objects;
+#ifdef CONFIG_SLUB_DEBUG
struct list_head full;
#endif
};
@@ -438,6 +438,26 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
return s->node[node];
}
+static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
+{
+ return atomic_long_read(&n->nr_slabs);
+}
+
+static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
+{
+ struct kmem_cache_node *n = get_node(s, node);
+
+ atomic_long_inc(&n->nr_slabs);
+ atomic_long_add(objects, &n->total_objects);
+}
+static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
+{
+ struct kmem_cache_node *n = get_node(s, node);
+
+ atomic_long_dec(&n->nr_slabs);
+ atomic_long_sub(objects, &n->total_objects);
+}
+
/*
* Iterator over all nodes. The body will be executed for each node that has
* a kmem_cache_node structure allocated (which is true for all online nodes)
@@ -1511,26 +1531,6 @@ static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct
list_del(&slab->slab_list);
}
-static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
-{
- return atomic_long_read(&n->nr_slabs);
-}
-
-static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
-{
- struct kmem_cache_node *n = get_node(s, node);
-
- atomic_long_inc(&n->nr_slabs);
- atomic_long_add(objects, &n->total_objects);
-}
-static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
-{
- struct kmem_cache_node *n = get_node(s, node);
-
- atomic_long_dec(&n->nr_slabs);
- atomic_long_sub(objects, &n->total_objects);
-}
-
/* Object debug checks for alloc/free paths */
static void setup_object_debug(struct kmem_cache *s, void *object)
{
@@ -1871,13 +1871,6 @@ slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name)
#define disable_higher_order_debug 0
-static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
- { return 0; }
-static inline void inc_slabs_node(struct kmem_cache *s, int node,
- int objects) {}
-static inline void dec_slabs_node(struct kmem_cache *s, int node,
- int objects) {}
-
#ifndef CONFIG_SLUB_TINY
static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
void **freelist, void *nextfree)
--
2.45.2
* [PATCH RFC 3/6] mm, slab: unlink sysfs and debugfs immediately
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count Vlastimil Babka
@ 2024-07-15 20:29 ` Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 4/6] mm, slab: simplify kmem_cache_release() Vlastimil Babka
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
kmem_cache_destroy() includes removing the associated sysfs and debugfs
directories. Currently this might not happen immediately when:
- the cache is SLAB_TYPESAFE_BY_RCU and the cleanup is delayed,
including the removal of the directories
- __kmem_cache_shutdown() fails due to outstanding objects - the
directories remain indefinitely
When a cache is recreated with the same name, such as after a module
unload followed by a reload, the directories fail to be created for the
new instance of the cache because the old directories are still present.
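For illustration, a hypothetical module hitting this scenario could look
like the following sketch (the module, struct and cache names are made
up):

#include <linux/module.h>
#include <linux/slab.h>

struct foo {
        unsigned long data;
};

static struct kmem_cache *foo_cache;

static int __init foo_init(void)
{
        /*
         * If the directories of a previous "foo_cache" instance are still
         * around (delayed or failed destroy), creating them for this new
         * instance fails - sysfs warns about a duplicate filename - even
         * though the cache itself is created and usable.
         */
        foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                      0, 0, NULL);
        return foo_cache ? 0 : -ENOMEM;
}

static void __exit foo_exit(void)
{
        kmem_cache_destroy(foo_cache);
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");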
The following patches will add another possibility of delayed cleanup,
due to kfree_rcu() in flight, so let's fix this first by having the
directories removed immediately in kmem_cache_destroy(), regardless of
__kmem_cache_shutdown() success.
This should not make debugging harder if __kmem_cache_shutdown() fails,
because a detailed report of outstanding objects is printed into dmesg
already due to the failure.
Note that the record in /proc/slabinfo will remain until the cleanup is
finished (or indefinitely if __kmem_cache_shutdown() fails), but that
does not prevent a new record from being added for a new cache instance.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slab_common.c | 65 ++++++++++++++++++++++++++++----------------------------
1 file changed, 32 insertions(+), 33 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 3ba205bda95d..2eef5ad37fa7 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -492,14 +492,10 @@ EXPORT_SYMBOL(kmem_buckets_create);
* once or there will be a use-after-free problem. The actual deletion
* and release of the kobject does not need slab_mutex or cpu_hotplug_lock
* protection. So they are now done without holding those locks.
- *
- * Note that there will be a slight delay in the deletion of sysfs files
- * if kmem_cache_release() is called indrectly from a work function.
*/
static void kmem_cache_release(struct kmem_cache *s)
{
if (slab_state >= FULL) {
- sysfs_slab_unlink(s);
sysfs_slab_release(s);
} else {
slab_kmem_cache_release(s);
@@ -536,33 +532,11 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
rcu_barrier();
list_for_each_entry_safe(s, s2, &to_destroy, list) {
- debugfs_slab_release(s);
kfence_shutdown_cache(s);
kmem_cache_release(s);
}
}
-static int shutdown_cache(struct kmem_cache *s)
-{
- /* free asan quarantined objects */
- kasan_cache_shutdown(s);
-
- if (__kmem_cache_shutdown(s) != 0)
- return -EBUSY;
-
- list_del(&s->list);
-
- if (s->flags & SLAB_TYPESAFE_BY_RCU) {
- list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
- schedule_work(&slab_caches_to_rcu_destroy_work);
- } else {
- kfence_shutdown_cache(s);
- debugfs_slab_release(s);
- }
-
- return 0;
-}
-
void slab_kmem_cache_release(struct kmem_cache *s)
{
__kmem_cache_release(s);
@@ -572,8 +546,8 @@ void slab_kmem_cache_release(struct kmem_cache *s)
void kmem_cache_destroy(struct kmem_cache *s)
{
- int err = -EBUSY;
bool rcu_set;
+ int err;
if (unlikely(!s) || !kasan_check_byte(s))
return;
@@ -581,20 +555,45 @@ void kmem_cache_destroy(struct kmem_cache *s)
cpus_read_lock();
mutex_lock(&slab_mutex);
+ s->refcount--;
+ if (s->refcount) {
+ mutex_unlock(&slab_mutex);
+ cpus_read_unlock();
+ return;
+ }
+
rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
- s->refcount--;
- if (s->refcount)
- goto out_unlock;
+ /* free asan quarantined objects */
+ kasan_cache_shutdown(s);
- err = shutdown_cache(s);
+ err = __kmem_cache_shutdown(s);
WARN(err, "%s %s: Slab cache still has objects when called from %pS",
__func__, s->name, (void *)_RET_IP_);
-out_unlock:
+
+ if (!err)
+ list_del(&s->list);
+
mutex_unlock(&slab_mutex);
cpus_read_unlock();
- if (!err && !rcu_set)
+
+ if (slab_state >= FULL) {
+ sysfs_slab_unlink(s);
+ }
+ debugfs_slab_release(s);
+
+ if (err)
+ return;
+
+ if (rcu_set) {
+ mutex_lock(&slab_mutex);
+ list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
+ schedule_work(&slab_caches_to_rcu_destroy_work);
+ mutex_unlock(&slab_mutex);
+ } else {
+ kfence_shutdown_cache(s);
kmem_cache_release(s);
+ }
}
EXPORT_SYMBOL(kmem_cache_destroy);
--
2.45.2
* [PATCH RFC 4/6] mm, slab: simplify kmem_cache_release()
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
` (2 preceding siblings ...)
2024-07-15 20:29 ` [PATCH RFC 3/6] mm, slab: unlink sysfs and debugfs immediately Vlastimil Babka
@ 2024-07-15 20:29 ` Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 5/6] mm, slab: asynchronously destroy caches with outstanding objects Vlastimil Babka
` (2 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
kfence_shutdown_cache() is now always called just before
kmem_cache_release(), so move it there.
Also replace the two variants of kmem_cache_release() with a single one
using __is_defined(SLAB_SUPPORTS_SYSFS).
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slab_common.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 2eef5ad37fa7..57962e1a5a86 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -486,7 +486,6 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
}
EXPORT_SYMBOL(kmem_buckets_create);
-#ifdef SLAB_SUPPORTS_SYSFS
/*
* For a given kmem_cache, kmem_cache_destroy() should only be called
* once or there will be a use-after-free problem. The actual deletion
@@ -495,18 +494,12 @@ EXPORT_SYMBOL(kmem_buckets_create);
*/
static void kmem_cache_release(struct kmem_cache *s)
{
- if (slab_state >= FULL) {
+ kfence_shutdown_cache(s);
+ if (__is_defined(SLAB_SUPPORTS_SYSFS) && slab_state >= FULL)
sysfs_slab_release(s);
- } else {
+ else
slab_kmem_cache_release(s);
- }
-}
-#else
-static void kmem_cache_release(struct kmem_cache *s)
-{
- slab_kmem_cache_release(s);
}
-#endif
static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
{
@@ -531,10 +524,8 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
rcu_barrier();
- list_for_each_entry_safe(s, s2, &to_destroy, list) {
- kfence_shutdown_cache(s);
+ list_for_each_entry_safe(s, s2, &to_destroy, list)
kmem_cache_release(s);
- }
}
void slab_kmem_cache_release(struct kmem_cache *s)
@@ -591,7 +582,6 @@ void kmem_cache_destroy(struct kmem_cache *s)
schedule_work(&slab_caches_to_rcu_destroy_work);
mutex_unlock(&slab_mutex);
} else {
- kfence_shutdown_cache(s);
kmem_cache_release(s);
}
}
--
2.45.2
* [PATCH RFC 5/6] mm, slab: asynchronously destroy caches with outstanding objects
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
` (3 preceding siblings ...)
2024-07-15 20:29 ` [PATCH RFC 4/6] mm, slab: simplify kmem_cache_release() Vlastimil Babka
@ 2024-07-15 20:29 ` Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 6/6] kunit, slub: add test_kfree_rcu() Vlastimil Babka
2024-07-21 2:39 ` [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() David Rientjes
6 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
We would like to replace call_rcu() users with kfree_rcu() where the
existing callback is just a kmem_cache_free(). However this causes
issues when the cache can be destroyed (such as due to module unload).
Currently such modules should be issuing rcu_barrier() before
kmem_cache_destroy() to have their call_rcu() callbacks processed first.
This barrier is however not sufficient for kfree_rcu() in flight due
to the batching introduced by a35d16905efc ("rcu: Add basic support for
kfree_rcu() batching").
This is not a problem for kmalloc caches, which are never destroyed, but
since the removal of SLOB, kfree_rcu() is also allowed for any other
cache, which might be destroyed.
In order not to complicate the API, put the responsibility for handling
outstanding kfree_rcu() in kmem_cache_destroy() itself. Use the result
of __kmem_cache_shutdown() to determine if there are still allocated
objects in the cache, and if there are, assume it's due to kfree_rcu().
In that case schedule a work item that will use the appropriate barrier
and then attempt __kmem_cache_shutdown() again. Only if that fails as
well is the usual warning about non-freed objects produced.
Sysfs and debugfs directories are removed immediately, so the cache can
be recreated with the same name without issues, while the previous
instance is still pending removal.
Users of call_rcu() with arbitrary callbacks should still perform their
own synchronous barrier before destroying the cache and unloading the
module, as the callbacks may be invoking module code or performing other
actions that are necessary for a successful unload.
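In other words, module exit paths can then look like the following
sketch (hypothetical cache and callback names):

/*
 * Objects freed only via kfree_rcu(): no explicit barrier needed,
 * kmem_cache_destroy() now handles the pending frees itself
 * (asynchronously if necessary).
 */
static void __exit foo_exit(void)
{
        kmem_cache_destroy(foo_cache);
}

/*
 * Objects freed via call_rcu() with an arbitrary callback that may run
 * module code: an explicit rcu_barrier() before destroying the cache and
 * unloading the module is still required.
 */
static void __exit bar_exit(void)
{
        rcu_barrier();
        kmem_cache_destroy(bar_cache);
}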
Note that another non-bug reason why there might be objects outstanding
is the kasan quarantine. In that case the cleanup also becomes
asynchronous, and flushing the quarantine by kasan_cache_shutdown(s) is
only done in the workfn.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/slab.h | 4 +++-
mm/slab_common.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
mm/slub.c | 9 +++++----
3 files changed, 56 insertions(+), 9 deletions(-)
diff --git a/mm/slab.h b/mm/slab.h
index ece18ef5dd04..390a4e265f03 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -279,6 +279,8 @@ struct kmem_cache {
unsigned int red_left_pad; /* Left redzone padding size */
const char *name; /* Name (only for display!) */
struct list_head list; /* List of slab caches */
+ struct work_struct async_destroy_work;
+
#ifdef CONFIG_SYSFS
struct kobject kobj; /* For sysfs */
#endif
@@ -478,7 +480,7 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
SLAB_NO_USER_FLAGS)
bool __kmem_cache_empty(struct kmem_cache *);
-int __kmem_cache_shutdown(struct kmem_cache *);
+int __kmem_cache_shutdown(struct kmem_cache *, bool);
void __kmem_cache_release(struct kmem_cache *);
int __kmem_cache_shrink(struct kmem_cache *);
void slab_kmem_cache_release(struct kmem_cache *);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 57962e1a5a86..3e15525819b6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -44,6 +44,8 @@ static LIST_HEAD(slab_caches_to_rcu_destroy);
static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work);
static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
slab_caches_to_rcu_destroy_workfn);
+static void kmem_cache_kfree_rcu_destroy_workfn(struct work_struct *work);
+
/*
* Set of flags that will prevent slab merging
@@ -235,6 +237,7 @@ static struct kmem_cache *create_cache(const char *name,
s->refcount = 1;
list_add(&s->list, &slab_caches);
+ INIT_WORK(&s->async_destroy_work, kmem_cache_kfree_rcu_destroy_workfn);
return s;
out_free_cache:
@@ -535,6 +538,47 @@ void slab_kmem_cache_release(struct kmem_cache *s)
kmem_cache_free(kmem_cache, s);
}
+static void kmem_cache_kfree_rcu_destroy_workfn(struct work_struct *work)
+{
+ struct kmem_cache *s;
+ bool rcu_set;
+ int err;
+
+ s = container_of(work, struct kmem_cache, async_destroy_work);
+
+ // XXX use the real kmem_cache_free_barrier() or similar thing here
+ rcu_barrier();
+
+ cpus_read_lock();
+ mutex_lock(&slab_mutex);
+
+ rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
+
+ /* free asan quarantined objects */
+ kasan_cache_shutdown(s);
+
+ err = __kmem_cache_shutdown(s, true);
+ WARN(err, "kmem_cache_destroy %s: Slab cache still has objects",
+ s->name);
+
+ if (err)
+ goto out_unlock;
+
+ list_del(&s->list);
+
+ if (rcu_set) {
+ list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
+ schedule_work(&slab_caches_to_rcu_destroy_work);
+ }
+
+out_unlock:
+ mutex_unlock(&slab_mutex);
+ cpus_read_unlock();
+
+ if (!err && !rcu_set)
+ kmem_cache_release(s);
+}
+
void kmem_cache_destroy(struct kmem_cache *s)
{
bool rcu_set;
@@ -558,9 +602,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
/* free asan quarantined objects */
kasan_cache_shutdown(s);
- err = __kmem_cache_shutdown(s);
- WARN(err, "%s %s: Slab cache still has objects when called from %pS",
- __func__, s->name, (void *)_RET_IP_);
+ err = __kmem_cache_shutdown(s, false);
if (!err)
list_del(&s->list);
@@ -573,8 +615,10 @@ void kmem_cache_destroy(struct kmem_cache *s)
}
debugfs_slab_release(s);
- if (err)
+ if (err) {
+ schedule_work(&s->async_destroy_work);
return;
+ }
if (rcu_set) {
mutex_lock(&slab_mutex);
diff --git a/mm/slub.c b/mm/slub.c
index aa4d80109c49..c1222467c346 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5352,7 +5352,8 @@ static void list_slab_objects(struct kmem_cache *s, struct slab *slab,
* This is called from __kmem_cache_shutdown(). We must take list_lock
* because sysfs file might still access partial list after the shutdowning.
*/
-static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
+static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n,
+ bool warn_inuse)
{
LIST_HEAD(discard);
struct slab *slab, *h;
@@ -5363,7 +5364,7 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
if (!slab->inuse) {
remove_partial(n, slab);
list_add(&slab->slab_list, &discard);
- } else {
+ } else if (warn_inuse) {
list_slab_objects(s, slab,
"Objects remaining in %s on __kmem_cache_shutdown()");
}
@@ -5388,7 +5389,7 @@ bool __kmem_cache_empty(struct kmem_cache *s)
/*
* Release all resources used by a slab cache.
*/
-int __kmem_cache_shutdown(struct kmem_cache *s)
+int __kmem_cache_shutdown(struct kmem_cache *s, bool warn_inuse)
{
int node;
struct kmem_cache_node *n;
@@ -5396,7 +5397,7 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
flush_all_cpus_locked(s);
/* Attempt to free all objects */
for_each_kmem_cache_node(s, node, n) {
- free_partial(s, n);
+ free_partial(s, n, warn_inuse);
if (n->nr_partial || node_nr_slabs(n))
return 1;
}
--
2.45.2
* [PATCH RFC 6/6] kunit, slub: add test_kfree_rcu()
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
` (4 preceding siblings ...)
2024-07-15 20:29 ` [PATCH RFC 5/6] mm, slab: asynchronously destroy caches with outstanding objects Vlastimil Babka
@ 2024-07-15 20:29 ` Vlastimil Babka
2024-07-21 2:39 ` [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() David Rientjes
6 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-15 20:29 UTC (permalink / raw)
To: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes
Cc: Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Julia Lawall, Jakub Kicinski, Jason A. Donenfeld,
Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu, Vlastimil Babka
Add a test that creates a cache, allocates one object from it, frees the
object with kfree_rcu() and then attempts to destroy the cache. If the
asynchronous cache destruction works correctly, there should be no
warnings in dmesg.
Since the warnings in the failure case are produced by a work callback,
I don't know if it's possible to capture them in the kunit test result
properly.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
lib/slub_kunit.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index e6667a28c014..e3e4d0ca40b7 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -5,6 +5,7 @@
#include <linux/slab.h>
#include <linux/module.h>
#include <linux/kernel.h>
+#include <linux/rcupdate.h>
#include "../mm/slab.h"
static struct kunit_resource resource;
@@ -157,6 +158,26 @@ static void test_kmalloc_redzone_access(struct kunit *test)
kmem_cache_destroy(s);
}
+struct test_kfree_rcu_struct {
+ struct rcu_head rcu;
+};
+
+static void test_kfree_rcu(struct kunit *test)
+{
+ struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
+ sizeof(struct test_kfree_rcu_struct),
+ SLAB_NO_MERGE);
+ struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
+
+ kasan_disable_current();
+
+ KUNIT_EXPECT_EQ(test, 0, slab_errors);
+
+ kasan_enable_current();
+ kfree_rcu(p, rcu);
+ kmem_cache_destroy(s);
+}
+
static int test_init(struct kunit *test)
{
slab_errors = 0;
@@ -177,6 +198,7 @@ static struct kunit_case test_cases[] = {
KUNIT_CASE(test_clobber_redzone_free),
KUNIT_CASE(test_kmalloc_redzone_access),
+ KUNIT_CASE(test_kfree_rcu),
{}
};
--
2.45.2
* Re: [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable
2024-07-15 20:29 ` [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable Vlastimil Babka
@ 2024-07-21 2:36 ` David Rientjes
0 siblings, 0 replies; 13+ messages in thread
From: David Rientjes @ 2024-07-21 2:36 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Julia Lawall, Jakub Kicinski,
Jason A. Donenfeld, Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu
On Mon, 15 Jul 2024, Vlastimil Babka wrote:
> Slab caches with refcount 0 are in the process of being destroyed so
> it's undesirable for new caches to attempt merging with them. A
> synchronous destruction happens under slab_mutex thus excluding
> concurrent cache creation and merging. Full destruction of
> SLAB_TYPESAFE_BY_RCU caches might be delayed, but the cache is still
> taken off the slab_caches list immediately, thus unreachable by cache
> creation.
>
> However a cache where __kmem_cache_shutdown() fails because it contains
> objects that were not freed (due to a bug in the cache user) will be
> left on the slab_caches list and might be considered for merging.
> Also the following patches will introduce a possibility of a cache with
> refcount 0 being temporarily reachable on the slab_list even in case of
> no bugs, due to kfree_rcu() in flight.
>
> For these reasons, prevent merging with caches that have zero refcount.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
* Re: [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count
2024-07-15 20:29 ` [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count Vlastimil Babka
@ 2024-07-21 2:37 ` David Rientjes
2024-07-22 14:16 ` Xiongwei Song
1 sibling, 0 replies; 13+ messages in thread
From: David Rientjes @ 2024-07-21 2:37 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Julia Lawall, Jakub Kicinski,
Jason A. Donenfeld, Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu
On Mon, 15 Jul 2024, Vlastimil Babka wrote:
> Currently SLUB counts per-node slabs and total objects only with
> CONFIG_SLUB_DEBUG, in order to minimize overhead. However, the detection
> in __kmem_cache_shutdown() whether there are no outstanding object
> relies on the per-node slab count (node_nr_slabs()) so it may be
> unreliable without CONFIG_SLUB_DEBUG. Thus we might be failing to warn
> about such situations, and instead destroy a cache while leaving its
> slab(s) around (due to a buggy slab user creating such a scenario, not
> in normal operation).
>
> We will also need node_nr_slabs() to be reliable in the following work
> to gracefully handle kmem_cache_destroy() with kfree_rcu() objects in
> flight. Thus make the counting of per-node slabs and objects
> unconditional.
>
> Note that CONFIG_SLUB_DEBUG is the default anyway, and the counting is
> done only when allocating or freeing a slab page, so even in
> !CONFIG_SLUB_DEBUG configs the overhead should be negligible.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
* Re: [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy()
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
` (5 preceding siblings ...)
2024-07-15 20:29 ` [PATCH RFC 6/6] kunit, slub: add test_kfree_rcu() Vlastimil Babka
@ 2024-07-21 2:39 ` David Rientjes
2024-07-22 6:49 ` Vlastimil Babka
6 siblings, 1 reply; 13+ messages in thread
From: David Rientjes @ 2024-07-21 2:39 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Julia Lawall, Jakub Kicinski,
Jason A. Donenfeld, Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu
On Mon, 15 Jul 2024, Vlastimil Babka wrote:
> First RFC, feel free to ignore for now if too busy with merge window.
> Also in git:
> https://git.kernel.org/vbabka/l/slab-kfree_rcu-destroy-v1r0
>
> Based on slab/for-next:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-next
>
Thanks Vlastimil, do you see value in testing of this series right now or
should we wait for another series to be posted?
I'm happy to run this through testing on a few different architectures,
but not sure if you have an update baking in the oven that would supersede
it.
* Re: [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy()
2024-07-21 2:39 ` [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() David Rientjes
@ 2024-07-22 6:49 ` Vlastimil Babka
0 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-22 6:49 UTC (permalink / raw)
To: David Rientjes
Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Julia Lawall, Jakub Kicinski,
Jason A. Donenfeld, Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu
On 7/21/24 4:39 AM, David Rientjes wrote:
> On Mon, 15 Jul 2024, Vlastimil Babka wrote:
>
>> First RFC, feel free to ignore for now if too busy with merge window.
>> Also in git:
>> https://git.kernel.org/vbabka/l/slab-kfree_rcu-destroy-v1r0
>>
>> Based on slab/for-next:
>> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-next
>>
>
> Thanks Vlastimil, do you see value in testing of this series right now or
> should we wait for another series to be posted?
>
> I'm happy to run this through testing on a few different architectures,
> but not sure if you have an update baking in the oven that would supersede
> it.
Thanks, but such testing would quickly fail until the following is resolved
in patch 5/6 with the help of RCU experts :)
+ // XXX use the real kmem_cache_free_barrier() or similar thing here
+ rcu_barrier();
* Re: [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count
2024-07-15 20:29 ` [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count Vlastimil Babka
2024-07-21 2:37 ` David Rientjes
@ 2024-07-22 14:16 ` Xiongwei Song
2024-07-26 10:24 ` Vlastimil Babka
1 sibling, 1 reply; 13+ messages in thread
From: Xiongwei Song @ 2024-07-22 14:16 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu
Don't we need the following changes for this patch?
diff --git a/mm/slub.c b/mm/slub.c
index c1222467c346..e6beb6743342 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4967,9 +4967,9 @@ init_kmem_cache_node(struct kmem_cache_node *n)
n->nr_partial = 0;
spin_lock_init(&n->list_lock);
INIT_LIST_HEAD(&n->partial);
-#ifdef CONFIG_SLUB_DEBUG
atomic_long_set(&n->nr_slabs, 0);
atomic_long_set(&n->total_objects, 0);
+#ifdef CONFIG_SLUB_DEBUG
INIT_LIST_HEAD(&n->full);
#endif
}
Thanks,
Xiongwei
On Tue, Jul 16, 2024 at 4:29 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Currently SLUB counts per-node slabs and total objects only with
> CONFIG_SLUB_DEBUG, in order to minimize overhead. However, the detection
> in __kmem_cache_shutdown() whether there are no outstanding object
> relies on the per-node slab count (node_nr_slabs()) so it may be
> unreliable without CONFIG_SLUB_DEBUG. Thus we might be failing to warn
> about such situations, and instead destroy a cache while leaving its
> slab(s) around (due to a buggy slab user creating such a scenario, not
> in normal operation).
>
> We will also need node_nr_slabs() to be reliable in the following work
> to gracefully handle kmem_cache_destroy() with kfree_rcu() objects in
> flight. Thus make the counting of per-node slabs and objects
> unconditional.
>
> Note that CONFIG_SLUB_DEBUG is the default anyway, and the counting is
> done only when allocating or freeing a slab page, so even in
> !CONFIG_SLUB_DEBUG configs the overhead should be negligible.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/slub.c | 49 +++++++++++++++++++++----------------------------
> 1 file changed, 21 insertions(+), 28 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 829a1f08e8a2..aa4d80109c49 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -426,9 +426,9 @@ struct kmem_cache_node {
> spinlock_t list_lock;
> unsigned long nr_partial;
> struct list_head partial;
> -#ifdef CONFIG_SLUB_DEBUG
> atomic_long_t nr_slabs;
> atomic_long_t total_objects;
> +#ifdef CONFIG_SLUB_DEBUG
> struct list_head full;
> #endif
> };
> @@ -438,6 +438,26 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
> return s->node[node];
> }
>
> +static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
> +{
> + return atomic_long_read(&n->nr_slabs);
> +}
> +
> +static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
> +{
> + struct kmem_cache_node *n = get_node(s, node);
> +
> + atomic_long_inc(&n->nr_slabs);
> + atomic_long_add(objects, &n->total_objects);
> +}
> +static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
> +{
> + struct kmem_cache_node *n = get_node(s, node);
> +
> + atomic_long_dec(&n->nr_slabs);
> + atomic_long_sub(objects, &n->total_objects);
> +}
> +
> /*
> * Iterator over all nodes. The body will be executed for each node that has
> * a kmem_cache_node structure allocated (which is true for all online nodes)
> @@ -1511,26 +1531,6 @@ static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct
> list_del(&slab->slab_list);
> }
>
> -static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
> -{
> - return atomic_long_read(&n->nr_slabs);
> -}
> -
> -static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
> -{
> - struct kmem_cache_node *n = get_node(s, node);
> -
> - atomic_long_inc(&n->nr_slabs);
> - atomic_long_add(objects, &n->total_objects);
> -}
> -static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
> -{
> - struct kmem_cache_node *n = get_node(s, node);
> -
> - atomic_long_dec(&n->nr_slabs);
> - atomic_long_sub(objects, &n->total_objects);
> -}
> -
> /* Object debug checks for alloc/free paths */
> static void setup_object_debug(struct kmem_cache *s, void *object)
> {
> @@ -1871,13 +1871,6 @@ slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name)
>
> #define disable_higher_order_debug 0
>
> -static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
> - { return 0; }
> -static inline void inc_slabs_node(struct kmem_cache *s, int node,
> - int objects) {}
> -static inline void dec_slabs_node(struct kmem_cache *s, int node,
> - int objects) {}
> -
> #ifndef CONFIG_SLUB_TINY
> static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
> void **freelist, void *nextfree)
>
> --
> 2.45.2
>
>
* Re: [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count
2024-07-22 14:16 ` Xiongwei Song
@ 2024-07-26 10:24 ` Vlastimil Babka
0 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2024-07-26 10:24 UTC (permalink / raw)
To: Xiongwei Song
Cc: Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Christoph Lameter, David Rientjes, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Zqiang, Julia Lawall,
Jakub Kicinski, Jason A. Donenfeld, Uladzislau Rezki (Sony),
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
linux-kernel, rcu
On 7/22/24 4:16 PM, Xiongwei Song wrote:
> Don't we need the following changes for this patch?
Yes thanks, will fix!
> diff --git a/mm/slub.c b/mm/slub.c
> index c1222467c346..e6beb6743342 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4967,9 +4967,9 @@ init_kmem_cache_node(struct kmem_cache_node *n)
> n->nr_partial = 0;
> spin_lock_init(&n->list_lock);
> INIT_LIST_HEAD(&n->partial);
> -#ifdef CONFIG_SLUB_DEBUG
> atomic_long_set(&n->nr_slabs, 0);
> atomic_long_set(&n->total_objects, 0);
> +#ifdef CONFIG_SLUB_DEBUG
> INIT_LIST_HEAD(&n->full);
> #endif
> }
>
> Thanks,
> Xiongwei
>
>
> On Tue, Jul 16, 2024 at 4:29 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> Currently SLUB counts per-node slabs and total objects only with
>> CONFIG_SLUB_DEBUG, in order to minimize overhead. However, the detection
>> in __kmem_cache_shutdown() whether there are no outstanding object
>> relies on the per-node slab count (node_nr_slabs()) so it may be
>> unreliable without CONFIG_SLUB_DEBUG. Thus we might be failing to warn
>> about such situations, and instead destroy a cache while leaving its
>> slab(s) around (due to a buggy slab user creating such a scenario, not
>> in normal operation).
>>
>> We will also need node_nr_slabs() to be reliable in the following work
>> to gracefully handle kmem_cache_destroy() with kfree_rcu() objects in
>> flight. Thus make the counting of per-node slabs and objects
>> unconditional.
>>
>> Note that CONFIG_SLUB_DEBUG is the default anyway, and the counting is
>> done only when allocating or freeing a slab page, so even in
>> !CONFIG_SLUB_DEBUG configs the overhead should be negligible.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>> mm/slub.c | 49 +++++++++++++++++++++----------------------------
>> 1 file changed, 21 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 829a1f08e8a2..aa4d80109c49 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -426,9 +426,9 @@ struct kmem_cache_node {
>> spinlock_t list_lock;
>> unsigned long nr_partial;
>> struct list_head partial;
>> -#ifdef CONFIG_SLUB_DEBUG
>> atomic_long_t nr_slabs;
>> atomic_long_t total_objects;
>> +#ifdef CONFIG_SLUB_DEBUG
>> struct list_head full;
>> #endif
>> };
>> @@ -438,6 +438,26 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>> return s->node[node];
>> }
>>
>> +static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
>> +{
>> + return atomic_long_read(&n->nr_slabs);
>> +}
>> +
>> +static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
>> +{
>> + struct kmem_cache_node *n = get_node(s, node);
>> +
>> + atomic_long_inc(&n->nr_slabs);
>> + atomic_long_add(objects, &n->total_objects);
>> +}
>> +static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
>> +{
>> + struct kmem_cache_node *n = get_node(s, node);
>> +
>> + atomic_long_dec(&n->nr_slabs);
>> + atomic_long_sub(objects, &n->total_objects);
>> +}
>> +
>> /*
>> * Iterator over all nodes. The body will be executed for each node that has
>> * a kmem_cache_node structure allocated (which is true for all online nodes)
>> @@ -1511,26 +1531,6 @@ static void remove_full(struct kmem_cache *s, struct kmem_cache_node *n, struct
>> list_del(&slab->slab_list);
>> }
>>
>> -static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
>> -{
>> - return atomic_long_read(&n->nr_slabs);
>> -}
>> -
>> -static inline void inc_slabs_node(struct kmem_cache *s, int node, int objects)
>> -{
>> - struct kmem_cache_node *n = get_node(s, node);
>> -
>> - atomic_long_inc(&n->nr_slabs);
>> - atomic_long_add(objects, &n->total_objects);
>> -}
>> -static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects)
>> -{
>> - struct kmem_cache_node *n = get_node(s, node);
>> -
>> - atomic_long_dec(&n->nr_slabs);
>> - atomic_long_sub(objects, &n->total_objects);
>> -}
>> -
>> /* Object debug checks for alloc/free paths */
>> static void setup_object_debug(struct kmem_cache *s, void *object)
>> {
>> @@ -1871,13 +1871,6 @@ slab_flags_t kmem_cache_flags(slab_flags_t flags, const char *name)
>>
>> #define disable_higher_order_debug 0
>>
>> -static inline unsigned long node_nr_slabs(struct kmem_cache_node *n)
>> - { return 0; }
>> -static inline void inc_slabs_node(struct kmem_cache *s, int node,
>> - int objects) {}
>> -static inline void dec_slabs_node(struct kmem_cache *s, int node,
>> - int objects) {}
>> -
>> #ifndef CONFIG_SLUB_TINY
>> static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab,
>> void **freelist, void *nextfree)
>>
>> --
>> 2.45.2
>>
>>
>
Thread overview: 13+ messages
2024-07-15 20:29 [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 1/6] mm, slab: make caches with refcount of 0 unmergeable Vlastimil Babka
2024-07-21 2:36 ` David Rientjes
2024-07-15 20:29 ` [PATCH RFC 2/6] mm, slab: always maintain per-node slab and object count Vlastimil Babka
2024-07-21 2:37 ` David Rientjes
2024-07-22 14:16 ` Xiongwei Song
2024-07-26 10:24 ` Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 3/6] mm, slab: unlink sysfs and debugfs immediately Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 4/6] mm, slab: simplify kmem_cache_release() Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 5/6] mm, slab: asynchronously destroy caches with outstanding objects Vlastimil Babka
2024-07-15 20:29 ` [PATCH RFC 6/6] kunit, slub: add test_kfree_rcu() Vlastimil Babka
2024-07-21 2:39 ` [PATCH RFC 0/6] mm, slub: handle pending kfree_rcu() in kmem_cache_destroy() David Rientjes
2024-07-22 6:49 ` Vlastimil Babka