* [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB
@ 2025-01-23 10:37 Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
` (6 more replies)
0 siblings, 7 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-23 10:37 UTC (permalink / raw)
To: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
Vlastimil Babka
Following the move of the TREE_RCU batching kvfree_rcu() implementation
to slab, we still have the simple non-batching implementation in tiny
RCU, and RCU implementation specific ifdefs in slab code.
Finish the move and integration into slab. Allow using the simple
call_rcu() based implementation also with tree RCU when SLUB_TINY is
enabled, as its goal is also to limit memory footprint with less concern
for top performance.
In order to avoid RCU having to recognize the fake callback function
pointers (__is_kvfree_rcu_offset()) when handling call_rcu(), implement
a callback that can calculate the object's address from the embedded
rcu_head pointer without knowing the specific offset (previously SLOB
would not have made it possible, but it's gone now).
After this series, AFAIK only the following kvfree_rcu specific code
remains in RCU:
- a call to kfree_rcu_scheduler_running() from rcu_set_runtime_mode()
- probably necessary and a generic registration interface would be
unnecessary bloat?
- declarations of kfree_rcu() API in include/linux/rcupdate.h
- could be moved to slab.h after checking for/fixing up potential
missing includes
git tree:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slub-tiny-kfree_rcu
To: Christoph Lameter <cl@linux.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org
To: "Paul E. McKenney" <paulmck@kernel.org>
To: Joel Fernandes <joel@joelfernandes.org>
To: Josh Triplett <josh@joshtriplett.org>
To: Boqun Feng <boqun.feng@gmail.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Zqiang <qiang.zhang1211@gmail.com>
Cc: rcu@vger.kernel.org
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Vlastimil Babka (4):
slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
rcu: remove trace_rcu_kvfree_callback
rcu, slab: use a regular callback function for kvfree_rcu
slab: don't batch kvfree_rcu() with SLUB_TINY
include/linux/rcupdate.h | 29 ++++++++++++++---------------
include/linux/rcutiny.h | 36 ------------------------------------
include/linux/rcutree.h | 3 ---
include/linux/slab.h | 14 ++++++++++++++
include/trace/events/rcu.h | 34 ----------------------------------
kernel/rcu/tiny.c | 24 ------------------------
kernel/rcu/tree.c | 9 ++-------
mm/Kconfig | 4 ++++
mm/slab.h | 2 ++
mm/slab_common.c | 32 +++++++++++++++++++++++++-------
mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
11 files changed, 103 insertions(+), 126 deletions(-)
---
base-commit: e492fac3657b60b8dd78a6e8ca26d1d14706c7b3
change-id: 20250123-slub-tiny-kfree_rcu-bd65bfe222f2
Best regards,
--
Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
@ 2025-01-23 10:37 ` Vlastimil Babka
2025-01-24 12:56 ` Uladzislau Rezki
2025-01-23 10:37 ` [PATCH RFC 2/4] rcu: remove trace_rcu_kvfree_callback Vlastimil Babka
` (5 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-23 10:37 UTC (permalink / raw)
To: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
Vlastimil Babka
Following the move of TREE_RCU implementation, let's move also the
TINY_RCU one for consistency and subsequent refactoring.
For simplicity, remove the separate inline __kvfree_call_rcu() as
TINY_RCU is not meant for high-performance hardware anyway.
Declare kvfree_call_rcu() in rcupdate.h to avoid header dependency
issues.
Also move the kvfree_rcu_barrier() declaration to slab.h
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/linux/rcupdate.h | 5 +++++
include/linux/rcutiny.h | 36 ------------------------------------
include/linux/rcutree.h | 3 ---
include/linux/slab.h | 14 ++++++++++++++
kernel/rcu/tiny.c | 11 -----------
mm/slab_common.c | 20 ++++++++++++++++++--
6 files changed, 37 insertions(+), 52 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 48e5c03df1dd83c246a61d0fcc8aa638adcd7654..3f70d1c8144426f40553c8c589f07097ece8a706 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1082,6 +1082,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#define kfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
#define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
+/*
+ * In mm/slab_common.c, no suitable header to include here.
+ */
+void kvfree_call_rcu(struct rcu_head *head, void *ptr);
+
#define kvfree_rcu_arg_2(ptr, rhf) \
do { \
typeof (ptr) ___p = (ptr); \
diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index fe42315f667fc5be7f2ed8eae6ea0c7193030846..f519cd6802286710bdd56588b5ff3d07bcd30b92 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -90,41 +90,6 @@ static inline void synchronize_rcu_expedited(void)
synchronize_rcu();
}
-/*
- * Add one more declaration of kvfree() here. It is
- * not so straight forward to just include <linux/mm.h>
- * where it is defined due to getting many compile
- * errors caused by that include.
- */
-extern void kvfree(const void *addr);
-
-static inline void __kvfree_call_rcu(struct rcu_head *head, void *ptr)
-{
- if (head) {
- call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
- return;
- }
-
- // kvfree_rcu(one_arg) call.
- might_sleep();
- synchronize_rcu();
- kvfree(ptr);
-}
-
-static inline void kvfree_rcu_barrier(void)
-{
- rcu_barrier();
-}
-
-#ifdef CONFIG_KASAN_GENERIC
-void kvfree_call_rcu(struct rcu_head *head, void *ptr);
-#else
-static inline void kvfree_call_rcu(struct rcu_head *head, void *ptr)
-{
- __kvfree_call_rcu(head, ptr);
-}
-#endif
-
void rcu_qs(void);
static inline void rcu_softirq_qs(void)
@@ -164,7 +129,6 @@ static inline void rcu_end_inkernel_boot(void) { }
static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
static inline bool rcu_is_watching(void) { return true; }
static inline void rcu_momentary_eqs(void) { }
-static inline void kfree_rcu_scheduler_running(void) { }
/* Avoid RCU read-side critical sections leaking across. */
static inline void rcu_all_qs(void) { barrier(); }
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 27d86d9127817e50f8d4dd79e1990d70a02435bb..dbe77b5fe06ec89a393b5444d6c479ced346a37b 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -34,12 +34,9 @@ static inline void rcu_virt_note_context_switch(void)
}
void synchronize_rcu_expedited(void);
-void kvfree_call_rcu(struct rcu_head *head, void *ptr);
-void kvfree_rcu_barrier(void);
void rcu_barrier(void);
void rcu_momentary_eqs(void);
-void kfree_rcu_scheduler_running(void);
struct rcu_gp_oldstate {
unsigned long rgos_norm;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 09eedaecf1205672bb2e7c8cd57ae8fccebc2737..bcc62e5656c35c6a3f4caf26fb33d7447dead39a 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -16,6 +16,7 @@
#include <linux/gfp.h>
#include <linux/overflow.h>
#include <linux/types.h>
+#include <linux/rcupdate.h>
#include <linux/workqueue.h>
#include <linux/percpu-refcount.h>
#include <linux/cleanup.h>
@@ -1082,6 +1083,19 @@ extern void kvfree_sensitive(const void *addr, size_t len);
unsigned int kmem_cache_size(struct kmem_cache *s);
+#ifdef CONFIG_TINY_RCU
+static inline void kvfree_rcu_barrier(void)
+{
+ rcu_barrier();
+}
+
+static inline void kfree_rcu_scheduler_running(void) { }
+#else
+void kvfree_rcu_barrier(void);
+
+void kfree_rcu_scheduler_running(void);
+#endif
+
/**
* kmalloc_size_roundup - Report allocation bucket size for the given size
*
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index b3b3ce34df6310f7bddba40b2be1bdf6c9f00232..0ec27093d0e14a4b1060ea08932c4ac13f9b0f26 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -246,17 +246,6 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
}
EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
-#ifdef CONFIG_KASAN_GENERIC
-void kvfree_call_rcu(struct rcu_head *head, void *ptr)
-{
- if (head)
- kasan_record_aux_stack_noalloc(ptr);
-
- __kvfree_call_rcu(head, ptr);
-}
-EXPORT_SYMBOL_GPL(kvfree_call_rcu);
-#endif
-
void __init rcu_init(void)
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 69f2d19010dedaa3e5b303ab9803c8cdd40152fa..330cdd8ebc5380090ee784c58e8ca1d1a52b3758 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1861,7 +1861,23 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
return true;
}
-#if !defined(CONFIG_TINY_RCU)
+#ifdef CONFIG_TINY_RCU
+
+void kvfree_call_rcu(struct rcu_head *head, void *ptr)
+{
+ if (head) {
+ kasan_record_aux_stack_noalloc(ptr);
+ call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
+ return;
+ }
+
+ // kvfree_rcu(one_arg) call.
+ might_sleep();
+ synchronize_rcu();
+ kvfree(ptr);
+}
+
+#else /* !CONFIG_TINY_RCU */
static enum hrtimer_restart
schedule_page_work_fn(struct hrtimer *t)
@@ -2071,7 +2087,7 @@ void kvfree_rcu_barrier(void)
}
EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
-#endif /* #if !defined(CONFIG_TINY_RCU) */
+#endif /* !CONFIG_TINY_RCU */
static unsigned long
kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
--
2.48.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC 2/4] rcu: remove trace_rcu_kvfree_callback
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
@ 2025-01-23 10:37 ` Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu Vlastimil Babka
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-23 10:37 UTC (permalink / raw)
To: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
Vlastimil Babka
Tree RCU does not handle kvfree_rcu() by queueing individual objects by
call_rcu() anymore, thus the tracepoint and associated
__is_kvfree_rcu_offset() check is dead code now. Remove it.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/trace/events/rcu.h | 34 ----------------------------------
kernel/rcu/tree.c | 9 ++-------
2 files changed, 2 insertions(+), 41 deletions(-)
diff --git a/include/trace/events/rcu.h b/include/trace/events/rcu.h
index e81431deaa50b21725b08ad7c1a2992a0074065a..ac3b28b8939b78843aecb44bed8cb76c451ab060 100644
--- a/include/trace/events/rcu.h
+++ b/include/trace/events/rcu.h
@@ -560,40 +560,6 @@ TRACE_EVENT_RCU(rcu_segcb_stats,
);
-/*
- * Tracepoint for the registration of a single RCU callback of the special
- * kvfree() form. The first argument is the RCU type, the second argument
- * is a pointer to the RCU callback, the third argument is the offset
- * of the callback within the enclosing RCU-protected data structure,
- * the fourth argument is the number of lazy callbacks queued, and the
- * fifth argument is the total number of callbacks queued.
- */
-TRACE_EVENT_RCU(rcu_kvfree_callback,
-
- TP_PROTO(const char *rcuname, struct rcu_head *rhp, unsigned long offset,
- long qlen),
-
- TP_ARGS(rcuname, rhp, offset, qlen),
-
- TP_STRUCT__entry(
- __field(const char *, rcuname)
- __field(void *, rhp)
- __field(unsigned long, offset)
- __field(long, qlen)
- ),
-
- TP_fast_assign(
- __entry->rcuname = rcuname;
- __entry->rhp = rhp;
- __entry->offset = offset;
- __entry->qlen = qlen;
- ),
-
- TP_printk("%s rhp=%p func=%ld %ld",
- __entry->rcuname, __entry->rhp, __entry->offset,
- __entry->qlen)
-);
-
/*
* Tracepoint for marking the beginning rcu_do_batch, performed to start
* RCU callback invocation. The first argument is the RCU flavor,
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6af042cde9727c8d7f7e3d88f26ba222d0d9c535..f3d5a9caebd9d0772931725bee93c1e172f8b6da 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2932,13 +2932,8 @@ static int __init rcu_spawn_core_kthreads(void)
static void rcutree_enqueue(struct rcu_data *rdp, struct rcu_head *head, rcu_callback_t func)
{
rcu_segcblist_enqueue(&rdp->cblist, head);
- if (__is_kvfree_rcu_offset((unsigned long)func))
- trace_rcu_kvfree_callback(rcu_state.name, head,
- (unsigned long)func,
- rcu_segcblist_n_cbs(&rdp->cblist));
- else
- trace_rcu_callback(rcu_state.name, head,
- rcu_segcblist_n_cbs(&rdp->cblist));
+ trace_rcu_callback(rcu_state.name, head,
+ rcu_segcblist_n_cbs(&rdp->cblist));
trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
}
--
2.48.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 2/4] rcu: remove trace_rcu_kvfree_callback Vlastimil Babka
@ 2025-01-23 10:37 ` Vlastimil Babka
2025-01-24 12:47 ` Uladzislau Rezki
2025-01-23 10:37 ` [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY Vlastimil Babka
` (3 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-23 10:37 UTC (permalink / raw)
To: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
Vlastimil Babka
RCU has been special-casing callback function pointers that are integers
lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
RCU implementation no longer does that as the batched kvfree_rcu() is
not a simple call_rcu(). The tiny RCU still does, and the plan is also
to make tree RCU use call_rcu() for SLUB_TINY configurations.
Instead of teaching tree RCU again to special case the offsets, let's
remove the special casing completely. Since there's no SLOB anymore, it
is possible to create a callback function that can take a pointer to a
middle of slab object with unknown offset and determine the object's
pointer before freeing it, so implement that as kvfree_rcu_cb().
Large kmalloc and vmalloc allocations are handled simply by aligning
down to page size. For that we retain the requirement that the offset is
smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
and instead just opencode the condition in the BUILD_BUG_ON() check.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/linux/rcupdate.h | 24 +++++++++---------------
kernel/rcu/tiny.c | 13 -------------
mm/slab.h | 2 ++
mm/slab_common.c | 5 +----
mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 54 insertions(+), 32 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#define RCU_POINTER_INITIALIZER(p, v) \
.p = RCU_INITIALIZER(v)
-/*
- * Does the specified offset indicate that the corresponding rcu_head
- * structure can be handled by kvfree_rcu()?
- */
-#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
-
/**
* kfree_rcu() - kfree an object after a grace period.
* @ptr: pointer to kfree for double-argument invocations.
@@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
* when they are used in a kernel module, that module must invoke the
* high-latency rcu_barrier() function at module-unload time.
*
- * The kfree_rcu() function handles this issue. Rather than encoding a
- * function address in the embedded rcu_head structure, kfree_rcu() instead
- * encodes the offset of the rcu_head structure within the base structure.
- * Because the functions are not allowed in the low-order 4096 bytes of
- * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
+ * The kfree_rcu() function handles this issue. In order to have a universal
+ * callback function handling different offsets of rcu_head, the callback needs
+ * to determine the starting address of the freed object, which can be a large
+ * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to
+ * page boundary for those, only offsets up to 4095 bytes can be accommodated.
* If the offset is larger than 4095 bytes, a compile-time error will
* be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
* either fall back to use of call_rcu() or rearrange the structure to
@@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
do { \
typeof (ptr) ___p = (ptr); \
\
- if (___p) { \
- BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
- kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
- } \
+ if (___p) { \
+ BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \
+ kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
+ } \
} while (0)
#define kvfree_rcu_arg_1(ptr) \
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index 0ec27093d0e14a4b1060ea08932c4ac13f9b0f26..77e0db0221364376a99ebeb17485650879385a6e 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -88,12 +88,6 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
unsigned long offset = (unsigned long)head->func;
rcu_lock_acquire(&rcu_callback_map);
- if (__is_kvfree_rcu_offset(offset)) {
- trace_rcu_invoke_kvfree_callback("", head, offset);
- kvfree((void *)head - offset);
- rcu_lock_release(&rcu_callback_map);
- return true;
- }
trace_rcu_invoke_callback("", head);
f = head->func;
@@ -159,10 +153,6 @@ void synchronize_rcu(void)
}
EXPORT_SYMBOL_GPL(synchronize_rcu);
-static void tiny_rcu_leak_callback(struct rcu_head *rhp)
-{
-}
-
/*
* Post an RCU callback to be invoked after the end of an RCU grace
* period. But since we have but one CPU, that would be after any
@@ -178,9 +168,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
pr_err("%s(): Double-freed CB %p->%pS()!!! ", __func__, head, head->func);
mem_dump_obj(head);
}
-
- if (!__is_kvfree_rcu_offset((unsigned long)head->func))
- WRITE_ONCE(head->func, tiny_rcu_leak_callback);
return;
}
diff --git a/mm/slab.h b/mm/slab.h
index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects, struct slabobj_ext *obj_exts);
#endif
+void kvfree_rcu_cb(struct rcu_head *head);
+
size_t __ksize(const void *objp);
static inline size_t slab_ksize(const struct kmem_cache *s)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
rcu_lock_acquire(&rcu_callback_map);
trace_rcu_invoke_kvfree_callback("slab", head, offset);
- if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
- kvfree(ptr);
-
rcu_lock_release(&rcu_callback_map);
cond_resched_tasks_rcu_qs();
}
@@ -1867,7 +1864,7 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
{
if (head) {
kasan_record_aux_stack_noalloc(ptr);
- call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
+ call_rcu(head, kvfree_rcu_cb);
return;
}
diff --git a/mm/slub.c b/mm/slub.c
index c2151c9fee228d121a9cbcc220c3ae054769dacf..651381bf05566e88de8493e0550f121d23b757a1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -19,6 +19,7 @@
#include <linux/bitops.h>
#include <linux/slab.h>
#include "slab.h"
+#include <linux/vmalloc.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kasan.h>
@@ -4732,6 +4733,47 @@ static void free_large_kmalloc(struct folio *folio, void *object)
folio_put(folio);
}
+void kvfree_rcu_cb(struct rcu_head *head)
+{
+ void *obj = head;
+ struct folio *folio;
+ struct slab *slab;
+ struct kmem_cache *s;
+ void *slab_addr;
+
+ if (unlikely(is_vmalloc_addr(obj))) {
+ obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
+ vfree(obj);
+ return;
+ }
+
+ folio = virt_to_folio(obj);
+ if (unlikely(!folio_test_slab(folio))) {
+ /*
+ * rcu_head offset can be only less than page size so no need to
+ * consider folio order
+ */
+ obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
+ free_large_kmalloc(folio, obj);
+ return;
+ }
+
+ slab = folio_slab(folio);
+ s = slab->slab_cache;
+ slab_addr = folio_address(folio);
+
+ if (is_kfence_address(obj)) {
+ obj = kfence_object_start(obj);
+ } else {
+ unsigned int idx = __obj_to_index(s, slab_addr, obj);
+
+ obj = slab_addr + s->size * idx;
+ obj = fixup_red_left(s, obj);
+ }
+
+ slab_free(s, slab, obj, _RET_IP_);
+}
+
/**
* kfree - free previously allocated memory
* @object: pointer returned by kmalloc() or kmem_cache_alloc()
--
2.48.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
` (2 preceding siblings ...)
2025-01-23 10:37 ` [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu Vlastimil Babka
@ 2025-01-23 10:37 ` Vlastimil Babka
2025-01-24 12:11 ` Uladzislau Rezki
2025-01-23 19:37 ` [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Joel Fernandes
` (2 subsequent siblings)
6 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-23 10:37 UTC (permalink / raw)
To: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu,
Vlastimil Babka
kvfree_rcu() is batched for better performance except on TINY_RCU, which
is a simple implementation for small UP systems. Similarly SLUB_TINY is
an option intended for small systems, whether or not used together with
TINY_RCU. In case SLUB_TINY is used with !TINY_RCU, it makes arguably
sense to not do the batching and limit the memory footprint. It's also
suboptimal to have RCU-specific #ifdefs in slab code.
With that, add CONFIG_KFREE_RCU_BATCHED to determine whether batching
kvfree_rcu() implementation is used. It is not set by a user prompt, but
enabled by default and disabled in case TINY_RCU or SLUB_TINY are
enabled.
Use the new config for #ifdef's in slab code and extend their scope to
cover all code used by the batched kvfree_rcu(). For example there's no
need to perform kvfree_rcu_init() if the batching is disabled.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/linux/slab.h | 2 +-
mm/Kconfig | 4 ++++
mm/slab_common.c | 45 +++++++++++++++++++++++++--------------------
3 files changed, 30 insertions(+), 21 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index bcc62e5656c35c6a3f4caf26fb33d7447dead39a..9faf33734a8eee2425b90e679c0457ab459422a3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1083,7 +1083,7 @@ extern void kvfree_sensitive(const void *addr, size_t len);
unsigned int kmem_cache_size(struct kmem_cache *s);
-#ifdef CONFIG_TINY_RCU
+#ifndef CONFIG_KFREE_RCU_BATCHED
static inline void kvfree_rcu_barrier(void)
{
rcu_barrier();
diff --git a/mm/Kconfig b/mm/Kconfig
index 84000b01680869801a10f56f06d0c43d6521a8d2..e513308a4aed640ee556ecb5793c7f3f195bbcae 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -242,6 +242,10 @@ menu "Slab allocator options"
config SLUB
def_bool y
+config KFREE_RCU_BATCHED
+ def_bool y
+ depends on !SLUB_TINY && !TINY_RCU
+
config SLUB_TINY
bool "Configure for minimal memory footprint"
depends on EXPERT
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f13d2c901daf1419993620459fbd5845eecb85f1..9f6d66313afc6684bdc0f32908fe01c83c60f283 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1284,6 +1284,28 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
EXPORT_TRACEPOINT_SYMBOL(kfree);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
+#ifndef CONFIG_KFREE_RCU_BATCHED
+
+void kvfree_call_rcu(struct rcu_head *head, void *ptr)
+{
+ if (head) {
+ kasan_record_aux_stack_noalloc(ptr);
+ call_rcu(head, kvfree_rcu_cb);
+ return;
+ }
+
+ // kvfree_rcu(one_arg) call.
+ might_sleep();
+ synchronize_rcu();
+ kvfree(ptr);
+}
+
+void __init kvfree_rcu_init(void)
+{
+}
+
+#else /* CONFIG_KFREE_RCU_BATCHED */
+
/*
* This rcu parameter is runtime-read-only. It reflects
* a minimum allowed number of objects which can be cached
@@ -1858,24 +1880,6 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
return true;
}
-#ifdef CONFIG_TINY_RCU
-
-void kvfree_call_rcu(struct rcu_head *head, void *ptr)
-{
- if (head) {
- kasan_record_aux_stack_noalloc(ptr);
- call_rcu(head, kvfree_rcu_cb);
- return;
- }
-
- // kvfree_rcu(one_arg) call.
- might_sleep();
- synchronize_rcu();
- kvfree(ptr);
-}
-
-#else /* !CONFIG_TINY_RCU */
-
static enum hrtimer_restart
schedule_page_work_fn(struct hrtimer *t)
{
@@ -2084,8 +2088,6 @@ void kvfree_rcu_barrier(void)
}
EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
-#endif /* !CONFIG_TINY_RCU */
-
static unsigned long
kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
@@ -2175,3 +2177,6 @@ void __init kvfree_rcu_init(void)
shrinker_register(kfree_rcu_shrinker);
}
+
+#endif /* CONFIG_KFREE_RCU_BATCHED */
+
--
2.48.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
` (3 preceding siblings ...)
2025-01-23 10:37 ` [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY Vlastimil Babka
@ 2025-01-23 19:37 ` Joel Fernandes
2025-01-23 20:26 ` Paul E. McKenney
2025-01-24 14:25 ` Vlastimil Babka
6 siblings, 0 replies; 13+ messages in thread
From: Joel Fernandes @ 2025-01-23 19:37 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Lameter, David Rientjes, Paul E. McKenney,
Josh Triplett, Boqun Feng, Uladzislau Rezki, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, linux-mm, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On Thu, Jan 23, 2025 at 11:37:17AM +0100, Vlastimil Babka wrote:
> Following the move of the TREE_RCU batching kvfree_rcu() implementation
> to slab, we still have the simple non-batching implementation in tiny
> RCU, and RCU implementation specific ifdefs in slab code.
>
> Finish the move and integration into slab. Allow using the simple
> call_rcu() based implementation also with tree RCU when SLUB_TINY is
> enabled, as its goal is also to limit memory footprint with less concern
> for top performance.
>
> In order to avoid RCU having to recognize the fake callback function
> pointers (__is_kvfree_rcu_offset()) when handling call_rcu(), implement
> a callback that can calculate the object's address from the embedded
> rcu_head pointer without knowing the specific offset (previously SLOB
> would not have made it possible, but it's gone now).
>
> After this series, AFAIK only the following kvfree_rcu specific code
> remains in RCU:
>
> - a call to kfree_rcu_scheduler_running() from rcu_set_runtime_mode()
>
> - probably necessary and a generic registration interface would be
> unnecessary bloat?
>
> - declarations of kfree_rcu() API in include/linux/rcupdate.h
>
> - could be moved to slab.h after checking for/fixing up potential
> missing includes
>
> git tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slub-tiny-kfree_rcu
Looks good to me, except typo in third patch 'kmalloc of vmalloc allocation'
should be 'kmalloc or vmalloc allocation'.
Maybe also add some comments on top of 'kvfree_rcu_cb()'. Like:
/*
* Given an rcu_head embedded within an object at an offset < 4k,
* free the object in question.
*/
I like the touch with the config option in the last patch!
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
thanks,
- Joel
> To: Christoph Lameter <cl@linux.com>
> To: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> To: "Paul E. McKenney" <paulmck@kernel.org>
> To: Joel Fernandes <joel@joelfernandes.org>
> To: Josh Triplett <josh@joshtriplett.org>
> To: Boqun Feng <boqun.feng@gmail.com>
> To: Uladzislau Rezki <urezki@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Cc: rcu@vger.kernel.org
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Vlastimil Babka (4):
> slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
> rcu: remove trace_rcu_kvfree_callback
> rcu, slab: use a regular callback function for kvfree_rcu
> slab: don't batch kvfree_rcu() with SLUB_TINY
>
> include/linux/rcupdate.h | 29 ++++++++++++++---------------
> include/linux/rcutiny.h | 36 ------------------------------------
> include/linux/rcutree.h | 3 ---
> include/linux/slab.h | 14 ++++++++++++++
> include/trace/events/rcu.h | 34 ----------------------------------
> kernel/rcu/tiny.c | 24 ------------------------
> kernel/rcu/tree.c | 9 ++-------
> mm/Kconfig | 4 ++++
> mm/slab.h | 2 ++
> mm/slab_common.c | 32 +++++++++++++++++++++++++-------
> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 11 files changed, 103 insertions(+), 126 deletions(-)
> ---
> base-commit: e492fac3657b60b8dd78a6e8ca26d1d14706c7b3
> change-id: 20250123-slub-tiny-kfree_rcu-bd65bfe222f2
>
> Best regards,
> --
> Vlastimil Babka <vbabka@suse.cz>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
` (4 preceding siblings ...)
2025-01-23 19:37 ` [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Joel Fernandes
@ 2025-01-23 20:26 ` Paul E. McKenney
2025-01-24 14:25 ` Vlastimil Babka
6 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2025-01-23 20:26 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Lameter, David Rientjes, Joel Fernandes, Josh Triplett,
Boqun Feng, Uladzislau Rezki, Andrew Morton, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, rcu
On Thu, Jan 23, 2025 at 11:37:17AM +0100, Vlastimil Babka wrote:
> Following the move of the TREE_RCU batching kvfree_rcu() implementation
> to slab, we still have the simple non-batching implementation in tiny
> RCU, and RCU implementation specific ifdefs in slab code.
>
> Finish the move and integration into slab. Allow using the simple
> call_rcu() based implementation also with tree RCU when SLUB_TINY is
> enabled, as its goal is also to limit memory footprint with less concern
> for top performance.
>
> In order to avoid RCU having to recognize the fake callback function
> pointers (__is_kvfree_rcu_offset()) when handling call_rcu(), implement
> a callback that can calculate the object's address from the embedded
> rcu_head pointer without knowing the specific offset (previously SLOB
> would not have made it possible, but it's gone now).
>
> After this series, AFAIK only the following kvfree_rcu specific code
> remains in RCU:
>
> - a call to kfree_rcu_scheduler_running() from rcu_set_runtime_mode()
>
> - probably necessary and a generic registration interface would be
> unnecessary bloat?
>
> - declarations of kfree_rcu() API in include/linux/rcupdate.h
>
> - could be moved to slab.h after checking for/fixing up potential
> missing includes
>
> git tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slub-tiny-kfree_rcu
>
> To: Christoph Lameter <cl@linux.com>
> To: David Rientjes <rientjes@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: linux-mm@kvack.org
> To: "Paul E. McKenney" <paulmck@kernel.org>
> To: Joel Fernandes <joel@joelfernandes.org>
> To: Josh Triplett <josh@joshtriplett.org>
> To: Boqun Feng <boqun.feng@gmail.com>
> To: Uladzislau Rezki <urezki@gmail.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Cc: rcu@vger.kernel.org
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
> ---
> Vlastimil Babka (4):
> slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
> rcu: remove trace_rcu_kvfree_callback
> rcu, slab: use a regular callback function for kvfree_rcu
> slab: don't batch kvfree_rcu() with SLUB_TINY
>
> include/linux/rcupdate.h | 29 ++++++++++++++---------------
> include/linux/rcutiny.h | 36 ------------------------------------
> include/linux/rcutree.h | 3 ---
> include/linux/slab.h | 14 ++++++++++++++
> include/trace/events/rcu.h | 34 ----------------------------------
> kernel/rcu/tiny.c | 24 ------------------------
> kernel/rcu/tree.c | 9 ++-------
> mm/Kconfig | 4 ++++
> mm/slab.h | 2 ++
> mm/slab_common.c | 32 +++++++++++++++++++++++++-------
> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 11 files changed, 103 insertions(+), 126 deletions(-)
> ---
> base-commit: e492fac3657b60b8dd78a6e8ca26d1d14706c7b3
> change-id: 20250123-slub-tiny-kfree_rcu-bd65bfe222f2
>
> Best regards,
> --
> Vlastimil Babka <vbabka@suse.cz>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY
2025-01-23 10:37 ` [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY Vlastimil Babka
@ 2025-01-24 12:11 ` Uladzislau Rezki
0 siblings, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 12:11 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On Thu, Jan 23, 2025 at 11:37:21AM +0100, Vlastimil Babka wrote:
> kvfree_rcu() is batched for better performance except on TINY_RCU, which
> is a simple implementation for small UP systems. Similarly SLUB_TINY is
> an option intended for small systems, whether or not used together with
> TINY_RCU. In case SLUB_TINY is used with !TINY_RCU, it makes arguably
> sense to not do the batching and limit the memory footprint. It's also
> suboptimal to have RCU-specific #ifdefs in slab code.
>
> With that, add CONFIG_KFREE_RCU_BATCHED to determine whether batching
> kvfree_rcu() implementation is used. It is not set by a user prompt, but
> enabled by default and disabled in case TINY_RCU or SLUB_TINY are
> enabled.
>
> Use the new config for #ifdef's in slab code and extend their scope to
> cover all code used by the batched kvfree_rcu(). For example there's no
> need to perform kvfree_rcu_init() if the batching is disabled.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/slab.h | 2 +-
> mm/Kconfig | 4 ++++
> mm/slab_common.c | 45 +++++++++++++++++++++++++--------------------
> 3 files changed, 30 insertions(+), 21 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index bcc62e5656c35c6a3f4caf26fb33d7447dead39a..9faf33734a8eee2425b90e679c0457ab459422a3 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -1083,7 +1083,7 @@ extern void kvfree_sensitive(const void *addr, size_t len);
>
> unsigned int kmem_cache_size(struct kmem_cache *s);
>
> -#ifdef CONFIG_TINY_RCU
> +#ifndef CONFIG_KFREE_RCU_BATCHED
> static inline void kvfree_rcu_barrier(void)
> {
> rcu_barrier();
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 84000b01680869801a10f56f06d0c43d6521a8d2..e513308a4aed640ee556ecb5793c7f3f195bbcae 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -242,6 +242,10 @@ menu "Slab allocator options"
> config SLUB
> def_bool y
>
> +config KFREE_RCU_BATCHED
> + def_bool y
> + depends on !SLUB_TINY && !TINY_RCU
> +
> config SLUB_TINY
> bool "Configure for minimal memory footprint"
> depends on EXPERT
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index f13d2c901daf1419993620459fbd5845eecb85f1..9f6d66313afc6684bdc0f32908fe01c83c60f283 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1284,6 +1284,28 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> EXPORT_TRACEPOINT_SYMBOL(kfree);
> EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
>
> +#ifndef CONFIG_KFREE_RCU_BATCHED
> +
> +void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> +{
> + if (head) {
> + kasan_record_aux_stack_noalloc(ptr);
> + call_rcu(head, kvfree_rcu_cb);
> + return;
> + }
> +
> + // kvfree_rcu(one_arg) call.
> + might_sleep();
> + synchronize_rcu();
> + kvfree(ptr);
> +}
> +
> +void __init kvfree_rcu_init(void)
> +{
> +}
> +
> +#else /* CONFIG_KFREE_RCU_BATCHED */
> +
> /*
> * This rcu parameter is runtime-read-only. It reflects
> * a minimum allowed number of objects which can be cached
> @@ -1858,24 +1880,6 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> return true;
> }
>
> -#ifdef CONFIG_TINY_RCU
> -
> -void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> -{
> - if (head) {
> - kasan_record_aux_stack_noalloc(ptr);
> - call_rcu(head, kvfree_rcu_cb);
> - return;
> - }
> -
> - // kvfree_rcu(one_arg) call.
> - might_sleep();
> - synchronize_rcu();
> - kvfree(ptr);
> -}
> -
> -#else /* !CONFIG_TINY_RCU */
> -
> static enum hrtimer_restart
> schedule_page_work_fn(struct hrtimer *t)
> {
> @@ -2084,8 +2088,6 @@ void kvfree_rcu_barrier(void)
> }
> EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>
> -#endif /* !CONFIG_TINY_RCU */
> -
> static unsigned long
> kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> {
> @@ -2175,3 +2177,6 @@ void __init kvfree_rcu_init(void)
>
> shrinker_register(kfree_rcu_shrinker);
> }
> +
> +#endif /* CONFIG_KFREE_RCU_BATCHED */
> +
>
> --
> 2.48.1
>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
A small nit: CONFIG_KVFREE_RCU_BATCHED?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
2025-01-23 10:37 ` [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu Vlastimil Babka
@ 2025-01-24 12:47 ` Uladzislau Rezki
2025-01-24 14:16 ` Vlastimil Babka
0 siblings, 1 reply; 13+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 12:47 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On Thu, Jan 23, 2025 at 11:37:20AM +0100, Vlastimil Babka wrote:
> RCU has been special-casing callback function pointers that are integers
> lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
> RCU implementation no longer does that as the batched kvfree_rcu() is
> not a simple call_rcu(). The tiny RCU still does, and the plan is also
> to make tree RCU use call_rcu() for SLUB_TINY configurations.
>
> Instead of teaching tree RCU again to special case the offsets, let's
> remove the special casing completely. Since there's no SLOB anymore, it
> is possible to create a callback function that can take a pointer to a
> middle of slab object with unknown offset and determine the object's
> pointer before freeing it, so implement that as kvfree_rcu_cb().
>
> Large kmalloc and vmalloc allocations are handled simply by aligning
> down to page size. For that we retain the requirement that the offset is
> smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
> and instead just opencode the condition in the BUILD_BUG_ON() check.
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/rcupdate.h | 24 +++++++++---------------
> kernel/rcu/tiny.c | 13 -------------
> mm/slab.h | 2 ++
> mm/slab_common.c | 5 +----
> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 54 insertions(+), 32 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> #define RCU_POINTER_INITIALIZER(p, v) \
> .p = RCU_INITIALIZER(v)
>
> -/*
> - * Does the specified offset indicate that the corresponding rcu_head
> - * structure can be handled by kvfree_rcu()?
> - */
> -#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
> -
> /**
> * kfree_rcu() - kfree an object after a grace period.
> * @ptr: pointer to kfree for double-argument invocations.
> @@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> * when they are used in a kernel module, that module must invoke the
> * high-latency rcu_barrier() function at module-unload time.
> *
> - * The kfree_rcu() function handles this issue. Rather than encoding a
> - * function address in the embedded rcu_head structure, kfree_rcu() instead
> - * encodes the offset of the rcu_head structure within the base structure.
> - * Because the functions are not allowed in the low-order 4096 bytes of
> - * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
> + * The kfree_rcu() function handles this issue. In order to have a universal
> + * callback function handling different offsets of rcu_head, the callback needs
> + * to determine the starting address of the freed object, which can be a large
> + * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to
> + * page boundary for those, only offsets up to 4095 bytes can be accommodated.
> * If the offset is larger than 4095 bytes, a compile-time error will
> * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
> * either fall back to use of call_rcu() or rearrange the structure to
> @@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> do { \
> typeof (ptr) ___p = (ptr); \
> \
> - if (___p) { \
> - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
> - kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> - } \
> + if (___p) { \
> + BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \
> + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> + } \
>
Why removing the macro? At least __is_kvfree_rcu_offset() makes it
clear what and why + it has a nice comment what it is used for. 4096
looks like hard-coded value.
Or you do not want that someone else started to use that macro in
another places?
> } while (0)
>
> #define kvfree_rcu_arg_1(ptr) \
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index 0ec27093d0e14a4b1060ea08932c4ac13f9b0f26..77e0db0221364376a99ebeb17485650879385a6e 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -88,12 +88,6 @@ static inline bool rcu_reclaim_tiny(struct rcu_head *head)
> unsigned long offset = (unsigned long)head->func;
>
> rcu_lock_acquire(&rcu_callback_map);
> - if (__is_kvfree_rcu_offset(offset)) {
> - trace_rcu_invoke_kvfree_callback("", head, offset);
> - kvfree((void *)head - offset);
> - rcu_lock_release(&rcu_callback_map);
> - return true;
> - }
>
> trace_rcu_invoke_callback("", head);
> f = head->func;
> @@ -159,10 +153,6 @@ void synchronize_rcu(void)
> }
> EXPORT_SYMBOL_GPL(synchronize_rcu);
>
> -static void tiny_rcu_leak_callback(struct rcu_head *rhp)
> -{
> -}
> -
> /*
> * Post an RCU callback to be invoked after the end of an RCU grace
> * period. But since we have but one CPU, that would be after any
> @@ -178,9 +168,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
> pr_err("%s(): Double-freed CB %p->%pS()!!! ", __func__, head, head->func);
> mem_dump_obj(head);
> }
> -
> - if (!__is_kvfree_rcu_offset((unsigned long)head->func))
> - WRITE_ONCE(head->func, tiny_rcu_leak_callback);
> return;
> }
>
> diff --git a/mm/slab.h b/mm/slab.h
> index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> void **p, int objects, struct slabobj_ext *obj_exts);
> #endif
>
> +void kvfree_rcu_cb(struct rcu_head *head);
> +
> size_t __ksize(const void *objp);
>
> static inline size_t slab_ksize(const struct kmem_cache *s)
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
> rcu_lock_acquire(&rcu_callback_map);
> trace_rcu_invoke_kvfree_callback("slab", head, offset);
>
> - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> - kvfree(ptr);
> -
This is not correct unless i miss something. Why do you remove this?
>
> diff --git a/mm/slub.c b/mm/slub.c
> index c2151c9fee228d121a9cbcc220c3ae054769dacf..651381bf05566e88de8493e0550f121d23b757a1 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -19,6 +19,7 @@
> #include <linux/bitops.h>
> #include <linux/slab.h>
> #include "slab.h"
> +#include <linux/vmalloc.h>
> #include <linux/proc_fs.h>
> #include <linux/seq_file.h>
> #include <linux/kasan.h>
> @@ -4732,6 +4733,47 @@ static void free_large_kmalloc(struct folio *folio, void *object)
> folio_put(folio);
> }
>
> +void kvfree_rcu_cb(struct rcu_head *head)
> +{
> + void *obj = head;
> + struct folio *folio;
> + struct slab *slab;
> + struct kmem_cache *s;
> + void *slab_addr;
> +
> + if (unlikely(is_vmalloc_addr(obj))) {
> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> + vfree(obj);
> + return;
> + }
> +
> + folio = virt_to_folio(obj);
> + if (unlikely(!folio_test_slab(folio))) {
> + /*
> + * rcu_head offset can be only less than page size so no need to
> + * consider folio order
> + */
> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> + free_large_kmalloc(folio, obj);
> + return;
> + }
> +
> + slab = folio_slab(folio);
> + s = slab->slab_cache;
> + slab_addr = folio_address(folio);
> +
> + if (is_kfence_address(obj)) {
> + obj = kfence_object_start(obj);
> + } else {
> + unsigned int idx = __obj_to_index(s, slab_addr, obj);
> +
> + obj = slab_addr + s->size * idx;
> + obj = fixup_red_left(s, obj);
> + }
> +
> + slab_free(s, slab, obj, _RET_IP_);
> +}
> +
Tiny computer case. Just small nit, maybe remove unlikely() but you decide :)
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of kvfree_rcu() to SLAB
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
@ 2025-01-24 12:56 ` Uladzislau Rezki
0 siblings, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 12:56 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki,
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On Thu, Jan 23, 2025 at 11:37:18AM +0100, Vlastimil Babka wrote:
> Following the move of TREE_RCU implementation, let's move also the
> TINY_RCU one for consistency and subsequent refactoring.
>
> For simplicity, remove the separate inline __kvfree_call_rcu() as
> TINY_RCU is not meant for high-performance hardware anyway.
>
> Declare kvfree_call_rcu() in rcupdate.h to avoid header dependency
> issues.
>
> Also move the kvfree_rcu_barrier() declaration to slab.h
>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> include/linux/rcupdate.h | 5 +++++
> include/linux/rcutiny.h | 36 ------------------------------------
> include/linux/rcutree.h | 3 ---
> include/linux/slab.h | 14 ++++++++++++++
> kernel/rcu/tiny.c | 11 -----------
> mm/slab_common.c | 20 ++++++++++++++++++--
> 6 files changed, 37 insertions(+), 52 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 48e5c03df1dd83c246a61d0fcc8aa638adcd7654..3f70d1c8144426f40553c8c589f07097ece8a706 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -1082,6 +1082,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> #define kfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
> #define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
>
> +/*
> + * In mm/slab_common.c, no suitable header to include here.
> + */
> +void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> +
> #define kvfree_rcu_arg_2(ptr, rhf) \
> do { \
> typeof (ptr) ___p = (ptr); \
> diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
> index fe42315f667fc5be7f2ed8eae6ea0c7193030846..f519cd6802286710bdd56588b5ff3d07bcd30b92 100644
> --- a/include/linux/rcutiny.h
> +++ b/include/linux/rcutiny.h
> @@ -90,41 +90,6 @@ static inline void synchronize_rcu_expedited(void)
> synchronize_rcu();
> }
>
> -/*
> - * Add one more declaration of kvfree() here. It is
> - * not so straight forward to just include <linux/mm.h>
> - * where it is defined due to getting many compile
> - * errors caused by that include.
> - */
> -extern void kvfree(const void *addr);
> -
> -static inline void __kvfree_call_rcu(struct rcu_head *head, void *ptr)
> -{
> - if (head) {
> - call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
> - return;
> - }
> -
> - // kvfree_rcu(one_arg) call.
> - might_sleep();
> - synchronize_rcu();
> - kvfree(ptr);
> -}
> -
> -static inline void kvfree_rcu_barrier(void)
> -{
> - rcu_barrier();
> -}
> -
> -#ifdef CONFIG_KASAN_GENERIC
> -void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> -#else
> -static inline void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> -{
> - __kvfree_call_rcu(head, ptr);
> -}
> -#endif
> -
> void rcu_qs(void);
>
> static inline void rcu_softirq_qs(void)
> @@ -164,7 +129,6 @@ static inline void rcu_end_inkernel_boot(void) { }
> static inline bool rcu_inkernel_boot_has_ended(void) { return true; }
> static inline bool rcu_is_watching(void) { return true; }
> static inline void rcu_momentary_eqs(void) { }
> -static inline void kfree_rcu_scheduler_running(void) { }
>
> /* Avoid RCU read-side critical sections leaking across. */
> static inline void rcu_all_qs(void) { barrier(); }
> diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
> index 27d86d9127817e50f8d4dd79e1990d70a02435bb..dbe77b5fe06ec89a393b5444d6c479ced346a37b 100644
> --- a/include/linux/rcutree.h
> +++ b/include/linux/rcutree.h
> @@ -34,12 +34,9 @@ static inline void rcu_virt_note_context_switch(void)
> }
>
> void synchronize_rcu_expedited(void);
> -void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> -void kvfree_rcu_barrier(void);
>
> void rcu_barrier(void);
> void rcu_momentary_eqs(void);
> -void kfree_rcu_scheduler_running(void);
>
> struct rcu_gp_oldstate {
> unsigned long rgos_norm;
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 09eedaecf1205672bb2e7c8cd57ae8fccebc2737..bcc62e5656c35c6a3f4caf26fb33d7447dead39a 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -16,6 +16,7 @@
> #include <linux/gfp.h>
> #include <linux/overflow.h>
> #include <linux/types.h>
> +#include <linux/rcupdate.h>
> #include <linux/workqueue.h>
> #include <linux/percpu-refcount.h>
> #include <linux/cleanup.h>
> @@ -1082,6 +1083,19 @@ extern void kvfree_sensitive(const void *addr, size_t len);
>
> unsigned int kmem_cache_size(struct kmem_cache *s);
>
> +#ifdef CONFIG_TINY_RCU
> +static inline void kvfree_rcu_barrier(void)
> +{
> + rcu_barrier();
> +}
> +
> +static inline void kfree_rcu_scheduler_running(void) { }
> +#else
> +void kvfree_rcu_barrier(void);
> +
> +void kfree_rcu_scheduler_running(void);
> +#endif
> +
> /**
> * kmalloc_size_roundup - Report allocation bucket size for the given size
> *
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index b3b3ce34df6310f7bddba40b2be1bdf6c9f00232..0ec27093d0e14a4b1060ea08932c4ac13f9b0f26 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -246,17 +246,6 @@ bool poll_state_synchronize_rcu(unsigned long oldstate)
> }
> EXPORT_SYMBOL_GPL(poll_state_synchronize_rcu);
>
> -#ifdef CONFIG_KASAN_GENERIC
> -void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> -{
> - if (head)
> - kasan_record_aux_stack_noalloc(ptr);
> -
> - __kvfree_call_rcu(head, ptr);
> -}
> -EXPORT_SYMBOL_GPL(kvfree_call_rcu);
> -#endif
> -
> void __init rcu_init(void)
> {
> open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 69f2d19010dedaa3e5b303ab9803c8cdd40152fa..330cdd8ebc5380090ee784c58e8ca1d1a52b3758 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1861,7 +1861,23 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
> return true;
> }
>
> -#if !defined(CONFIG_TINY_RCU)
> +#ifdef CONFIG_TINY_RCU
> +
> +void kvfree_call_rcu(struct rcu_head *head, void *ptr)
> +{
> + if (head) {
> + kasan_record_aux_stack_noalloc(ptr);
> + call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
> + return;
> + }
> +
> + // kvfree_rcu(one_arg) call.
> + might_sleep();
> + synchronize_rcu();
> + kvfree(ptr);
> +}
> +
> +#else /* !CONFIG_TINY_RCU */
>
> static enum hrtimer_restart
> schedule_page_work_fn(struct hrtimer *t)
> @@ -2071,7 +2087,7 @@ void kvfree_rcu_barrier(void)
> }
> EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
>
> -#endif /* #if !defined(CONFIG_TINY_RCU) */
> +#endif /* !CONFIG_TINY_RCU */
>
> static unsigned long
> kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
>
> --
> 2.48.1
>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
2025-01-24 12:47 ` Uladzislau Rezki
@ 2025-01-24 14:16 ` Vlastimil Babka
2025-01-24 14:19 ` Uladzislau Rezki
0 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-24 14:16 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Andrew Morton,
Roman Gushchin, Hyeonggon Yoo, linux-mm, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On 1/24/25 13:47, Uladzislau Rezki wrote:
> On Thu, Jan 23, 2025 at 11:37:20AM +0100, Vlastimil Babka wrote:
>> RCU has been special-casing callback function pointers that are integers
>> lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
>> RCU implementation no longer does that as the batched kvfree_rcu() is
>> not a simple call_rcu(). The tiny RCU still does, and the plan is also
>> to make tree RCU use call_rcu() for SLUB_TINY configurations.
>>
>> Instead of teaching tree RCU again to special case the offsets, let's
>> remove the special casing completely. Since there's no SLOB anymore, it
>> is possible to create a callback function that can take a pointer to a
>> middle of slab object with unknown offset and determine the object's
>> pointer before freeing it, so implement that as kvfree_rcu_cb().
>>
>> Large kmalloc and vmalloc allocations are handled simply by aligning
>> down to page size. For that we retain the requirement that the offset is
>> smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
>> and instead just opencode the condition in the BUILD_BUG_ON() check.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>> include/linux/rcupdate.h | 24 +++++++++---------------
>> kernel/rcu/tiny.c | 13 -------------
>> mm/slab.h | 2 ++
>> mm/slab_common.c | 5 +----
>> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>> 5 files changed, 54 insertions(+), 32 deletions(-)
>>
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>> #define RCU_POINTER_INITIALIZER(p, v) \
>> .p = RCU_INITIALIZER(v)
>>
>> -/*
>> - * Does the specified offset indicate that the corresponding rcu_head
>> - * structure can be handled by kvfree_rcu()?
>> - */
>> -#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
>> -
>> /**
>> * kfree_rcu() - kfree an object after a grace period.
>> * @ptr: pointer to kfree for double-argument invocations.
>> @@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>> * when they are used in a kernel module, that module must invoke the
>> * high-latency rcu_barrier() function at module-unload time.
>> *
>> - * The kfree_rcu() function handles this issue. Rather than encoding a
>> - * function address in the embedded rcu_head structure, kfree_rcu() instead
>> - * encodes the offset of the rcu_head structure within the base structure.
>> - * Because the functions are not allowed in the low-order 4096 bytes of
>> - * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
>> + * The kfree_rcu() function handles this issue. In order to have a universal
>> + * callback function handling different offsets of rcu_head, the callback needs
>> + * to determine the starting address of the freed object, which can be a large
>> + * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to
>> + * page boundary for those, only offsets up to 4095 bytes can be accommodated.
>> * If the offset is larger than 4095 bytes, a compile-time error will
>> * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
>> * either fall back to use of call_rcu() or rearrange the structure to
>> @@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
>> do { \
>> typeof (ptr) ___p = (ptr); \
>> \
>> - if (___p) { \
>> - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
>> - kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
>> - } \
>> + if (___p) { \
>> + BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \
>> + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
>> + } \
>>
> Why removing the macro? At least __is_kvfree_rcu_offset() makes it
> clear what and why + it has a nice comment what it is used for. 4096
> looks like hard-coded value.
Hmm but it's ultimately shorter. We could add a short comment to
kvfree_rcu_arg_2() referring to the longer explanation in the kfree_rcu()
comment?
> Or you do not want that someone else started to use that macro in
> another places?
And also that, since this has to be exposed in a "public" header.
>> diff --git a/mm/slab.h b/mm/slab.h
>> index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>> void **p, int objects, struct slabobj_ext *obj_exts);
>> #endif
>>
>> +void kvfree_rcu_cb(struct rcu_head *head);
>> +
>> size_t __ksize(const void *objp);
>>
>> static inline size_t slab_ksize(const struct kmem_cache *s)
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
>> rcu_lock_acquire(&rcu_callback_map);
>> trace_rcu_invoke_kvfree_callback("slab", head, offset);
>>
>> - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
>> - kvfree(ptr);
>> -
> This is not correct unless i miss something. Why do you remove this?
Oops, meant to remove just the warn check, and unconditionally call
kvfree(), thanks for noticing :)
The warning could really only trigger due to a use-after-free corruption of
the ptr pointer, because the BUILD_BUG_ON() would catch misuse with a too
large offset, so we don't need to keep it.
>> +void kvfree_rcu_cb(struct rcu_head *head)
>> +{
>> + void *obj = head;
>> + struct folio *folio;
>> + struct slab *slab;
>> + struct kmem_cache *s;
>> + void *slab_addr;
>> +
>> + if (unlikely(is_vmalloc_addr(obj))) {
>> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
>> + vfree(obj);
>> + return;
>> + }
>> +
>> + folio = virt_to_folio(obj);
>> + if (unlikely(!folio_test_slab(folio))) {
>> + /*
>> + * rcu_head offset can be only less than page size so no need to
>> + * consider folio order
>> + */
>> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
>> + free_large_kmalloc(folio, obj);
>> + return;
>> + }
>> +
>> + slab = folio_slab(folio);
>> + s = slab->slab_cache;
>> + slab_addr = folio_address(folio);
>> +
>> + if (is_kfence_address(obj)) {
>> + obj = kfence_object_start(obj);
>> + } else {
>> + unsigned int idx = __obj_to_index(s, slab_addr, obj);
>> +
>> + obj = slab_addr + s->size * idx;
>> + obj = fixup_red_left(s, obj);
>> + }
>> +
>> + slab_free(s, slab, obj, _RET_IP_);
>> +}
>> +
> Tiny computer case. Just small nit, maybe remove unlikely() but you decide :)
Force of habbit :)
> --
> Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu
2025-01-24 14:16 ` Vlastimil Babka
@ 2025-01-24 14:19 ` Uladzislau Rezki
0 siblings, 0 replies; 13+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 14:19 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Uladzislau Rezki, Christoph Lameter, David Rientjes,
Paul E. McKenney, Joel Fernandes, Josh Triplett, Boqun Feng,
Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On Fri, Jan 24, 2025 at 03:16:05PM +0100, Vlastimil Babka wrote:
> On 1/24/25 13:47, Uladzislau Rezki wrote:
> > On Thu, Jan 23, 2025 at 11:37:20AM +0100, Vlastimil Babka wrote:
> >> RCU has been special-casing callback function pointers that are integers
> >> lower than 4096 as offsets of rcu_head for kvfree() instead. The tree
> >> RCU implementation no longer does that as the batched kvfree_rcu() is
> >> not a simple call_rcu(). The tiny RCU still does, and the plan is also
> >> to make tree RCU use call_rcu() for SLUB_TINY configurations.
> >>
> >> Instead of teaching tree RCU again to special case the offsets, let's
> >> remove the special casing completely. Since there's no SLOB anymore, it
> >> is possible to create a callback function that can take a pointer to a
> >> middle of slab object with unknown offset and determine the object's
> >> pointer before freeing it, so implement that as kvfree_rcu_cb().
> >>
> >> Large kmalloc and vmalloc allocations are handled simply by aligning
> >> down to page size. For that we retain the requirement that the offset is
> >> smaller than 4096. But we can remove __is_kvfree_rcu_offset() completely
> >> and instead just opencode the condition in the BUILD_BUG_ON() check.
> >>
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> >> include/linux/rcupdate.h | 24 +++++++++---------------
> >> kernel/rcu/tiny.c | 13 -------------
> >> mm/slab.h | 2 ++
> >> mm/slab_common.c | 5 +----
> >> mm/slub.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> >> 5 files changed, 54 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index 3f70d1c8144426f40553c8c589f07097ece8a706..7ff16a70ca1c0fb1012c4118388f60687c5e5b3f 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -1025,12 +1025,6 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> >> #define RCU_POINTER_INITIALIZER(p, v) \
> >> .p = RCU_INITIALIZER(v)
> >>
> >> -/*
> >> - * Does the specified offset indicate that the corresponding rcu_head
> >> - * structure can be handled by kvfree_rcu()?
> >> - */
> >> -#define __is_kvfree_rcu_offset(offset) ((offset) < 4096)
> >> -
> >> /**
> >> * kfree_rcu() - kfree an object after a grace period.
> >> * @ptr: pointer to kfree for double-argument invocations.
> >> @@ -1041,11 +1035,11 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
> >> * when they are used in a kernel module, that module must invoke the
> >> * high-latency rcu_barrier() function at module-unload time.
> >> *
> >> - * The kfree_rcu() function handles this issue. Rather than encoding a
> >> - * function address in the embedded rcu_head structure, kfree_rcu() instead
> >> - * encodes the offset of the rcu_head structure within the base structure.
> >> - * Because the functions are not allowed in the low-order 4096 bytes of
> >> - * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
> >> + * The kfree_rcu() function handles this issue. In order to have a universal
> >> + * callback function handling different offsets of rcu_head, the callback needs
> >> + * to determine the starting address of the freed object, which can be a large
> >> + * kmalloc of vmalloc allocation. To allow simply aligning the pointer down to
> >> + * page boundary for those, only offsets up to 4095 bytes can be accommodated.
> >> * If the offset is larger than 4095 bytes, a compile-time error will
> >> * be generated in kvfree_rcu_arg_2(). If this error is triggered, you can
> >> * either fall back to use of call_rcu() or rearrange the structure to
> >> @@ -1091,10 +1085,10 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr);
> >> do { \
> >> typeof (ptr) ___p = (ptr); \
> >> \
> >> - if (___p) { \
> >> - BUILD_BUG_ON(!__is_kvfree_rcu_offset(offsetof(typeof(*(ptr)), rhf))); \
> >> - kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> >> - } \
> >> + if (___p) { \
> >> + BUILD_BUG_ON(offsetof(typeof(*(ptr)), rhf) >= 4096); \
> >> + kvfree_call_rcu(&((___p)->rhf), (void *) (___p)); \
> >> + } \
> >>
> > Why removing the macro? At least __is_kvfree_rcu_offset() makes it
> > clear what and why + it has a nice comment what it is used for. 4096
> > looks like hard-coded value.
>
> Hmm but it's ultimately shorter. We could add a short comment to
> kvfree_rcu_arg_2() referring to the longer explanation in the kfree_rcu()
> comment?
>
Sounds good to place or keep the comment.
> > Or you do not want that someone else started to use that macro in
> > another places?
>
> And also that, since this has to be exposed in a "public" header.
>
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index e9fd9bf0bfa65b343a4ae0ecd5b4c2a325b04883..2f01c7317988ce036f0b22807403226a59f0f708 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -604,6 +604,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> >> void **p, int objects, struct slabobj_ext *obj_exts);
> >> #endif
> >>
> >> +void kvfree_rcu_cb(struct rcu_head *head);
> >> +
> >> size_t __ksize(const void *objp);
> >>
> >> static inline size_t slab_ksize(const struct kmem_cache *s)
> >> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> index 330cdd8ebc5380090ee784c58e8ca1d1a52b3758..f13d2c901daf1419993620459fbd5845eecb85f1 100644
> >> --- a/mm/slab_common.c
> >> +++ b/mm/slab_common.c
> >> @@ -1532,9 +1532,6 @@ kvfree_rcu_list(struct rcu_head *head)
> >> rcu_lock_acquire(&rcu_callback_map);
> >> trace_rcu_invoke_kvfree_callback("slab", head, offset);
> >>
> >> - if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
> >> - kvfree(ptr);
> >> -
> > This is not correct unless i miss something. Why do you remove this?
>
> Oops, meant to remove just the warn check, and unconditionally call
> kvfree(), thanks for noticing :)
>
> The warning could really only trigger due to a use-after-free corruption of
> the ptr pointer, because the BUILD_BUG_ON() would catch misuse with a too
> large offset, so we don't need to keep it.
>
> >> +void kvfree_rcu_cb(struct rcu_head *head)
> >> +{
> >> + void *obj = head;
> >> + struct folio *folio;
> >> + struct slab *slab;
> >> + struct kmem_cache *s;
> >> + void *slab_addr;
> >> +
> >> + if (unlikely(is_vmalloc_addr(obj))) {
> >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> >> + vfree(obj);
> >> + return;
> >> + }
> >> +
> >> + folio = virt_to_folio(obj);
> >> + if (unlikely(!folio_test_slab(folio))) {
> >> + /*
> >> + * rcu_head offset can be only less than page size so no need to
> >> + * consider folio order
> >> + */
> >> + obj = (void *) PAGE_ALIGN_DOWN((unsigned long)obj);
> >> + free_large_kmalloc(folio, obj);
> >> + return;
> >> + }
> >> +
> >> + slab = folio_slab(folio);
> >> + s = slab->slab_cache;
> >> + slab_addr = folio_address(folio);
> >> +
> >> + if (is_kfence_address(obj)) {
> >> + obj = kfence_object_start(obj);
> >> + } else {
> >> + unsigned int idx = __obj_to_index(s, slab_addr, obj);
> >> +
> >> + obj = slab_addr + s->size * idx;
> >> + obj = fixup_red_left(s, obj);
> >> + }
> >> +
> >> + slab_free(s, slab, obj, _RET_IP_);
> >> +}
> >> +
> > Tiny computer case. Just small nit, maybe remove unlikely() but you decide :)
>
> Force of habbit :)
>
:)
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
` (5 preceding siblings ...)
2025-01-23 20:26 ` Paul E. McKenney
@ 2025-01-24 14:25 ` Vlastimil Babka
6 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2025-01-24 14:25 UTC (permalink / raw)
To: Christoph Lameter, David Rientjes, Paul E. McKenney,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Andrew Morton, Roman Gushchin, Hyeonggon Yoo, linux-mm,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu
On 1/23/25 11:37, Vlastimil Babka wrote:
> git tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=b4/slub-tiny-kfree_rcu
There I fixed build errors reported by robots, and kvfree_rcu_list() memory
leak reported by Ulad, thanks! The other suggestions and tags I will apply
before a non-RFC posting, most likely after the merge window.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-01-24 14:25 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-23 10:37 [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 1/4] slab, rcu: move TINY_RCU variant of " Vlastimil Babka
2025-01-24 12:56 ` Uladzislau Rezki
2025-01-23 10:37 ` [PATCH RFC 2/4] rcu: remove trace_rcu_kvfree_callback Vlastimil Babka
2025-01-23 10:37 ` [PATCH RFC 3/4] rcu, slab: use a regular callback function for kvfree_rcu Vlastimil Babka
2025-01-24 12:47 ` Uladzislau Rezki
2025-01-24 14:16 ` Vlastimil Babka
2025-01-24 14:19 ` Uladzislau Rezki
2025-01-23 10:37 ` [PATCH RFC 4/4] slab: don't batch kvfree_rcu() with SLUB_TINY Vlastimil Babka
2025-01-24 12:11 ` Uladzislau Rezki
2025-01-23 19:37 ` [PATCH RFC 0/4] slab, rcu: move and consolidate TINY_RCU kvfree_rcu() to SLAB Joel Fernandes
2025-01-23 20:26 ` Paul E. McKenney
2025-01-24 14:25 ` Vlastimil Babka
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox