* [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
@ 2024-12-12 18:02 Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 1/5] rcu/kvfree: Initialize kvfree_rcu() separately Uladzislau Rezki (Sony)
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Uladzislau Rezki (Sony) @ 2024-12-12 18:02 UTC (permalink / raw)
  To: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
	Oleksiy Avramchenko

Hello!

This is v2, based on Linux 6.13-rc2. The first version can be found
here:

https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/

The difference between v1 and v2 is that the preparation work is now
done in the original place, followed by one final move of the code.
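
For reference, here is a minimal usage sketch of the API whose
implementation this series relocates (illustration only, not part of
the series; "struct foo" and example() are made-up names, and the
usual RCU/slab headers are assumed to be included):

	struct foo {
		int data;
		struct rcu_head rcu;	/* needed for the two-argument form */
	};

	static void example(struct foo *p, void *buf)
	{
		/* Two-argument form: does not need a sleepable context. */
		kvfree_rcu(p, rcu);

		/*
		 * Single-argument form: sleepable context only, since it
		 * may fall back to synchronize_rcu() followed by kvfree().
		 */
		kvfree_rcu_mightsleep(buf);
	}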

Uladzislau Rezki (Sony) (5):
  rcu/kvfree: Initialize kvfree_rcu() separately
  rcu/kvfree: Move some functions under CONFIG_TINY_RCU
  rcu/kvfree: Adjust names passed into trace functions
  rcu/kvfree: Adjust a shrinker name
  mm/slab: Move kvfree_rcu() into SLAB

 include/linux/slab.h |   1 +
 init/main.c          |   1 +
 kernel/rcu/tree.c    | 876 ------------------------------------------
 mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 882 insertions(+), 876 deletions(-)

-- 
2.39.5




* [PATCH v2 1/5] rcu/kvfree: Initialize kvfree_rcu() separately
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
@ 2024-12-12 18:02 ` Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 2/5] rcu/kvfree: Move some functions under CONFIG_TINY_RCU Uladzislau Rezki (Sony)
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki (Sony) @ 2024-12-12 18:02 UTC (permalink / raw)
  To: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
	Oleksiy Avramchenko

Introduce a separate initialization of the kvfree_rcu() functionality.
For this purpose, kfree_rcu_batch_init() is renamed to kvfree_rcu_init()
and is invoked from main.c right after rcu_init() completes.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/rcupdate.h | 1 +
 init/main.c              | 1 +
 kernel/rcu/tree.c        | 3 +--
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 48e5c03df1dd..acb0095b4dbe 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -118,6 +118,7 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 
 /* Internal to kernel */
 void rcu_init(void);
+void __init kvfree_rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
 
diff --git a/init/main.c b/init/main.c
index 00fac1170294..893cb77aef22 100644
--- a/init/main.c
+++ b/init/main.c
@@ -992,6 +992,7 @@ void start_kernel(void)
 	workqueue_init_early();
 
 	rcu_init();
+	kvfree_rcu_init();
 
 	/* Trace events are available after this */
 	trace_init();
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index ff98233d4aa5..e69b867de8ef 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5648,7 +5648,7 @@ static void __init rcu_dump_rcu_node_tree(void)
 
 struct workqueue_struct *rcu_gp_wq;
 
-static void __init kfree_rcu_batch_init(void)
+void __init kvfree_rcu_init(void)
 {
 	int cpu;
 	int i, j;
@@ -5703,7 +5703,6 @@ void __init rcu_init(void)
 
 	rcu_early_boot_tests();
 
-	kfree_rcu_batch_init();
 	rcu_bootup_announce();
 	sanitize_kthread_prio();
 	rcu_init_geometry();
-- 
2.39.5




* [PATCH v2 2/5] rcu/kvfree: Move some functions under CONFIG_TINY_RCU
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 1/5] rcu/kvfree: Initialize kvfree_rcu() separately Uladzislau Rezki (Sony)
@ 2024-12-12 18:02 ` Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 3/5] rcu/kvfree: Adjust names passed into trace functions Uladzislau Rezki (Sony)
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki (Sony) @ 2024-12-12 18:02 UTC (permalink / raw)
  To: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
	Oleksiy Avramchenko

Currently, when Tiny RCU is enabled, the tree.c file is not compiled,
so duplicate function names do not conflict with each other.

Because the kvfree_rcu() functionality is being moved into SLAB, some
functions have to be reordered and grouped together under a
CONFIG_TINY_RCU guard, so that their names do not conflict when a
kernel is built with the CONFIG_TINY_RCU flavor.
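
For illustration only (not part of the diff below), the intended
layout after this patch is a single guarded region, roughly:

	#if !defined(CONFIG_TINY_RCU)

	/*
	 * Helpers used only by the Tree RCU implementation, e.g.
	 * schedule_page_work_fn(), run_page_cache_worker() and
	 * kfree_rcu_scheduler_running(), grouped here so that a
	 * Tiny RCU build does not pick them up once this code lives
	 * in a file that is compiled for both RCU flavors.
	 */

	#endif /* #if !defined(CONFIG_TINY_RCU) */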

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 90 +++++++++++++++++++++++++----------------------
 1 file changed, 47 insertions(+), 43 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e69b867de8ef..b3853ae6e869 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3653,16 +3653,6 @@ static void kfree_rcu_monitor(struct work_struct *work)
 		schedule_delayed_monitor_work(krcp);
 }
 
-static enum hrtimer_restart
-schedule_page_work_fn(struct hrtimer *t)
-{
-	struct kfree_rcu_cpu *krcp =
-		container_of(t, struct kfree_rcu_cpu, hrtimer);
-
-	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
-	return HRTIMER_NORESTART;
-}
-
 static void fill_page_cache_func(struct work_struct *work)
 {
 	struct kvfree_rcu_bulk_data *bnode;
@@ -3698,27 +3688,6 @@ static void fill_page_cache_func(struct work_struct *work)
 	atomic_set(&krcp->backoff_page_cache_fill, 0);
 }
 
-static void
-run_page_cache_worker(struct kfree_rcu_cpu *krcp)
-{
-	// If cache disabled, bail out.
-	if (!rcu_min_cached_objs)
-		return;
-
-	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
-			!atomic_xchg(&krcp->work_in_progress, 1)) {
-		if (atomic_read(&krcp->backoff_page_cache_fill)) {
-			queue_delayed_work(system_unbound_wq,
-				&krcp->page_cache_work,
-					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
-		} else {
-			hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-			krcp->hrtimer.function = schedule_page_work_fn;
-			hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
-		}
-	}
-}
-
 // Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
 // state specified by flags.  If can_alloc is true, the caller must
 // be schedulable and not be holding any locks or mutexes that might be
@@ -3779,6 +3748,51 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
 	return true;
 }
 
+#if !defined(CONFIG_TINY_RCU)
+
+static enum hrtimer_restart
+schedule_page_work_fn(struct hrtimer *t)
+{
+	struct kfree_rcu_cpu *krcp =
+		container_of(t, struct kfree_rcu_cpu, hrtimer);
+
+	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
+	return HRTIMER_NORESTART;
+}
+
+static void
+run_page_cache_worker(struct kfree_rcu_cpu *krcp)
+{
+	// If cache disabled, bail out.
+	if (!rcu_min_cached_objs)
+		return;
+
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
+			!atomic_xchg(&krcp->work_in_progress, 1)) {
+		if (atomic_read(&krcp->backoff_page_cache_fill)) {
+			queue_delayed_work(system_unbound_wq,
+				&krcp->page_cache_work,
+					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
+		} else {
+			hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+			krcp->hrtimer.function = schedule_page_work_fn;
+			hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+		}
+	}
+}
+
+void __init kfree_rcu_scheduler_running(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		if (need_offload_krc(krcp))
+			schedule_delayed_monitor_work(krcp);
+	}
+}
+
 /*
  * Queue a request for lazy invocation of the appropriate free routine
  * after a grace period.  Please note that three paths are maintained,
@@ -3944,6 +3958,8 @@ void kvfree_rcu_barrier(void)
 }
 EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
 
+#endif /* #if !defined(CONFIG_TINY_RCU) */
+
 static unsigned long
 kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
@@ -3985,18 +4001,6 @@ kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 	return freed == 0 ? SHRINK_STOP : freed;
 }
 
-void __init kfree_rcu_scheduler_running(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-
-		if (need_offload_krc(krcp))
-			schedule_delayed_monitor_work(krcp);
-	}
-}
-
 /*
  * During early boot, any blocking grace-period wait automatically
  * implies a grace period.
-- 
2.39.5




* [PATCH v2 3/5] rcu/kvfree: Adjust names passed into trace functions
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 1/5] rcu/kvfree: Initialize kvfree_rcu() separately Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 2/5] rcu/kvfree: Move some functions under CONFIG_TINY_RCU Uladzislau Rezki (Sony)
@ 2024-12-12 18:02 ` Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 4/5] rcu/kvfree: Adjust a shrinker name Uladzislau Rezki (Sony)
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki (Sony) @ 2024-12-12 18:02 UTC (permalink / raw)
  To: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
	Oleksiy Avramchenko

Currently the trace functions are supplied with the "rcu_state.name"
member, which lives in the rcu_state structure. The problem is that
the "rcu_state" variable is local to tree.c and cannot be accessed
from another file.

To address this, this preparation patch passes the "slab" string as
the first argument instead.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b3853ae6e869..6ab21655c248 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3379,14 +3379,14 @@ kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
 		rcu_lock_acquire(&rcu_callback_map);
 		if (idx == 0) { // kmalloc() / kfree().
 			trace_rcu_invoke_kfree_bulk_callback(
-				rcu_state.name, bnode->nr_records,
+				"slab", bnode->nr_records,
 				bnode->records);
 
 			kfree_bulk(bnode->nr_records, bnode->records);
 		} else { // vmalloc() / vfree().
 			for (i = 0; i < bnode->nr_records; i++) {
 				trace_rcu_invoke_kvfree_callback(
-					rcu_state.name, bnode->records[i], 0);
+					"slab", bnode->records[i], 0);
 
 				vfree(bnode->records[i]);
 			}
@@ -3417,7 +3417,7 @@ kvfree_rcu_list(struct rcu_head *head)
 		next = head->next;
 		debug_rcu_head_unqueue((struct rcu_head *)ptr);
 		rcu_lock_acquire(&rcu_callback_map);
-		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
+		trace_rcu_invoke_kvfree_callback("slab", head, offset);
 
 		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
 			kvfree(ptr);
-- 
2.39.5




* [PATCH v2 4/5] rcu/kvfree: Adjust a shrinker name
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
                   ` (2 preceding siblings ...)
  2024-12-12 18:02 ` [PATCH v2 3/5] rcu/kvfree: Adjust names passed into trace functions Uladzislau Rezki (Sony)
@ 2024-12-12 18:02 ` Uladzislau Rezki (Sony)
  2024-12-12 18:02 ` [PATCH v2 5/5] mm/slab: Move kvfree_rcu() into SLAB Uladzislau Rezki (Sony)
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki (Sony) @ 2024-12-12 18:02 UTC (permalink / raw)
  To: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
	Oleksiy Avramchenko

Rename "rcu-kfree" to "slab-kvfree-rcu" since it goes to the
slab_common.c file soon.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6ab21655c248..b7ec998f360e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -5689,7 +5689,7 @@ void __init kvfree_rcu_init(void)
 		krcp->initialized = true;
 	}
 
-	kfree_rcu_shrinker = shrinker_alloc(0, "rcu-kfree");
+	kfree_rcu_shrinker = shrinker_alloc(0, "slab-kvfree-rcu");
 	if (!kfree_rcu_shrinker) {
 		pr_err("Failed to allocate kfree_rcu() shrinker!\n");
 		return;
-- 
2.39.5




* [PATCH v2 5/5] mm/slab: Move kvfree_rcu() into SLAB
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
                   ` (3 preceding siblings ...)
  2024-12-12 18:02 ` [PATCH v2 4/5] rcu/kvfree: Adjust a shrinker name Uladzislau Rezki (Sony)
@ 2024-12-12 18:02 ` Uladzislau Rezki (Sony)
  2024-12-12 18:30 ` [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Christoph Lameter (Ampere)
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki (Sony) @ 2024-12-12 18:02 UTC (permalink / raw)
  To: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Uladzislau Rezki,
	Oleksiy Avramchenko

Move kvfree_rcu() functionality to the slab_common.c file.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 include/linux/rcupdate.h |   1 -
 include/linux/slab.h     |   1 +
 kernel/rcu/tree.c        | 879 --------------------------------------
 mm/slab_common.c         | 880 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 881 insertions(+), 880 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index acb0095b4dbe..48e5c03df1dd 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -118,7 +118,6 @@ static inline void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
 
 /* Internal to kernel */
 void rcu_init(void);
-void __init kvfree_rcu_init(void);
 extern int rcu_scheduler_active;
 void rcu_sched_clock_irq(int user);
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 10a971c2bde3..09eedaecf120 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1099,5 +1099,6 @@ unsigned int kmem_cache_size(struct kmem_cache *s);
 size_t kmalloc_size_roundup(size_t size);
 
 void __init kmem_cache_init_late(void);
+void __init kvfree_rcu_init(void);
 
 #endif	/* _LINUX_SLAB_H */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b7ec998f360e..6af042cde972 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -186,26 +186,6 @@ static int rcu_unlock_delay;
 module_param(rcu_unlock_delay, int, 0444);
 #endif
 
-/*
- * This rcu parameter is runtime-read-only. It reflects
- * a minimum allowed number of objects which can be cached
- * per-CPU. Object size is equal to one page. This value
- * can be changed at boot time.
- */
-static int rcu_min_cached_objs = 5;
-module_param(rcu_min_cached_objs, int, 0444);
-
-// A page shrinker can ask for pages to be freed to make them
-// available for other parts of the system. This usually happens
-// under low memory conditions, and in that case we should also
-// defer page-cache filling for a short time period.
-//
-// The default value is 5 seconds, which is long enough to reduce
-// interference with the shrinker while it asks other systems to
-// drain their caches.
-static int rcu_delay_page_cache_fill_msec = 5000;
-module_param(rcu_delay_page_cache_fill_msec, int, 0444);
-
 /* Retrieve RCU kthreads priority for rcutorture */
 int rcu_get_gp_kthreads_prio(void)
 {
@@ -3191,816 +3171,6 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
-/* Maximum number of jiffies to wait before draining a batch. */
-#define KFREE_DRAIN_JIFFIES (5 * HZ)
-#define KFREE_N_BATCHES 2
-#define FREE_N_CHANNELS 2
-
-/**
- * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
- * @list: List node. All blocks are linked between each other
- * @gp_snap: Snapshot of RCU state for objects placed to this bulk
- * @nr_records: Number of active pointers in the array
- * @records: Array of the kvfree_rcu() pointers
- */
-struct kvfree_rcu_bulk_data {
-	struct list_head list;
-	struct rcu_gp_oldstate gp_snap;
-	unsigned long nr_records;
-	void *records[] __counted_by(nr_records);
-};
-
-/*
- * This macro defines how many entries the "records" array
- * will contain. It is based on the fact that the size of
- * kvfree_rcu_bulk_data structure becomes exactly one page.
- */
-#define KVFREE_BULK_MAX_ENTR \
-	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
-
-/**
- * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
- * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
- * @head_free: List of kfree_rcu() objects waiting for a grace period
- * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees.
- * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
- * @krcp: Pointer to @kfree_rcu_cpu structure
- */
-
-struct kfree_rcu_cpu_work {
-	struct rcu_work rcu_work;
-	struct rcu_head *head_free;
-	struct rcu_gp_oldstate head_free_gp_snap;
-	struct list_head bulk_head_free[FREE_N_CHANNELS];
-	struct kfree_rcu_cpu *krcp;
-};
-
-/**
- * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
- * @head: List of kfree_rcu() objects not yet waiting for a grace period
- * @head_gp_snap: Snapshot of RCU state for objects placed to "@head"
- * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
- * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
- * @lock: Synchronize access to this structure
- * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
- * @initialized: The @rcu_work fields have been initialized
- * @head_count: Number of objects in rcu_head singular list
- * @bulk_count: Number of objects in bulk-list
- * @bkvcache:
- *	A simple cache list that contains objects for reuse purpose.
- *	In order to save some per-cpu space the list is singular.
- *	Even though it is lockless an access has to be protected by the
- *	per-cpu lock.
- * @page_cache_work: A work to refill the cache when it is empty
- * @backoff_page_cache_fill: Delay cache refills
- * @work_in_progress: Indicates that page_cache_work is running
- * @hrtimer: A hrtimer for scheduling a page_cache_work
- * @nr_bkv_objs: number of allocated objects at @bkvcache.
- *
- * This is a per-CPU structure.  The reason that it is not included in
- * the rcu_data structure is to permit this code to be extracted from
- * the RCU files.  Such extraction could allow further optimization of
- * the interactions with the slab allocators.
- */
-struct kfree_rcu_cpu {
-	// Objects queued on a linked list
-	// through their rcu_head structures.
-	struct rcu_head *head;
-	unsigned long head_gp_snap;
-	atomic_t head_count;
-
-	// Objects queued on a bulk-list.
-	struct list_head bulk_head[FREE_N_CHANNELS];
-	atomic_t bulk_count[FREE_N_CHANNELS];
-
-	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
-	raw_spinlock_t lock;
-	struct delayed_work monitor_work;
-	bool initialized;
-
-	struct delayed_work page_cache_work;
-	atomic_t backoff_page_cache_fill;
-	atomic_t work_in_progress;
-	struct hrtimer hrtimer;
-
-	struct llist_head bkvcache;
-	int nr_bkv_objs;
-};
-
-static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
-	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
-};
-
-static __always_inline void
-debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
-{
-#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-	int i;
-
-	for (i = 0; i < bhead->nr_records; i++)
-		debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i]));
-#endif
-}
-
-static inline struct kfree_rcu_cpu *
-krc_this_cpu_lock(unsigned long *flags)
-{
-	struct kfree_rcu_cpu *krcp;
-
-	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
-	krcp = this_cpu_ptr(&krc);
-	raw_spin_lock(&krcp->lock);
-
-	return krcp;
-}
-
-static inline void
-krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
-{
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-}
-
-static inline struct kvfree_rcu_bulk_data *
-get_cached_bnode(struct kfree_rcu_cpu *krcp)
-{
-	if (!krcp->nr_bkv_objs)
-		return NULL;
-
-	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1);
-	return (struct kvfree_rcu_bulk_data *)
-		llist_del_first(&krcp->bkvcache);
-}
-
-static inline bool
-put_cached_bnode(struct kfree_rcu_cpu *krcp,
-	struct kvfree_rcu_bulk_data *bnode)
-{
-	// Check the limit.
-	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
-		return false;
-
-	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
-	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1);
-	return true;
-}
-
-static int
-drain_page_cache(struct kfree_rcu_cpu *krcp)
-{
-	unsigned long flags;
-	struct llist_node *page_list, *pos, *n;
-	int freed = 0;
-
-	if (!rcu_min_cached_objs)
-		return 0;
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-	page_list = llist_del_all(&krcp->bkvcache);
-	WRITE_ONCE(krcp->nr_bkv_objs, 0);
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
-	llist_for_each_safe(pos, n, page_list) {
-		free_page((unsigned long)pos);
-		freed++;
-	}
-
-	return freed;
-}
-
-static void
-kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
-	struct kvfree_rcu_bulk_data *bnode, int idx)
-{
-	unsigned long flags;
-	int i;
-
-	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
-		debug_rcu_bhead_unqueue(bnode);
-		rcu_lock_acquire(&rcu_callback_map);
-		if (idx == 0) { // kmalloc() / kfree().
-			trace_rcu_invoke_kfree_bulk_callback(
-				"slab", bnode->nr_records,
-				bnode->records);
-
-			kfree_bulk(bnode->nr_records, bnode->records);
-		} else { // vmalloc() / vfree().
-			for (i = 0; i < bnode->nr_records; i++) {
-				trace_rcu_invoke_kvfree_callback(
-					"slab", bnode->records[i], 0);
-
-				vfree(bnode->records[i]);
-			}
-		}
-		rcu_lock_release(&rcu_callback_map);
-	}
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-	if (put_cached_bnode(krcp, bnode))
-		bnode = NULL;
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
-	if (bnode)
-		free_page((unsigned long) bnode);
-
-	cond_resched_tasks_rcu_qs();
-}
-
-static void
-kvfree_rcu_list(struct rcu_head *head)
-{
-	struct rcu_head *next;
-
-	for (; head; head = next) {
-		void *ptr = (void *) head->func;
-		unsigned long offset = (void *) head - ptr;
-
-		next = head->next;
-		debug_rcu_head_unqueue((struct rcu_head *)ptr);
-		rcu_lock_acquire(&rcu_callback_map);
-		trace_rcu_invoke_kvfree_callback("slab", head, offset);
-
-		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
-			kvfree(ptr);
-
-		rcu_lock_release(&rcu_callback_map);
-		cond_resched_tasks_rcu_qs();
-	}
-}
-
-/*
- * This function is invoked in workqueue context after a grace period.
- * It frees all the objects queued on ->bulk_head_free or ->head_free.
- */
-static void kfree_rcu_work(struct work_struct *work)
-{
-	unsigned long flags;
-	struct kvfree_rcu_bulk_data *bnode, *n;
-	struct list_head bulk_head[FREE_N_CHANNELS];
-	struct rcu_head *head;
-	struct kfree_rcu_cpu *krcp;
-	struct kfree_rcu_cpu_work *krwp;
-	struct rcu_gp_oldstate head_gp_snap;
-	int i;
-
-	krwp = container_of(to_rcu_work(work),
-		struct kfree_rcu_cpu_work, rcu_work);
-	krcp = krwp->krcp;
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-	// Channels 1 and 2.
-	for (i = 0; i < FREE_N_CHANNELS; i++)
-		list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]);
-
-	// Channel 3.
-	head = krwp->head_free;
-	krwp->head_free = NULL;
-	head_gp_snap = krwp->head_free_gp_snap;
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
-	// Handle the first two channels.
-	for (i = 0; i < FREE_N_CHANNELS; i++) {
-		// Start from the tail page, so a GP is likely passed for it.
-		list_for_each_entry_safe(bnode, n, &bulk_head[i], list)
-			kvfree_rcu_bulk(krcp, bnode, i);
-	}
-
-	/*
-	 * This is used when the "bulk" path can not be used for the
-	 * double-argument of kvfree_rcu().  This happens when the
-	 * page-cache is empty, which means that objects are instead
-	 * queued on a linked list through their rcu_head structures.
-	 * This list is named "Channel 3".
-	 */
-	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
-		kvfree_rcu_list(head);
-}
-
-static bool
-need_offload_krc(struct kfree_rcu_cpu *krcp)
-{
-	int i;
-
-	for (i = 0; i < FREE_N_CHANNELS; i++)
-		if (!list_empty(&krcp->bulk_head[i]))
-			return true;
-
-	return !!READ_ONCE(krcp->head);
-}
-
-static bool
-need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
-{
-	int i;
-
-	for (i = 0; i < FREE_N_CHANNELS; i++)
-		if (!list_empty(&krwp->bulk_head_free[i]))
-			return true;
-
-	return !!krwp->head_free;
-}
-
-static int krc_count(struct kfree_rcu_cpu *krcp)
-{
-	int sum = atomic_read(&krcp->head_count);
-	int i;
-
-	for (i = 0; i < FREE_N_CHANNELS; i++)
-		sum += atomic_read(&krcp->bulk_count[i]);
-
-	return sum;
-}
-
-static void
-__schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
-{
-	long delay, delay_left;
-
-	delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
-	if (delayed_work_pending(&krcp->monitor_work)) {
-		delay_left = krcp->monitor_work.timer.expires - jiffies;
-		if (delay < delay_left)
-			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
-		return;
-	}
-	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
-}
-
-static void
-schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
-{
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-	__schedule_delayed_monitor_work(krcp);
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-}
-
-static void
-kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
-{
-	struct list_head bulk_ready[FREE_N_CHANNELS];
-	struct kvfree_rcu_bulk_data *bnode, *n;
-	struct rcu_head *head_ready = NULL;
-	unsigned long flags;
-	int i;
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-	for (i = 0; i < FREE_N_CHANNELS; i++) {
-		INIT_LIST_HEAD(&bulk_ready[i]);
-
-		list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) {
-			if (!poll_state_synchronize_rcu_full(&bnode->gp_snap))
-				break;
-
-			atomic_sub(bnode->nr_records, &krcp->bulk_count[i]);
-			list_move(&bnode->list, &bulk_ready[i]);
-		}
-	}
-
-	if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) {
-		head_ready = krcp->head;
-		atomic_set(&krcp->head_count, 0);
-		WRITE_ONCE(krcp->head, NULL);
-	}
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
-	for (i = 0; i < FREE_N_CHANNELS; i++) {
-		list_for_each_entry_safe(bnode, n, &bulk_ready[i], list)
-			kvfree_rcu_bulk(krcp, bnode, i);
-	}
-
-	if (head_ready)
-		kvfree_rcu_list(head_ready);
-}
-
-/*
- * Return: %true if a work is queued, %false otherwise.
- */
-static bool
-kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
-{
-	unsigned long flags;
-	bool queued = false;
-	int i, j;
-
-	raw_spin_lock_irqsave(&krcp->lock, flags);
-
-	// Attempt to start a new batch.
-	for (i = 0; i < KFREE_N_BATCHES; i++) {
-		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
-
-		// Try to detach bulk_head or head and attach it, only when
-		// all channels are free.  Any channel is not free means at krwp
-		// there is on-going rcu work to handle krwp's free business.
-		if (need_wait_for_krwp_work(krwp))
-			continue;
-
-		// kvfree_rcu_drain_ready() might handle this krcp, if so give up.
-		if (need_offload_krc(krcp)) {
-			// Channel 1 corresponds to the SLAB-pointer bulk path.
-			// Channel 2 corresponds to vmalloc-pointer bulk path.
-			for (j = 0; j < FREE_N_CHANNELS; j++) {
-				if (list_empty(&krwp->bulk_head_free[j])) {
-					atomic_set(&krcp->bulk_count[j], 0);
-					list_replace_init(&krcp->bulk_head[j],
-						&krwp->bulk_head_free[j]);
-				}
-			}
-
-			// Channel 3 corresponds to both SLAB and vmalloc
-			// objects queued on the linked list.
-			if (!krwp->head_free) {
-				krwp->head_free = krcp->head;
-				get_state_synchronize_rcu_full(&krwp->head_free_gp_snap);
-				atomic_set(&krcp->head_count, 0);
-				WRITE_ONCE(krcp->head, NULL);
-			}
-
-			// One work is per one batch, so there are three
-			// "free channels", the batch can handle. Break
-			// the loop since it is done with this CPU thus
-			// queuing an RCU work is _always_ success here.
-			queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
-			WARN_ON_ONCE(!queued);
-			break;
-		}
-	}
-
-	raw_spin_unlock_irqrestore(&krcp->lock, flags);
-	return queued;
-}
-
-/*
- * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
- */
-static void kfree_rcu_monitor(struct work_struct *work)
-{
-	struct kfree_rcu_cpu *krcp = container_of(work,
-		struct kfree_rcu_cpu, monitor_work.work);
-
-	// Drain ready for reclaim.
-	kvfree_rcu_drain_ready(krcp);
-
-	// Queue a batch for a rest.
-	kvfree_rcu_queue_batch(krcp);
-
-	// If there is nothing to detach, it means that our job is
-	// successfully done here. In case of having at least one
-	// of the channels that is still busy we should rearm the
-	// work to repeat an attempt. Because previous batches are
-	// still in progress.
-	if (need_offload_krc(krcp))
-		schedule_delayed_monitor_work(krcp);
-}
-
-static void fill_page_cache_func(struct work_struct *work)
-{
-	struct kvfree_rcu_bulk_data *bnode;
-	struct kfree_rcu_cpu *krcp =
-		container_of(work, struct kfree_rcu_cpu,
-			page_cache_work.work);
-	unsigned long flags;
-	int nr_pages;
-	bool pushed;
-	int i;
-
-	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
-		1 : rcu_min_cached_objs;
-
-	for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) {
-		bnode = (struct kvfree_rcu_bulk_data *)
-			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
-
-		if (!bnode)
-			break;
-
-		raw_spin_lock_irqsave(&krcp->lock, flags);
-		pushed = put_cached_bnode(krcp, bnode);
-		raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
-		if (!pushed) {
-			free_page((unsigned long) bnode);
-			break;
-		}
-	}
-
-	atomic_set(&krcp->work_in_progress, 0);
-	atomic_set(&krcp->backoff_page_cache_fill, 0);
-}
-
-// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
-// state specified by flags.  If can_alloc is true, the caller must
-// be schedulable and not be holding any locks or mutexes that might be
-// acquired by the memory allocator or anything that it might invoke.
-// Returns true if ptr was successfully recorded, else the caller must
-// use a fallback.
-static inline bool
-add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
-	unsigned long *flags, void *ptr, bool can_alloc)
-{
-	struct kvfree_rcu_bulk_data *bnode;
-	int idx;
-
-	*krcp = krc_this_cpu_lock(flags);
-	if (unlikely(!(*krcp)->initialized))
-		return false;
-
-	idx = !!is_vmalloc_addr(ptr);
-	bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx],
-		struct kvfree_rcu_bulk_data, list);
-
-	/* Check if a new block is required. */
-	if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) {
-		bnode = get_cached_bnode(*krcp);
-		if (!bnode && can_alloc) {
-			krc_this_cpu_unlock(*krcp, *flags);
-
-			// __GFP_NORETRY - allows a light-weight direct reclaim
-			// what is OK from minimizing of fallback hitting point of
-			// view. Apart of that it forbids any OOM invoking what is
-			// also beneficial since we are about to release memory soon.
-			//
-			// __GFP_NOMEMALLOC - prevents from consuming of all the
-			// memory reserves. Please note we have a fallback path.
-			//
-			// __GFP_NOWARN - it is supposed that an allocation can
-			// be failed under low memory or high memory pressure
-			// scenarios.
-			bnode = (struct kvfree_rcu_bulk_data *)
-				__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
-			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
-		}
-
-		if (!bnode)
-			return false;
-
-		// Initialize the new block and attach it.
-		bnode->nr_records = 0;
-		list_add(&bnode->list, &(*krcp)->bulk_head[idx]);
-	}
-
-	// Finally insert and update the GP for this page.
-	bnode->nr_records++;
-	bnode->records[bnode->nr_records - 1] = ptr;
-	get_state_synchronize_rcu_full(&bnode->gp_snap);
-	atomic_inc(&(*krcp)->bulk_count[idx]);
-
-	return true;
-}
-
-#if !defined(CONFIG_TINY_RCU)
-
-static enum hrtimer_restart
-schedule_page_work_fn(struct hrtimer *t)
-{
-	struct kfree_rcu_cpu *krcp =
-		container_of(t, struct kfree_rcu_cpu, hrtimer);
-
-	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
-	return HRTIMER_NORESTART;
-}
-
-static void
-run_page_cache_worker(struct kfree_rcu_cpu *krcp)
-{
-	// If cache disabled, bail out.
-	if (!rcu_min_cached_objs)
-		return;
-
-	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
-			!atomic_xchg(&krcp->work_in_progress, 1)) {
-		if (atomic_read(&krcp->backoff_page_cache_fill)) {
-			queue_delayed_work(system_unbound_wq,
-				&krcp->page_cache_work,
-					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
-		} else {
-			hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-			krcp->hrtimer.function = schedule_page_work_fn;
-			hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
-		}
-	}
-}
-
-void __init kfree_rcu_scheduler_running(void)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-
-		if (need_offload_krc(krcp))
-			schedule_delayed_monitor_work(krcp);
-	}
-}
-
-/*
- * Queue a request for lazy invocation of the appropriate free routine
- * after a grace period.  Please note that three paths are maintained,
- * two for the common case using arrays of pointers and a third one that
- * is used only when the main paths cannot be used, for example, due to
- * memory pressure.
- *
- * Each kvfree_call_rcu() request is added to a batch. The batch will be drained
- * every KFREE_DRAIN_JIFFIES number of jiffies. All the objects in the batch will
- * be free'd in workqueue context. This allows us to: batch requests together to
- * reduce the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
- */
-void kvfree_call_rcu(struct rcu_head *head, void *ptr)
-{
-	unsigned long flags;
-	struct kfree_rcu_cpu *krcp;
-	bool success;
-
-	/*
-	 * Please note there is a limitation for the head-less
-	 * variant, that is why there is a clear rule for such
-	 * objects: it can be used from might_sleep() context
-	 * only. For other places please embed an rcu_head to
-	 * your data.
-	 */
-	if (!head)
-		might_sleep();
-
-	// Queue the object but don't yet schedule the batch.
-	if (debug_rcu_head_queue(ptr)) {
-		// Probable double kfree_rcu(), just leak.
-		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
-			  __func__, head);
-
-		// Mark as success and leave.
-		return;
-	}
-
-	kasan_record_aux_stack_noalloc(ptr);
-	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
-	if (!success) {
-		run_page_cache_worker(krcp);
-
-		if (head == NULL)
-			// Inline if kvfree_rcu(one_arg) call.
-			goto unlock_return;
-
-		head->func = ptr;
-		head->next = krcp->head;
-		WRITE_ONCE(krcp->head, head);
-		atomic_inc(&krcp->head_count);
-
-		// Take a snapshot for this krcp.
-		krcp->head_gp_snap = get_state_synchronize_rcu();
-		success = true;
-	}
-
-	/*
-	 * The kvfree_rcu() caller considers the pointer freed at this point
-	 * and likely removes any references to it. Since the actual slab
-	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
-	 * this object (no scanning or false positives reporting).
-	 */
-	kmemleak_ignore(ptr);
-
-	// Set timer to drain after KFREE_DRAIN_JIFFIES.
-	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
-		__schedule_delayed_monitor_work(krcp);
-
-unlock_return:
-	krc_this_cpu_unlock(krcp, flags);
-
-	/*
-	 * Inline kvfree() after synchronize_rcu(). We can do
-	 * it from might_sleep() context only, so the current
-	 * CPU can pass the QS state.
-	 */
-	if (!success) {
-		debug_rcu_head_unqueue((struct rcu_head *) ptr);
-		synchronize_rcu();
-		kvfree(ptr);
-	}
-}
-EXPORT_SYMBOL_GPL(kvfree_call_rcu);
-
-/**
- * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete.
- *
- * Note that a single argument of kvfree_rcu() call has a slow path that
- * triggers synchronize_rcu() following by freeing a pointer. It is done
- * before the return from the function. Therefore for any single-argument
- * call that will result in a kfree() to a cache that is to be destroyed
- * during module exit, it is developer's responsibility to ensure that all
- * such calls have returned before the call to kmem_cache_destroy().
- */
-void kvfree_rcu_barrier(void)
-{
-	struct kfree_rcu_cpu_work *krwp;
-	struct kfree_rcu_cpu *krcp;
-	bool queued;
-	int i, cpu;
-
-	/*
-	 * Firstly we detach objects and queue them over an RCU-batch
-	 * for all CPUs. Finally queued works are flushed for each CPU.
-	 *
-	 * Please note. If there are outstanding batches for a particular
-	 * CPU, those have to be finished first following by queuing a new.
-	 */
-	for_each_possible_cpu(cpu) {
-		krcp = per_cpu_ptr(&krc, cpu);
-
-		/*
-		 * Check if this CPU has any objects which have been queued for a
-		 * new GP completion. If not(means nothing to detach), we are done
-		 * with it. If any batch is pending/running for this "krcp", below
-		 * per-cpu flush_rcu_work() waits its completion(see last step).
-		 */
-		if (!need_offload_krc(krcp))
-			continue;
-
-		while (1) {
-			/*
-			 * If we are not able to queue a new RCU work it means:
-			 * - batches for this CPU are still in flight which should
-			 *   be flushed first and then repeat;
-			 * - no objects to detach, because of concurrency.
-			 */
-			queued = kvfree_rcu_queue_batch(krcp);
-
-			/*
-			 * Bail out, if there is no need to offload this "krcp"
-			 * anymore. As noted earlier it can run concurrently.
-			 */
-			if (queued || !need_offload_krc(krcp))
-				break;
-
-			/* There are ongoing batches. */
-			for (i = 0; i < KFREE_N_BATCHES; i++) {
-				krwp = &(krcp->krw_arr[i]);
-				flush_rcu_work(&krwp->rcu_work);
-			}
-		}
-	}
-
-	/*
-	 * Now we guarantee that all objects are flushed.
-	 */
-	for_each_possible_cpu(cpu) {
-		krcp = per_cpu_ptr(&krc, cpu);
-
-		/*
-		 * A monitor work can drain ready to reclaim objects
-		 * directly. Wait its completion if running or pending.
-		 */
-		cancel_delayed_work_sync(&krcp->monitor_work);
-
-		for (i = 0; i < KFREE_N_BATCHES; i++) {
-			krwp = &(krcp->krw_arr[i]);
-			flush_rcu_work(&krwp->rcu_work);
-		}
-	}
-}
-EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
-
-#endif /* #if !defined(CONFIG_TINY_RCU) */
-
-static unsigned long
-kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
-{
-	int cpu;
-	unsigned long count = 0;
-
-	/* Snapshot count of all CPUs */
-	for_each_possible_cpu(cpu) {
-		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-
-		count += krc_count(krcp);
-		count += READ_ONCE(krcp->nr_bkv_objs);
-		atomic_set(&krcp->backoff_page_cache_fill, 1);
-	}
-
-	return count == 0 ? SHRINK_EMPTY : count;
-}
-
-static unsigned long
-kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
-{
-	int cpu, freed = 0;
-
-	for_each_possible_cpu(cpu) {
-		int count;
-		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-
-		count = krc_count(krcp);
-		count += drain_page_cache(krcp);
-		kfree_rcu_monitor(&krcp->monitor_work.work);
-
-		sc->nr_to_scan -= count;
-		freed += count;
-
-		if (sc->nr_to_scan <= 0)
-			break;
-	}
-
-	return freed == 0 ? SHRINK_STOP : freed;
-}
-
 /*
  * During early boot, any blocking grace-period wait automatically
  * implies a grace period.
@@ -5652,55 +4822,6 @@ static void __init rcu_dump_rcu_node_tree(void)
 
 struct workqueue_struct *rcu_gp_wq;
 
-void __init kvfree_rcu_init(void)
-{
-	int cpu;
-	int i, j;
-	struct shrinker *kfree_rcu_shrinker;
-
-	/* Clamp it to [0:100] seconds interval. */
-	if (rcu_delay_page_cache_fill_msec < 0 ||
-		rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
-
-		rcu_delay_page_cache_fill_msec =
-			clamp(rcu_delay_page_cache_fill_msec, 0,
-				(int) (100 * MSEC_PER_SEC));
-
-		pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n",
-			rcu_delay_page_cache_fill_msec);
-	}
-
-	for_each_possible_cpu(cpu) {
-		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
-
-		for (i = 0; i < KFREE_N_BATCHES; i++) {
-			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
-			krcp->krw_arr[i].krcp = krcp;
-
-			for (j = 0; j < FREE_N_CHANNELS; j++)
-				INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]);
-		}
-
-		for (i = 0; i < FREE_N_CHANNELS; i++)
-			INIT_LIST_HEAD(&krcp->bulk_head[i]);
-
-		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
-		INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
-		krcp->initialized = true;
-	}
-
-	kfree_rcu_shrinker = shrinker_alloc(0, "slab-kvfree-rcu");
-	if (!kfree_rcu_shrinker) {
-		pr_err("Failed to allocate kfree_rcu() shrinker!\n");
-		return;
-	}
-
-	kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count;
-	kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan;
-
-	shrinker_register(kfree_rcu_shrinker);
-}
-
 void __init rcu_init(void)
 {
 	int cpu = smp_processor_id();
diff --git a/mm/slab_common.c b/mm/slab_common.c
index a29457bef626..69f2d19010de 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -28,7 +28,9 @@
 #include <asm/page.h>
 #include <linux/memcontrol.h>
 #include <linux/stackdepot.h>
+#include <trace/events/rcu.h>
 
+#include "../kernel/rcu/rcu.h"
 #include "internal.h"
 #include "slab.h"
 
@@ -1282,3 +1284,881 @@ EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
 EXPORT_TRACEPOINT_SYMBOL(kfree);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
 
+/*
+ * This rcu parameter is runtime-read-only. It reflects
+ * a minimum allowed number of objects which can be cached
+ * per-CPU. Object size is equal to one page. This value
+ * can be changed at boot time.
+ */
+static int rcu_min_cached_objs = 5;
+module_param(rcu_min_cached_objs, int, 0444);
+
+// A page shrinker can ask for pages to be freed to make them
+// available for other parts of the system. This usually happens
+// under low memory conditions, and in that case we should also
+// defer page-cache filling for a short time period.
+//
+// The default value is 5 seconds, which is long enough to reduce
+// interference with the shrinker while it asks other systems to
+// drain their caches.
+static int rcu_delay_page_cache_fill_msec = 5000;
+module_param(rcu_delay_page_cache_fill_msec, int, 0444);
+
+/* Maximum number of jiffies to wait before draining a batch. */
+#define KFREE_DRAIN_JIFFIES (5 * HZ)
+#define KFREE_N_BATCHES 2
+#define FREE_N_CHANNELS 2
+
+/**
+ * struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
+ * @list: List node. All blocks are linked between each other
+ * @gp_snap: Snapshot of RCU state for objects placed to this bulk
+ * @nr_records: Number of active pointers in the array
+ * @records: Array of the kvfree_rcu() pointers
+ */
+struct kvfree_rcu_bulk_data {
+	struct list_head list;
+	struct rcu_gp_oldstate gp_snap;
+	unsigned long nr_records;
+	void *records[] __counted_by(nr_records);
+};
+
+/*
+ * This macro defines how many entries the "records" array
+ * will contain. It is based on the fact that the size of
+ * kvfree_rcu_bulk_data structure becomes exactly one page.
+ */
+#define KVFREE_BULK_MAX_ENTR \
+	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
+
+/**
+ * struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
+ * @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
+ * @head_free: List of kfree_rcu() objects waiting for a grace period
+ * @head_free_gp_snap: Grace-period snapshot to check for attempted premature frees.
+ * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
+ * @krcp: Pointer to @kfree_rcu_cpu structure
+ */
+
+struct kfree_rcu_cpu_work {
+	struct rcu_work rcu_work;
+	struct rcu_head *head_free;
+	struct rcu_gp_oldstate head_free_gp_snap;
+	struct list_head bulk_head_free[FREE_N_CHANNELS];
+	struct kfree_rcu_cpu *krcp;
+};
+
+/**
+ * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
+ * @head: List of kfree_rcu() objects not yet waiting for a grace period
+ * @head_gp_snap: Snapshot of RCU state for objects placed to "@head"
+ * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
+ * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
+ * @lock: Synchronize access to this structure
+ * @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
+ * @initialized: The @rcu_work fields have been initialized
+ * @head_count: Number of objects in rcu_head singular list
+ * @bulk_count: Number of objects in bulk-list
+ * @bkvcache:
+ *	A simple cache list that contains objects for reuse purpose.
+ *	In order to save some per-cpu space the list is singular.
+ *	Even though it is lockless an access has to be protected by the
+ *	per-cpu lock.
+ * @page_cache_work: A work to refill the cache when it is empty
+ * @backoff_page_cache_fill: Delay cache refills
+ * @work_in_progress: Indicates that page_cache_work is running
+ * @hrtimer: A hrtimer for scheduling a page_cache_work
+ * @nr_bkv_objs: number of allocated objects at @bkvcache.
+ *
+ * This is a per-CPU structure.  The reason that it is not included in
+ * the rcu_data structure is to permit this code to be extracted from
+ * the RCU files.  Such extraction could allow further optimization of
+ * the interactions with the slab allocators.
+ */
+struct kfree_rcu_cpu {
+	// Objects queued on a linked list
+	// through their rcu_head structures.
+	struct rcu_head *head;
+	unsigned long head_gp_snap;
+	atomic_t head_count;
+
+	// Objects queued on a bulk-list.
+	struct list_head bulk_head[FREE_N_CHANNELS];
+	atomic_t bulk_count[FREE_N_CHANNELS];
+
+	struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
+	raw_spinlock_t lock;
+	struct delayed_work monitor_work;
+	bool initialized;
+
+	struct delayed_work page_cache_work;
+	atomic_t backoff_page_cache_fill;
+	atomic_t work_in_progress;
+	struct hrtimer hrtimer;
+
+	struct llist_head bkvcache;
+	int nr_bkv_objs;
+};
+
+static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc) = {
+	.lock = __RAW_SPIN_LOCK_UNLOCKED(krc.lock),
+};
+
+static __always_inline void
+debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
+{
+#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+	int i;
+
+	for (i = 0; i < bhead->nr_records; i++)
+		debug_rcu_head_unqueue((struct rcu_head *)(bhead->records[i]));
+#endif
+}
+
+static inline struct kfree_rcu_cpu *
+krc_this_cpu_lock(unsigned long *flags)
+{
+	struct kfree_rcu_cpu *krcp;
+
+	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
+	krcp = this_cpu_ptr(&krc);
+	raw_spin_lock(&krcp->lock);
+
+	return krcp;
+}
+
+static inline void
+krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
+{
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+}
+
+static inline struct kvfree_rcu_bulk_data *
+get_cached_bnode(struct kfree_rcu_cpu *krcp)
+{
+	if (!krcp->nr_bkv_objs)
+		return NULL;
+
+	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs - 1);
+	return (struct kvfree_rcu_bulk_data *)
+		llist_del_first(&krcp->bkvcache);
+}
+
+static inline bool
+put_cached_bnode(struct kfree_rcu_cpu *krcp,
+	struct kvfree_rcu_bulk_data *bnode)
+{
+	// Check the limit.
+	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
+		return false;
+
+	llist_add((struct llist_node *) bnode, &krcp->bkvcache);
+	WRITE_ONCE(krcp->nr_bkv_objs, krcp->nr_bkv_objs + 1);
+	return true;
+}
+
+static int
+drain_page_cache(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+	struct llist_node *page_list, *pos, *n;
+	int freed = 0;
+
+	if (!rcu_min_cached_objs)
+		return 0;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	page_list = llist_del_all(&krcp->bkvcache);
+	WRITE_ONCE(krcp->nr_bkv_objs, 0);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	llist_for_each_safe(pos, n, page_list) {
+		free_page((unsigned long)pos);
+		freed++;
+	}
+
+	return freed;
+}
+
+static void
+kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
+	struct kvfree_rcu_bulk_data *bnode, int idx)
+{
+	unsigned long flags;
+	int i;
+
+	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
+		debug_rcu_bhead_unqueue(bnode);
+		rcu_lock_acquire(&rcu_callback_map);
+		if (idx == 0) { // kmalloc() / kfree().
+			trace_rcu_invoke_kfree_bulk_callback(
+				"slab", bnode->nr_records,
+				bnode->records);
+
+			kfree_bulk(bnode->nr_records, bnode->records);
+		} else { // vmalloc() / vfree().
+			for (i = 0; i < bnode->nr_records; i++) {
+				trace_rcu_invoke_kvfree_callback(
+					"slab", bnode->records[i], 0);
+
+				vfree(bnode->records[i]);
+			}
+		}
+		rcu_lock_release(&rcu_callback_map);
+	}
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	if (put_cached_bnode(krcp, bnode))
+		bnode = NULL;
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	if (bnode)
+		free_page((unsigned long) bnode);
+
+	cond_resched_tasks_rcu_qs();
+}
+
+static void
+kvfree_rcu_list(struct rcu_head *head)
+{
+	struct rcu_head *next;
+
+	for (; head; head = next) {
+		void *ptr = (void *) head->func;
+		unsigned long offset = (void *) head - ptr;
+
+		next = head->next;
+		debug_rcu_head_unqueue((struct rcu_head *)ptr);
+		rcu_lock_acquire(&rcu_callback_map);
+		trace_rcu_invoke_kvfree_callback("slab", head, offset);
+
+		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
+			kvfree(ptr);
+
+		rcu_lock_release(&rcu_callback_map);
+		cond_resched_tasks_rcu_qs();
+	}
+}
+
+/*
+ * This function is invoked in workqueue context after a grace period.
+ * It frees all the objects queued on ->bulk_head_free or ->head_free.
+ */
+static void kfree_rcu_work(struct work_struct *work)
+{
+	unsigned long flags;
+	struct kvfree_rcu_bulk_data *bnode, *n;
+	struct list_head bulk_head[FREE_N_CHANNELS];
+	struct rcu_head *head;
+	struct kfree_rcu_cpu *krcp;
+	struct kfree_rcu_cpu_work *krwp;
+	struct rcu_gp_oldstate head_gp_snap;
+	int i;
+
+	krwp = container_of(to_rcu_work(work),
+		struct kfree_rcu_cpu_work, rcu_work);
+	krcp = krwp->krcp;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	// Channels 1 and 2.
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]);
+
+	// Channel 3.
+	head = krwp->head_free;
+	krwp->head_free = NULL;
+	head_gp_snap = krwp->head_free_gp_snap;
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	// Handle the first two channels.
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		// Start from the tail page, so a GP is likely passed for it.
+		list_for_each_entry_safe(bnode, n, &bulk_head[i], list)
+			kvfree_rcu_bulk(krcp, bnode, i);
+	}
+
+	/*
+	 * This is used when the "bulk" path can not be used for the
+	 * double-argument of kvfree_rcu().  This happens when the
+	 * page-cache is empty, which means that objects are instead
+	 * queued on a linked list through their rcu_head structures.
+	 * This list is named "Channel 3".
+	 */
+	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
+		kvfree_rcu_list(head);
+}
+
+static bool
+need_offload_krc(struct kfree_rcu_cpu *krcp)
+{
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		if (!list_empty(&krcp->bulk_head[i]))
+			return true;
+
+	return !!READ_ONCE(krcp->head);
+}
+
+static bool
+need_wait_for_krwp_work(struct kfree_rcu_cpu_work *krwp)
+{
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		if (!list_empty(&krwp->bulk_head_free[i]))
+			return true;
+
+	return !!krwp->head_free;
+}
+
+static int krc_count(struct kfree_rcu_cpu *krcp)
+{
+	int sum = atomic_read(&krcp->head_count);
+	int i;
+
+	for (i = 0; i < FREE_N_CHANNELS; i++)
+		sum += atomic_read(&krcp->bulk_count[i]);
+
+	return sum;
+}
+
+static void
+__schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	long delay, delay_left;
+
+	delay = krc_count(krcp) >= KVFREE_BULK_MAX_ENTR ? 1:KFREE_DRAIN_JIFFIES;
+	if (delayed_work_pending(&krcp->monitor_work)) {
+		delay_left = krcp->monitor_work.timer.expires - jiffies;
+		if (delay < delay_left)
+			mod_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+		return;
+	}
+	queue_delayed_work(system_unbound_wq, &krcp->monitor_work, delay);
+}
+
+static void
+schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	__schedule_delayed_monitor_work(krcp);
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+}
+
+static void
+kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp)
+{
+	struct list_head bulk_ready[FREE_N_CHANNELS];
+	struct kvfree_rcu_bulk_data *bnode, *n;
+	struct rcu_head *head_ready = NULL;
+	unsigned long flags;
+	int i;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		INIT_LIST_HEAD(&bulk_ready[i]);
+
+		list_for_each_entry_safe_reverse(bnode, n, &krcp->bulk_head[i], list) {
+			if (!poll_state_synchronize_rcu_full(&bnode->gp_snap))
+				break;
+
+			atomic_sub(bnode->nr_records, &krcp->bulk_count[i]);
+			list_move(&bnode->list, &bulk_ready[i]);
+		}
+	}
+
+	if (krcp->head && poll_state_synchronize_rcu(krcp->head_gp_snap)) {
+		head_ready = krcp->head;
+		atomic_set(&krcp->head_count, 0);
+		WRITE_ONCE(krcp->head, NULL);
+	}
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+	for (i = 0; i < FREE_N_CHANNELS; i++) {
+		list_for_each_entry_safe(bnode, n, &bulk_ready[i], list)
+			kvfree_rcu_bulk(krcp, bnode, i);
+	}
+
+	if (head_ready)
+		kvfree_rcu_list(head_ready);
+}
+
+/*
+ * Return: %true if a work is queued, %false otherwise.
+ */
+static bool
+kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp)
+{
+	unsigned long flags;
+	bool queued = false;
+	int i, j;
+
+	raw_spin_lock_irqsave(&krcp->lock, flags);
+
+	// Attempt to start a new batch.
+	for (i = 0; i < KFREE_N_BATCHES; i++) {
+		struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);
+
+		// Try to detach bulk_head or head and attach it, only when
+		// all channels are free.  Any channel is not free means at krwp
+		// there is on-going rcu work to handle krwp's free business.
+		if (need_wait_for_krwp_work(krwp))
+			continue;
+
+		// kvfree_rcu_drain_ready() might handle this krcp, if so give up.
+		if (need_offload_krc(krcp)) {
+			// Channel 1 corresponds to the SLAB-pointer bulk path.
+			// Channel 2 corresponds to vmalloc-pointer bulk path.
+			for (j = 0; j < FREE_N_CHANNELS; j++) {
+				if (list_empty(&krwp->bulk_head_free[j])) {
+					atomic_set(&krcp->bulk_count[j], 0);
+					list_replace_init(&krcp->bulk_head[j],
+						&krwp->bulk_head_free[j]);
+				}
+			}
+
+			// Channel 3 corresponds to both SLAB and vmalloc
+			// objects queued on the linked list.
+			if (!krwp->head_free) {
+				krwp->head_free = krcp->head;
+				get_state_synchronize_rcu_full(&krwp->head_free_gp_snap);
+				atomic_set(&krcp->head_count, 0);
+				WRITE_ONCE(krcp->head, NULL);
+			}
+
+			// One work is per one batch, so there are three
+			// "free channels", the batch can handle. Break
+			// the loop since it is done with this CPU thus
+			// queuing an RCU work is _always_ success here.
+			queued = queue_rcu_work(system_unbound_wq, &krwp->rcu_work);
+			WARN_ON_ONCE(!queued);
+			break;
+		}
+	}
+
+	raw_spin_unlock_irqrestore(&krcp->lock, flags);
+	return queued;
+}
+
+/*
+ * This function is invoked after the KFREE_DRAIN_JIFFIES timeout.
+ */
+static void kfree_rcu_monitor(struct work_struct *work)
+{
+	struct kfree_rcu_cpu *krcp = container_of(work,
+		struct kfree_rcu_cpu, monitor_work.work);
+
+	// Drain ready for reclaim.
+	kvfree_rcu_drain_ready(krcp);
+
+	// Queue a batch for a rest.
+	kvfree_rcu_queue_batch(krcp);
+
+	// If there is nothing to detach, it means that our job is
+	// successfully done here. In case of having at least one
+	// of the channels that is still busy we should rearm the
+	// work to repeat an attempt. Because previous batches are
+	// still in progress.
+	if (need_offload_krc(krcp))
+		schedule_delayed_monitor_work(krcp);
+}
+
+static void fill_page_cache_func(struct work_struct *work)
+{
+	struct kvfree_rcu_bulk_data *bnode;
+	struct kfree_rcu_cpu *krcp =
+		container_of(work, struct kfree_rcu_cpu,
+			page_cache_work.work);
+	unsigned long flags;
+	int nr_pages;
+	bool pushed;
+	int i;
+
+	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
+		1 : rcu_min_cached_objs;
+
+	for (i = READ_ONCE(krcp->nr_bkv_objs); i < nr_pages; i++) {
+		bnode = (struct kvfree_rcu_bulk_data *)
+			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+
+		if (!bnode)
+			break;
+
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		pushed = put_cached_bnode(krcp, bnode);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		if (!pushed) {
+			free_page((unsigned long) bnode);
+			break;
+		}
+	}
+
+	atomic_set(&krcp->work_in_progress, 0);
+	atomic_set(&krcp->backoff_page_cache_fill, 0);
+}
+
+// Record ptr in a page managed by krcp, with the pre-krc_this_cpu_lock()
+// state specified by flags.  If can_alloc is true, the caller must
+// be schedulable and not be holding any locks or mutexes that might be
+// acquired by the memory allocator or anything that it might invoke.
+// Returns true if ptr was successfully recorded, else the caller must
+// use a fallback.
+static inline bool
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr, bool can_alloc)
+{
+	struct kvfree_rcu_bulk_data *bnode;
+	int idx;
+
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
+		return false;
+
+	idx = !!is_vmalloc_addr(ptr);
+	bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx],
+		struct kvfree_rcu_bulk_data, list);
+
+	/* Check if a new block is required. */
+	if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
+		if (!bnode && can_alloc) {
+			krc_this_cpu_unlock(*krcp, *flags);
+
+			// __GFP_NORETRY - allows a light-weight direct reclaim,
+			// which is fine since it minimizes how often the fallback
+			// path is hit. It also forbids invoking the OOM killer,
+			// which is beneficial since we are about to release
+			// memory soon anyway.
+			//
+			// __GFP_NOMEMALLOC - prevents consuming all the memory
+			// reserves. Please note we have a fallback path.
+			//
+			// __GFP_NOWARN - an allocation can fail under low memory
+			// or high memory pressure, so do not warn about it.
+			bnode = (struct kvfree_rcu_bulk_data *)
+				__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
+			raw_spin_lock_irqsave(&(*krcp)->lock, *flags);
+		}
+
+		if (!bnode)
+			return false;
+
+		// Initialize the new block and attach it.
+		bnode->nr_records = 0;
+		list_add(&bnode->list, &(*krcp)->bulk_head[idx]);
+	}
+
+	// Finally insert and update the GP for this page.
+	bnode->nr_records++;
+	bnode->records[bnode->nr_records - 1] = ptr;
+	get_state_synchronize_rcu_full(&bnode->gp_snap);
+	atomic_inc(&(*krcp)->bulk_count[idx]);
+
+	return true;
+}
+
+#if !defined(CONFIG_TINY_RCU)
+
+static enum hrtimer_restart
+schedule_page_work_fn(struct hrtimer *t)
+{
+	struct kfree_rcu_cpu *krcp =
+		container_of(t, struct kfree_rcu_cpu, hrtimer);
+
+	queue_delayed_work(system_highpri_wq, &krcp->page_cache_work, 0);
+	return HRTIMER_NORESTART;
+}
+
+static void
+run_page_cache_worker(struct kfree_rcu_cpu *krcp)
+{
+	// If cache disabled, bail out.
+	if (!rcu_min_cached_objs)
+		return;
+
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING &&
+			!atomic_xchg(&krcp->work_in_progress, 1)) {
+		if (atomic_read(&krcp->backoff_page_cache_fill)) {
+			queue_delayed_work(system_unbound_wq,
+				&krcp->page_cache_work,
+					msecs_to_jiffies(rcu_delay_page_cache_fill_msec));
+		} else {
+			hrtimer_init(&krcp->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+			krcp->hrtimer.function = schedule_page_work_fn;
+			hrtimer_start(&krcp->hrtimer, 0, HRTIMER_MODE_REL);
+		}
+	}
+}
+
+void __init kfree_rcu_scheduler_running(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		if (need_offload_krc(krcp))
+			schedule_delayed_monitor_work(krcp);
+	}
+}
+
+/*
+ * Queue a request for lazy invocation of the appropriate free routine
+ * after a grace period.  Please note that three paths are maintained,
+ * two for the common case using arrays of pointers and a third one that
+ * is used only when the main paths cannot be used, for example, due to
+ * memory pressure.
+ *
+ * Each kvfree_call_rcu() request is added to a batch. The batch is drained
+ * every KFREE_DRAIN_JIFFIES. All the objects in the batch are freed in
+ * workqueue context. This allows us to batch requests together and reduce
+ * the number of grace periods during heavy kfree_rcu()/kvfree_rcu() load.
+ */
+void kvfree_call_rcu(struct rcu_head *head, void *ptr)
+{
+	unsigned long flags;
+	struct kfree_rcu_cpu *krcp;
+	bool success;
+
+	/*
+	 * Please note there is a limitation for the head-less
+	 * variant, that is why there is a clear rule for such
+	 * objects: it can be used from might_sleep() context
+	 * only. For other places please embed an rcu_head into
+	 * your data.
+	 */
+	if (!head)
+		might_sleep();
+
+	// Queue the object but don't yet schedule the batch.
+	if (debug_rcu_head_queue(ptr)) {
+		// Probable double kfree_rcu(), just leak.
+		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
+			  __func__, head);
+
+		// Mark as success and leave.
+		return;
+	}
+
+	kasan_record_aux_stack_noalloc(ptr);
+	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr, !head);
+	if (!success) {
+		run_page_cache_worker(krcp);
+
+		if (head == NULL)
+			// Inline if kvfree_rcu(one_arg) call.
+			goto unlock_return;
+
+		head->func = ptr;
+		head->next = krcp->head;
+		WRITE_ONCE(krcp->head, head);
+		atomic_inc(&krcp->head_count);
+
+		// Take a snapshot for this krcp.
+		krcp->head_gp_snap = get_state_synchronize_rcu();
+		success = true;
+	}
+
+	/*
+	 * The kvfree_rcu() caller considers the pointer freed at this point
+	 * and likely removes any references to it. Since the actual slab
+	 * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
+	 * this object (no scanning or false-positive reports).
+	 */
+	kmemleak_ignore(ptr);
+
+	// Set timer to drain after KFREE_DRAIN_JIFFIES.
+	if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
+		__schedule_delayed_monitor_work(krcp);
+
+unlock_return:
+	krc_this_cpu_unlock(krcp, flags);
+
+	/*
+	 * Inline kvfree() after synchronize_rcu(). We can do
+	 * it from might_sleep() context only, so the current
+	 * CPU can pass the QS state.
+	 */
+	if (!success) {
+		debug_rcu_head_unqueue((struct rcu_head *) ptr);
+		synchronize_rcu();
+		kvfree(ptr);
+	}
+}
+EXPORT_SYMBOL_GPL(kvfree_call_rcu);
+
+/**
+ * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete.
+ *
+ * Note that a single-argument kvfree_rcu() call has a slow path that
+ * triggers synchronize_rcu() followed by freeing the pointer. It is done
+ * before the return from the function. Therefore, for any single-argument
+ * call that will result in a kfree() to a cache that is to be destroyed
+ * during module exit, it is the developer's responsibility to ensure that
+ * all such calls have returned before the call to kmem_cache_destroy().
+ */
+void kvfree_rcu_barrier(void)
+{
+	struct kfree_rcu_cpu_work *krwp;
+	struct kfree_rcu_cpu *krcp;
+	bool queued;
+	int i, cpu;
+
+	/*
+	 * First, detach objects and queue them over an RCU batch for all
+	 * CPUs. Finally, the queued works are flushed for each CPU.
+	 *
+	 * Please note: if there are outstanding batches for a particular
+	 * CPU, those have to be finished first, followed by queuing a new one.
+	 */
+	for_each_possible_cpu(cpu) {
+		krcp = per_cpu_ptr(&krc, cpu);
+
+		/*
+		 * Check if this CPU has any objects which have been queued for a
+		 * new GP completion. If not (nothing to detach), we are done with
+		 * it. If any batch is pending/running for this "krcp", the per-cpu
+		 * flush_rcu_work() below waits for its completion (see the last step).
+		 */
+		if (!need_offload_krc(krcp))
+			continue;
+
+		while (1) {
+			/*
+			 * If we are not able to queue a new RCU work, it means either:
+			 * - batches for this CPU are still in flight; they should
+			 *   be flushed first, then we repeat;
+			 * - there are no objects to detach, because of concurrency.
+			 */
+			queued = kvfree_rcu_queue_batch(krcp);
+
+			/*
+			 * Bail out, if there is no need to offload this "krcp"
+			 * anymore. As noted earlier it can run concurrently.
+			 */
+			if (queued || !need_offload_krc(krcp))
+				break;
+
+			/* There are ongoing batches. */
+			for (i = 0; i < KFREE_N_BATCHES; i++) {
+				krwp = &(krcp->krw_arr[i]);
+				flush_rcu_work(&krwp->rcu_work);
+			}
+		}
+	}
+
+	/*
+	 * Now we guarantee that all objects are flushed.
+	 */
+	for_each_possible_cpu(cpu) {
+		krcp = per_cpu_ptr(&krc, cpu);
+
+		/*
+		 * A monitor work can drain ready-to-reclaim objects
+		 * directly. Wait for its completion if running or pending.
+		 */
+		cancel_delayed_work_sync(&krcp->monitor_work);
+
+		for (i = 0; i < KFREE_N_BATCHES; i++) {
+			krwp = &(krcp->krw_arr[i]);
+			flush_rcu_work(&krwp->rcu_work);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
+
+#endif /* #if !defined(CONFIG_TINY_RCU) */
+
+static unsigned long
+kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu;
+	unsigned long count = 0;
+
+	/* Snapshot count of all CPUs */
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		count += krc_count(krcp);
+		count += READ_ONCE(krcp->nr_bkv_objs);
+		atomic_set(&krcp->backoff_page_cache_fill, 1);
+	}
+
+	return count == 0 ? SHRINK_EMPTY : count;
+}
+
+static unsigned long
+kfree_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	int cpu, freed = 0;
+
+	for_each_possible_cpu(cpu) {
+		int count;
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		count = krc_count(krcp);
+		count += drain_page_cache(krcp);
+		kfree_rcu_monitor(&krcp->monitor_work.work);
+
+		sc->nr_to_scan -= count;
+		freed += count;
+
+		if (sc->nr_to_scan <= 0)
+			break;
+	}
+
+	return freed == 0 ? SHRINK_STOP : freed;
+}
+
+void __init kvfree_rcu_init(void)
+{
+	int cpu;
+	int i, j;
+	struct shrinker *kfree_rcu_shrinker;
+
+	/* Clamp it to [0:100] seconds interval. */
+	if (rcu_delay_page_cache_fill_msec < 0 ||
+		rcu_delay_page_cache_fill_msec > 100 * MSEC_PER_SEC) {
+
+		rcu_delay_page_cache_fill_msec =
+			clamp(rcu_delay_page_cache_fill_msec, 0,
+				(int) (100 * MSEC_PER_SEC));
+
+		pr_info("Adjusting rcutree.rcu_delay_page_cache_fill_msec to %d ms.\n",
+			rcu_delay_page_cache_fill_msec);
+	}
+
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		for (i = 0; i < KFREE_N_BATCHES; i++) {
+			INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
+			krcp->krw_arr[i].krcp = krcp;
+
+			for (j = 0; j < FREE_N_CHANNELS; j++)
+				INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]);
+		}
+
+		for (i = 0; i < FREE_N_CHANNELS; i++)
+			INIT_LIST_HEAD(&krcp->bulk_head[i]);
+
+		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
+		INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
+		krcp->initialized = true;
+	}
+
+	kfree_rcu_shrinker = shrinker_alloc(0, "slab-kvfree-rcu");
+	if (!kfree_rcu_shrinker) {
+		pr_err("Failed to allocate kfree_rcu() shrinker!\n");
+		return;
+	}
+
+	kfree_rcu_shrinker->count_objects = kfree_rcu_shrink_count;
+	kfree_rcu_shrinker->scan_objects = kfree_rcu_shrink_scan;
+
+	shrinker_register(kfree_rcu_shrinker);
+}
-- 
2.39.5



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
                   ` (4 preceding siblings ...)
  2024-12-12 18:02 ` [PATCH v2 5/5] mm/slab: Move kvfree_rcu() into SLAB Uladzislau Rezki (Sony)
@ 2024-12-12 18:30 ` Christoph Lameter (Ampere)
  2024-12-12 19:08   ` Uladzislau Rezki
  2024-12-12 19:10   ` Paul E. McKenney
  2024-12-15 17:30 ` Vlastimil Babka
  2025-01-06  7:21 ` [External Mail] " Hyeonggon Yoo
  7 siblings, 2 replies; 31+ messages in thread
From: Christoph Lameter (Ampere) @ 2024-12-12 18:30 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka, RCU,
	LKML, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:

> This is v2. It is based on the Linux 6.13-rc2. The first version is
> here:

I do not see any use of internal slab interfaces by this code. It seems to
be using rcu internals though. So it would best be placed with the rcu
code.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-12 18:30 ` [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Christoph Lameter (Ampere)
@ 2024-12-12 19:08   ` Uladzislau Rezki
  2024-12-12 19:10   ` Paul E. McKenney
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2024-12-12 19:08 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka, RCU,
	LKML, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On Thu, Dec 12, 2024 at 10:30:28AM -0800, Christoph Lameter (Ampere) wrote:
> On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:
> 
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> 
> I do not see any use of internal slab interfaces by this code. It seems to
> be using rcu internals though. So it would best be placed with the rcu
> code.
>
I think there will be closer integration later on. This is a step toward
placing it under mm, where it belongs.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-12 18:30 ` [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Christoph Lameter (Ampere)
  2024-12-12 19:08   ` Uladzislau Rezki
@ 2024-12-12 19:10   ` Paul E. McKenney
  2024-12-12 19:13     ` Uladzislau Rezki
  1 sibling, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2024-12-12 19:10 UTC (permalink / raw)
  To: Christoph Lameter (Ampere)
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, Vlastimil Babka, RCU, LKML,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On Thu, Dec 12, 2024 at 10:30:28AM -0800, Christoph Lameter (Ampere) wrote:
> On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:
> 
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> 
> I do not see any use of internal slab interfaces by this code. It seems to
> be using rcu internals though. So it would best be placed with the rcu
> code.

That is indeed the current state.  The point of moving it is to later
take advantage of internal slab state.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-12 19:10   ` Paul E. McKenney
@ 2024-12-12 19:13     ` Uladzislau Rezki
  0 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2024-12-12 19:13 UTC (permalink / raw)
  To: Paul E. McKenney, Christoph Lameter (Ampere)
  Cc: Christoph Lameter (Ampere), Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, Vlastimil Babka, RCU, LKML,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On Thu, Dec 12, 2024 at 11:10:47AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 12, 2024 at 10:30:28AM -0800, Christoph Lameter (Ampere) wrote:
> > On Thu, 12 Dec 2024, Uladzislau Rezki (Sony) wrote:
> > 
> > > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > > here:
> > 
> > I do not see any use of internal slab interfaces by this code. It seems to
> > be using rcu internals though. So it would best be placed with the rcu
> > code.
> 
> That is indeed the current state.  The point of moving it is to later
> take advantage of internal slab state.
> 
And, in fact, we already have some integration. For example, a barrier
has been added for slab caches.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
                   ` (5 preceding siblings ...)
  2024-12-12 18:30 ` [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Christoph Lameter (Ampere)
@ 2024-12-15 17:30 ` Vlastimil Babka
  2024-12-15 18:21   ` Paul E. McKenney
                     ` (2 more replies)
  2025-01-06  7:21 ` [External Mail] " Hyeonggon Yoo
  7 siblings, 3 replies; 31+ messages in thread
From: Vlastimil Babka @ 2024-12-15 17:30 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony), linux-mm, Paul E . McKenney, Andrew Morton
  Cc: RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> Hello!
> 
> This is v2. It is based on the Linux 6.13-rc2. The first version is
> here:
> 
> https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> 
> The difference between v1 and v2 is that, the preparation process is
> done in original place instead and after that there is one final move.

Looks good, will include in slab/for-next

I think patch 5 should add more explanation to the commit message - the
subthread started by Christoph could provide content :) Can you summarize so
I can amend the commit log?

Also how about a followup patch moving the rcu-tiny implementation of
kvfree_call_rcu()?

We might also consider moving the kfree_rcu*() entry points from rcupdate.h
to slab.h, what do you think, is it a more logical place for them? There's
some risk that files that include rcupdate.h and not slab.h would break, so
that will need some build testing...
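
For illustration (a hypothetical example, not something from the series): a
caller like this, which includes only rcupdate.h, would stop building if the
macros moved to slab.h without fixing up its includes:

<snip>
#include <linux/rcupdate.h>	/* no slab.h */

struct foo {
	struct rcu_head rh;
};

static void foo_release(struct foo *p)
{
	kfree_rcu(p, rh);	/* breaks if the macro moves to slab.h */
}
<snip>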

Thanks,
Vlastimil

> Uladzislau Rezki (Sony) (5):
>   rcu/kvfree: Initialize kvfree_rcu() separately
>   rcu/kvfree: Move some functions under CONFIG_TINY_RCU
>   rcu/kvfree: Adjust names passed into trace functions
>   rcu/kvfree: Adjust a shrinker name
>   mm/slab: Move kvfree_rcu() into SLAB
> 
>  include/linux/slab.h |   1 +
>  init/main.c          |   1 +
>  kernel/rcu/tree.c    | 876 ------------------------------------------
>  mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 882 insertions(+), 876 deletions(-)
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-15 17:30 ` Vlastimil Babka
@ 2024-12-15 18:21   ` Paul E. McKenney
  2024-12-16 11:03   ` Uladzislau Rezki
  2024-12-16 13:07   ` Uladzislau Rezki
  2 siblings, 0 replies; 31+ messages in thread
From: Paul E. McKenney @ 2024-12-15 18:21 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Andrew Morton, RCU, LKML, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > Hello!
> > 
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> > 
> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> > 
> > The difference between v1 and v2 is that, the preparation process is
> > done in original place instead and after that there is one final move.
> 
> Looks good, will include in slab/for-next
> 
> I think patch 5 should add more explanation to the commit message - the
> subthread started by Christoph could provide content :) Can you summarize so
> I can amend the commit log?
> 
> Also how about a followup patch moving the rcu-tiny implementation of
> kvfree_call_rcu()?
> 
> We might also consider moving the kfree_rcu*() entry points from rcupdate.h
> to slab.h, what do you think, is it a more logical place for them? There's
> some risk that files that include rcupdate.h and not slab.h would break, so
> that will need some build testing...

Moving the RCU Tiny implemention (or maybe even just retiring it in
favor of the RCU Tree implementation) and moving the entry points make
sense to me!

							Thanx, Paul

> Thanks,
> Vlastimil
> 
> > Uladzislau Rezki (Sony) (5):
> >   rcu/kvfree: Initialize kvfree_rcu() separately
> >   rcu/kvfree: Move some functions under CONFIG_TINY_RCU
> >   rcu/kvfree: Adjust names passed into trace functions
> >   rcu/kvfree: Adjust a shrinker name
> >   mm/slab: Move kvfree_rcu() into SLAB
> > 
> >  include/linux/slab.h |   1 +
> >  init/main.c          |   1 +
> >  kernel/rcu/tree.c    | 876 ------------------------------------------
> >  mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 882 insertions(+), 876 deletions(-)
> > 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-15 17:30 ` Vlastimil Babka
  2024-12-15 18:21   ` Paul E. McKenney
@ 2024-12-16 11:03   ` Uladzislau Rezki
  2024-12-16 14:20     ` Vlastimil Babka
  2024-12-16 13:07   ` Uladzislau Rezki
  2 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2024-12-16 11:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Paul E . McKenney, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > Hello!
> > 
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> > 
> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> > 
> > The difference between v1 and v2 is that, the preparation process is
> > done in original place instead and after that there is one final move.
> 
> Looks good, will include in slab/for-next
> 
> I think patch 5 should add more explanation to the commit message - the
> subthread started by Christoph could provide content :) Can you summarize so
> I can amend the commit log?
> 
I will :)

> Also how about a followup patch moving the rcu-tiny implementation of
> kvfree_call_rcu()?
> 
As Paul already noted, it would make sense. Or we could just remove the
tiny implementation.

>
> We might also consider moving the kfree_rcu*() entry points from rcupdate.h
> to slab.h, what do you think, is it a more logical place for them? There's
> some risk that files that include rcupdate.h and not slab.h would break, so
> that will need some build testing...
> 
I agree. I have not moved them in this series because it requires more
build testing due to the risk of breakage. I can work on this further, so
it is not an issue.

Thank you for taking this!

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-15 17:30 ` Vlastimil Babka
  2024-12-15 18:21   ` Paul E. McKenney
  2024-12-16 11:03   ` Uladzislau Rezki
@ 2024-12-16 13:07   ` Uladzislau Rezki
  2025-01-11 19:40     ` Vlastimil Babka
  2 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2024-12-16 13:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki (Sony),
	linux-mm, Paul E . McKenney, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > Hello!
> > 
> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > here:
> > 
> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> > 
> > The difference between v1 and v2 is that, the preparation process is
> > done in original place instead and after that there is one final move.
> 
> Looks good, will include in slab/for-next
> 
> I think patch 5 should add more explanation to the commit message - the
> subthread started by Christoph could provide content :) Can you summarize so
> I can amend the commit log?
>
<snip>
mm/slab: Move kvfree_rcu() into SLAB

Move kvfree_rcu() functionality to the slab_common.c file.

The reason for making the kvfree_rcu() functionality part of SLAB is
that there is a clear trend toward, and need for, closer integration.
One recent example is the creation of a barrier function for SLAB caches.

Another reason is to avoid having several implementations of RCU
machinery for reclaiming objects after a GP. As a future step, it can
be integrated more easily with SLAB internals.
<snip>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 11:03   ` Uladzislau Rezki
@ 2024-12-16 14:20     ` Vlastimil Babka
  2024-12-16 15:41       ` Uladzislau Rezki
  0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2024-12-16 14:20 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Paul E . McKenney, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On 12/16/24 12:03, Uladzislau Rezki wrote:
> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
>> > Hello!
>> > 
>> > This is v2. It is based on the Linux 6.13-rc2. The first version is
>> > here:
>> > 
>> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> > 
>> > The difference between v1 and v2 is that, the preparation process is
>> > done in original place instead and after that there is one final move.
>> 
>> Looks good, will include in slab/for-next
>> 
>> I think patch 5 should add more explanation to the commit message - the
>> subthread started by Christoph could provide content :) Can you summarize so
>> I can amend the commit log?
>> 
> I will :)
> 
>> Also how about a followup patch moving the rcu-tiny implementation of
>> kvfree_call_rcu()?
>> 
> As, Paul already noted, it would make sense. Or just remove a tiny
> implementation.

AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
implementation with all the batching etc or would that be unnecessary overhead?

>>
>> We might also consider moving the kfree_rcu*() entry points from rcupdate.h
>> to slab.h, what do you think, is it a more logical place for them? There's
>> some risk that files that include rcupdate.h and not slab.h would break, so
>> that will need some build testing...
>> 
> I agree. I have not moved them in this series, because it requires more
> testing due to a build break. I can work on this further, so it is not
> an issue.
> 
> Thank you for taking this!
> 
> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 14:20     ` Vlastimil Babka
@ 2024-12-16 15:41       ` Uladzislau Rezki
  2024-12-16 15:44         ` Vlastimil Babka
  0 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2024-12-16 15:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, linux-mm, Paul E . McKenney, Andrew Morton,
	RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> >> > Hello!
> >> > 
> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> >> > here:
> >> > 
> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> >> > 
> >> > The difference between v1 and v2 is that, the preparation process is
> >> > done in original place instead and after that there is one final move.
> >> 
> >> Looks good, will include in slab/for-next
> >> 
> >> I think patch 5 should add more explanation to the commit message - the
> >> subthread started by Christoph could provide content :) Can you summarize so
> >> I can amend the commit log?
> >> 
> > I will :)
> > 
> >> Also how about a followup patch moving the rcu-tiny implementation of
> >> kvfree_call_rcu()?
> >> 
> > As, Paul already noted, it would make sense. Or just remove a tiny
> > implementation.
> 
> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> implementation with all the batching etc or would that be unnecessary overhead?
> 
Yes, it is for really small systems with a low amount of memory. I see
only one overhead: keeping objects in pages. For a small system that can
be critical, because we allocate.

On the other hand, for a tiny variant we can modify the normal variant
to bypass the batching logic, thus not consuming memory (for the Tiny
case), i.e. merge it into the normal kvfree_rcu() path.

After that we do not depend on the CONFIG_TINY_RCU option. Probably we
also need to perform some adaptation of the regular kvfree_rcu() for a
single-CPU system.
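
For reference, a minimal sketch of such a non-batching path (essentially
what the tiny variant already does today; the function name here is made up):

<snip>
static void kvfree_call_rcu_simple(struct rcu_head *head, void *ptr)
{
	if (head) {
		/* Encode the rcu_head offset as a "fake" callback; the
		 * reclaim side recognizes it via __is_kvfree_rcu_offset(). */
		call_rcu(head, (rcu_callback_t) ((void *) head - ptr));
	} else {
		/* Head-less objects: only allowed from might_sleep() context. */
		synchronize_rcu();
		kvfree(ptr);
	}
}
<snip>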

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 15:41       ` Uladzislau Rezki
@ 2024-12-16 15:44         ` Vlastimil Babka
  2024-12-16 15:55           ` Uladzislau Rezki
  0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2024-12-16 15:44 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Paul E . McKenney, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On 12/16/24 16:41, Uladzislau Rezki wrote:
> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
>> On 12/16/24 12:03, Uladzislau Rezki wrote:
>> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
>> >> > Hello!
>> >> > 
>> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
>> >> > here:
>> >> > 
>> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> >> > 
>> >> > The difference between v1 and v2 is that, the preparation process is
>> >> > done in original place instead and after that there is one final move.
>> >> 
>> >> Looks good, will include in slab/for-next
>> >> 
>> >> I think patch 5 should add more explanation to the commit message - the
>> >> subthread started by Christoph could provide content :) Can you summarize so
>> >> I can amend the commit log?
>> >> 
>> > I will :)
>> > 
>> >> Also how about a followup patch moving the rcu-tiny implementation of
>> >> kvfree_call_rcu()?
>> >> 
>> > As, Paul already noted, it would make sense. Or just remove a tiny
>> > implementation.
>> 
>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
>> implementation with all the batching etc or would that be unnecessary overhead?
>> 
> Yes, it is for a really small systems with low amount of memory. I see
> only one overhead it is about driving objects in pages. For a small
> system it can be critical because we allocate.
> 
> From the other hand, for a tiny variant we can modify the normal variant
> by bypassing batching logic, thus do not consume memory(for Tiny case)
> i.e. merge it to a normal kvfree_rcu() path.

Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
case (less memory usage on low memory system, tradeoff for worse performance).

> After that we do not depend on CONFIG_RCU_TINY option. Probably we need
> also to perform some adaptation of regular kvfree_rcu() for a single CPU
> system.
> 
> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 15:44         ` Vlastimil Babka
@ 2024-12-16 15:55           ` Uladzislau Rezki
  2024-12-16 16:46             ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2024-12-16 15:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Uladzislau Rezki, linux-mm, Paul E . McKenney, Andrew Morton,
	RCU, LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> On 12/16/24 16:41, Uladzislau Rezki wrote:
> > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> >> On 12/16/24 12:03, Uladzislau Rezki wrote:
> >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> >> >> > Hello!
> >> >> > 
> >> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> >> >> > here:
> >> >> > 
> >> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> >> >> > 
> >> >> > The difference between v1 and v2 is that, the preparation process is
> >> >> > done in original place instead and after that there is one final move.
> >> >> 
> >> >> Looks good, will include in slab/for-next
> >> >> 
> >> >> I think patch 5 should add more explanation to the commit message - the
> >> >> subthread started by Christoph could provide content :) Can you summarize so
> >> >> I can amend the commit log?
> >> >> 
> >> > I will :)
> >> > 
> >> >> Also how about a followup patch moving the rcu-tiny implementation of
> >> >> kvfree_call_rcu()?
> >> >> 
> >> > As, Paul already noted, it would make sense. Or just remove a tiny
> >> > implementation.
> >> 
> >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> >> implementation with all the batching etc or would that be unnecessary overhead?
> >> 
> > Yes, it is for a really small systems with low amount of memory. I see
> > only one overhead it is about driving objects in pages. For a small
> > system it can be critical because we allocate.
> > 
> > From the other hand, for a tiny variant we can modify the normal variant
> > by bypassing batching logic, thus do not consume memory(for Tiny case)
> > i.e. merge it to a normal kvfree_rcu() path.
> 
> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> case (less memory usage on low memory system, tradeoff for worse performance).
> 
Yep, I was also thinking about that without saying it :)

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 15:55           ` Uladzislau Rezki
@ 2024-12-16 16:46             ` Paul E. McKenney
  2025-01-20 22:06               ` Vlastimil Babka
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2024-12-16 16:46 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Vlastimil Babka, linux-mm, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> > On 12/16/24 16:41, Uladzislau Rezki wrote:
> > > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> > >> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> > >> >> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
> > >> >> > Hello!
> > >> >> > 
> > >> >> > This is v2. It is based on the Linux 6.13-rc2. The first version is
> > >> >> > here:
> > >> >> > 
> > >> >> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> > >> >> > 
> > >> >> > The difference between v1 and v2 is that, the preparation process is
> > >> >> > done in original place instead and after that there is one final move.
> > >> >> 
> > >> >> Looks good, will include in slab/for-next
> > >> >> 
> > >> >> I think patch 5 should add more explanation to the commit message - the
> > >> >> subthread started by Christoph could provide content :) Can you summarize so
> > >> >> I can amend the commit log?
> > >> >> 
> > >> > I will :)
> > >> > 
> > >> >> Also how about a followup patch moving the rcu-tiny implementation of
> > >> >> kvfree_call_rcu()?
> > >> >> 
> > >> > As, Paul already noted, it would make sense. Or just remove a tiny
> > >> > implementation.
> > >> 
> > >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> > >> implementation with all the batching etc or would that be unnecessary overhead?
> > >> 
> > > Yes, it is for a really small systems with low amount of memory. I see
> > > only one overhead it is about driving objects in pages. For a small
> > > system it can be critical because we allocate.
> > > 
> > > From the other hand, for a tiny variant we can modify the normal variant
> > > by bypassing batching logic, thus do not consume memory(for Tiny case)
> > > i.e. merge it to a normal kvfree_rcu() path.
> > 
> > Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> > case (less memory usage on low memory system, tradeoff for worse performance).
> > 
> Yep, i also was thinking about that without saying it :)

Works for me as well!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External Mail] [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-12 18:02 [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2) Uladzislau Rezki (Sony)
                   ` (6 preceding siblings ...)
  2024-12-15 17:30 ` Vlastimil Babka
@ 2025-01-06  7:21 ` Hyeonggon Yoo
  2025-01-11 19:43   ` Vlastimil Babka
  7 siblings, 1 reply; 31+ messages in thread
From: Hyeonggon Yoo @ 2025-01-06  7:21 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony),
	linux-mm, Paul E . McKenney, Andrew Morton, Vlastimil Babka
  Cc: kernel_team, 42.hyeyoo, RCU, LKML, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Oleksiy Avramchenko



On 2024-12-13 3:02 AM, Uladzislau Rezki (Sony) wrote:
> Hello!
> 
> This is v2. It is based on the Linux 6.13-rc2. The first version is
> here:
> 
> https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
> 
> The difference between v1 and v2 is that, the preparation process is
> done in original place instead and after that there is one final move.
> 
> Uladzislau Rezki (Sony) (5):
>    rcu/kvfree: Initialize kvfree_rcu() separately
>    rcu/kvfree: Move some functions under CONFIG_TINY_RCU
>    rcu/kvfree: Adjust names passed into trace functions
>    rcu/kvfree: Adjust a shrinker name
>    mm/slab: Move kvfree_rcu() into SLAB
> 
>   include/linux/slab.h |   1 +
>   init/main.c          |   1 +
>   kernel/rcu/tree.c    | 876 ------------------------------------------
>   mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
>   4 files changed, 882 insertions(+), 876 deletions(-)

Sorry for the late reply, but better late than never...

FWIW,

Acked-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
Tested-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>

Thanks for all the efforts!

By the way, are there any future plans for how to take advantage of
internal slab state?

--
Hyeonggon


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 13:07   ` Uladzislau Rezki
@ 2025-01-11 19:40     ` Vlastimil Babka
  0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-01-11 19:40 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Paul E . McKenney, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On 12/16/24 14:07, Uladzislau Rezki wrote:
> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> On 12/12/24 19:02, Uladzislau Rezki (Sony) wrote:
>> > Hello!
>> > 
>> > This is v2. It is based on the Linux 6.13-rc2. The first version is
>> > here:
>> > 
>> > https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> > 
>> > The difference between v1 and v2 is that, the preparation process is
>> > done in original place instead and after that there is one final move.
>> 
>> Looks good, will include in slab/for-next
>> 
>> I think patch 5 should add more explanation to the commit message - the
>> subthread started by Christoph could provide content :) Can you summarize so
>> I can amend the commit log?
>>
> <snip>
> mm/slab: Move kvfree_rcu() into SLAB
> 
> Move kvfree_rcu() functionality to the slab_common.c file.
> 
> The reason of being kvfree_rcu() functionality as part of SLAB is
> that, there is a clear trend and need of closer integration. One
> of the recent example is creating a barrier function for SLAB caches.
> 
> Another reason is to prevent of having several implementations of
> RCU machinery for reclaiming objects after a GP. As future steps,
> it can be more integrated(easier) with SLAB internals.
> <snip>

Thanks, amended.

> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [External Mail] [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-06  7:21 ` [External Mail] " Hyeonggon Yoo
@ 2025-01-11 19:43   ` Vlastimil Babka
  0 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-01-11 19:43 UTC (permalink / raw)
  To: Hyeonggon Yoo, Uladzislau Rezki (Sony),
	linux-mm, Paul E . McKenney, Andrew Morton
  Cc: kernel_team, 42.hyeyoo, RCU, LKML, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Oleksiy Avramchenko

On 1/6/25 08:21, Hyeonggon Yoo wrote:
> 
> 
> On 2024-12-13 3:02 AM, Uladzislau Rezki (Sony) wrote:
>> Hello!
>> 
>> This is v2. It is based on the Linux 6.13-rc2. The first version is
>> here:
>> 
>> https://lore.kernel.org/linux-mm/20241210164035.3391747-4-urezki@gmail.com/T/
>> 
>> The difference between v1 and v2 is that, the preparation process is
>> done in original place instead and after that there is one final move.
>> 
>> Uladzislau Rezki (Sony) (5):
>>    rcu/kvfree: Initialize kvfree_rcu() separately
>>    rcu/kvfree: Move some functions under CONFIG_TINY_RCU
>>    rcu/kvfree: Adjust names passed into trace functions
>>    rcu/kvfree: Adjust a shrinker name
>>    mm/slab: Move kvfree_rcu() into SLAB
>> 
>>   include/linux/slab.h |   1 +
>>   init/main.c          |   1 +
>>   kernel/rcu/tree.c    | 876 ------------------------------------------
>>   mm/slab_common.c     | 880 +++++++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 882 insertions(+), 876 deletions(-)
> 
> Sorry for the late reply, but better late than never...
> 
> FWIW,
> 
> Acked-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>
> Tested-by: Hyeonggon Yoo <hyeonggon.yoo@sk.com>

Thanks, applied.

> Thanks for all the efforts!
> 
> By the way, any future plans how to take advantage of internal slab
> state?

One way is the sheaves effort, which in the last RFC also replaced the way
kfree_rcu() is handled for caches with sheaves enabled.

Perhaps even without those we could consider e.g. doing the kfree_rcu()
batching by grouping the pending frees per kmem cache, which could perhaps
lead to more efficient flushing later.
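
Purely as a sketch of that last idea (made-up names, nothing implemented):
pending objects could be grouped like this, and each group later flushed
with kmem_cache_free_bulk():

<snip>
struct kvfree_rcu_cache_group {
	struct kmem_cache *s;			/* NULL for vmalloc / large kmalloc */
	void *ptrs[KVFREE_BULK_MAX_ENTR];	/* reuse the existing bulk size */
	int nr;					/* number of pending pointers */
};
<snip>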

> --
> Hyeonggon



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2024-12-16 16:46             ` Paul E. McKenney
@ 2025-01-20 22:06               ` Vlastimil Babka
  2025-01-21 13:33                 ` Uladzislau Rezki
  0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2025-01-20 22:06 UTC (permalink / raw)
  To: paulmck, Uladzislau Rezki
  Cc: linux-mm, Andrew Morton, RCU, LKML, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On 12/16/24 17:46, Paul E. McKenney wrote:
> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
>> > On 12/16/24 16:41, Uladzislau Rezki wrote:
>> > > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
>> > >> On 12/16/24 12:03, Uladzislau Rezki wrote:
>> > >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>> > >> > 
>> > >> >> Also how about a followup patch moving the rcu-tiny implementation of
>> > >> >> kvfree_call_rcu()?
>> > >> >> 
>> > >> > As, Paul already noted, it would make sense. Or just remove a tiny
>> > >> > implementation.
>> > >> 
>> > >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
>> > >> implementation with all the batching etc or would that be unnecessary overhead?
>> > >> 
>> > > Yes, it is for a really small systems with low amount of memory. I see
>> > > only one overhead it is about driving objects in pages. For a small
>> > > system it can be critical because we allocate.
>> > > 
>> > > From the other hand, for a tiny variant we can modify the normal variant
>> > > by bypassing batching logic, thus do not consume memory(for Tiny case)
>> > > i.e. merge it to a normal kvfree_rcu() path.
>> > 
>> > Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
>> > case (less memory usage on low memory system, tradeoff for worse performance).
>> > 
>> Yep, i also was thinking about that without saying it :)
> 
> Works for me as well!

Hi, so I tried looking at this. First I just moved the code to slab as seen
in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
not a show-stopper here.

Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
to control whether we use the full blown batching implementation or the
simple call_rcu() implmentation, and realized it's not straightforward and
reveals there are still some subtle dependencies of kvfree_rcu() on RCU
internals :)

Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU

AFAICS the batching implementation includes kfree_rcu_scheduler_running()
which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
there are other facilities the batching implementation needs that only
exist in the TREE_RCU implementation.

Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
memory systems are fine with the simple implementation.
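
Roughly, a sketch of what the guard in mm/slab_common.c could look like (an
assumption on my side, not what the branch above currently does):

<snip>
#if !defined(CONFIG_TINY_RCU) && !defined(CONFIG_SLUB_TINY)
/* full batching implementation */
#else
/* simple call_rcu()/synchronize_rcu() based implementation */
#endif
<snip>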

Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY

AFAICS I can't just make the simple implementation do call_rcu() on
CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
but no such equivalent exists in TREE_RCU. Am I right?

Possible solution: teach TREE_RCU callback invocation to handle
__is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
but RCU has to special case it still.

Possible solution 2: instead of the special offset handling, SLUB provides a
callback function, which determines the pointer to the object from a
pointer into the middle of it, without knowing the rcu_head offset.
Downside: this will have some overhead, but SLUB_TINY is not meant to be
performant anyway so we might not care.
Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
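
To make solution 2 concrete, a sketch with made-up names (kvfree_rcu_cb()
and slab_object_start() are hypothetical, not an existing API):

<snip>
static void kvfree_rcu_cb(struct rcu_head *head)
{
	void *obj = (void *) head;

	/*
	 * slab_object_start() would be the SLUB-provided helper that maps
	 * a pointer into the middle of an object (the embedded rcu_head)
	 * back to the object start. vmalloc and large kmalloc objects
	 * would need their own handling here.
	 */
	kvfree(slab_object_start(obj));
}

/* The simple path could then do just: call_rcu(head, kvfree_rcu_cb); */
<snip>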

Thoughts?

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-tiny-kfree_rcu


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-20 22:06               ` Vlastimil Babka
@ 2025-01-21 13:33                 ` Uladzislau Rezki
  2025-01-21 13:49                   ` Vlastimil Babka
  0 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2025-01-21 13:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: paulmck, Uladzislau Rezki, linux-mm, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
> On 12/16/24 17:46, Paul E. McKenney wrote:
> > On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> >> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> >> > On 12/16/24 16:41, Uladzislau Rezki wrote:
> >> > > On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> >> > >> On 12/16/24 12:03, Uladzislau Rezki wrote:
> >> > >> > On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >> > >> > 
> >> > >> >> Also how about a followup patch moving the rcu-tiny implementation of
> >> > >> >> kvfree_call_rcu()?
> >> > >> >> 
> >> > >> > As, Paul already noted, it would make sense. Or just remove a tiny
> >> > >> > implementation.
> >> > >> 
> >> > >> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> >> > >> implementation with all the batching etc or would that be unnecessary overhead?
> >> > >> 
> >> > > Yes, it is for a really small systems with low amount of memory. I see
> >> > > only one overhead it is about driving objects in pages. For a small
> >> > > system it can be critical because we allocate.
> >> > > 
> >> > > From the other hand, for a tiny variant we can modify the normal variant
> >> > > by bypassing batching logic, thus do not consume memory(for Tiny case)
> >> > > i.e. merge it to a normal kvfree_rcu() path.
> >> > 
> >> > Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> >> > case (less memory usage on low memory system, tradeoff for worse performance).
> >> > 
> >> Yep, i also was thinking about that without saying it :)
> > 
> > Works for me as well!
> 
> Hi, so I tried looking at this. First I just moved the code to slab as seen
> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
> not a show-stopper here.
> 
> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
> to control whether we use the full blown batching implementation or the
> simple call_rcu() implmentation, and realized it's not straightforward and
> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
> internals :)
> 
> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
> 
> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
> there are other facilities the batching implementation needs that only
> exists in the TREE_RCU implementation
> 
> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
> memory systems are fine with the simple implementation.
> 
> Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY
> 
> AFAICS I can't just make the simple implementation do call_rcu() on
> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
> but no such equivalent exists in TREE_RCU. Am I right?
> 
> Possible solution: teach TREE_RCU callback invocation to handle
> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
> but RCU has to special case it still.
> 
> Possible solution 2: instead of the special offset handling, SLUB provides a
> callback function, which will determine pointer to the object from the
> pointer to a middle of it without knowing the rcu_head offset.
> Downside: this will have some overhead, but SLUB_TINY is not meant to be
> performant anyway so we might not care.
> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
> 
> Thoughts?
> 
To be able to reclaim over call_rcu(), we need to patch tree.c (please
note TINY already works):

<snip>
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b1f883fcd918..ab24229dfa73 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
                debug_rcu_head_unqueue(rhp);

                rcu_lock_acquire(&rcu_callback_map);
-               trace_rcu_invoke_callback(rcu_state.name, rhp);

                f = rhp->func;
-               debug_rcu_head_callback(rhp);
-               WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
-               f(rhp);

+               if (__is_kvfree_rcu_offset((unsigned long) f)) {
+                       trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
+                       kvfree((void *) rhp - (unsigned long) f);
+               } else {
+                       trace_rcu_invoke_callback(rcu_state.name, rhp);
+                       debug_rcu_head_callback(rhp);
+                       WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
+                       f(rhp);
+               }
                rcu_lock_release(&rcu_callback_map);

                /*
<snip>

Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in slab_common.c should
be avoided, i.e. if we can, we should eliminate any dependency on
TREE_RCU or TINY_RCU in slab, as much as possible.

So it requires a closer look for sure :)

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-21 13:33                 ` Uladzislau Rezki
@ 2025-01-21 13:49                   ` Vlastimil Babka
  2025-01-21 14:14                     ` Uladzislau Rezki
  0 siblings, 1 reply; 31+ messages in thread
From: Vlastimil Babka @ 2025-01-21 13:49 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: paulmck, linux-mm, Andrew Morton, RCU, LKML, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
	Hyeonggon Yoo, Oleksiy Avramchenko

On 1/21/25 2:33 PM, Uladzislau Rezki wrote:
> On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
>> On 12/16/24 17:46, Paul E. McKenney wrote:
>>> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
>>>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
>>>>> On 12/16/24 16:41, Uladzislau Rezki wrote:
>>>>>> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
>>>>>>> On 12/16/24 12:03, Uladzislau Rezki wrote:
>>>>>>>> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
>>>>>>>>
>>>>>>>>> Also how about a followup patch moving the rcu-tiny implementation of
>>>>>>>>> kvfree_call_rcu()?
>>>>>>>>>
>>>>>>>> As, Paul already noted, it would make sense. Or just remove a tiny
>>>>>>>> implementation.
>>>>>>>
>>>>>>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
>>>>>>> implementation with all the batching etc or would that be unnecessary overhead?
>>>>>>>
>>>>>> Yes, it is for a really small systems with low amount of memory. I see
>>>>>> only one overhead it is about driving objects in pages. For a small
>>>>>> system it can be critical because we allocate.
>>>>>>
>>>>>> From the other hand, for a tiny variant we can modify the normal variant
>>>>>> by bypassing batching logic, thus do not consume memory(for Tiny case)
>>>>>> i.e. merge it to a normal kvfree_rcu() path.
>>>>>
>>>>> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
>>>>> case (less memory usage on low memory system, tradeoff for worse performance).
>>>>>
>>>> Yep, i also was thinking about that without saying it :)
>>>
>>> Works for me as well!
>>
>> Hi, so I tried looking at this. First I just moved the code to slab as seen
>> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
>> not a show-stopper here.
>>
>> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
>> to control whether we use the full blown batching implementation or the
>> simple call_rcu() implmentation, and realized it's not straightforward and
>> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
>> internals :)
>>
>> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
>>
>> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
>> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
>> there are other facilities the batching implementation needs that only
>> exists in the TREE_RCU implementation
>>
>> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
>> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
>> memory systems are fine with the simple implementation.
>>
>> Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY
>>
>> AFAICS I can't just make the simple implementation do call_rcu() on
>> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
>> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
>> but no such equivalent exists in TREE_RCU. Am I right?
>>
>> Possible solution: teach TREE_RCU callback invocation to handle
>> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
>> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
>> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
>> but RCU has to special case it still.
>>
>> Possible solution 2: instead of the special offset handling, SLUB provides a
>> callback function, which will determine pointer to the object from the
>> pointer to a middle of it without knowing the rcu_head offset.
>> Downside: this will have some overhead, but SLUB_TINY is not meant to be
>> performant anyway so we might not care.
>> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
>>
>> Thoughts?
>>
> For the call_rcu() and to be able to reclaim over it we need to patch the
> tree.c(please note TINY already works):
> 
> <snip>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b1f883fcd918..ab24229dfa73 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
>                 debug_rcu_head_unqueue(rhp);
> 
>                 rcu_lock_acquire(&rcu_callback_map);
> -               trace_rcu_invoke_callback(rcu_state.name, rhp);
> 
>                 f = rhp->func;
> -               debug_rcu_head_callback(rhp);
> -               WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> -               f(rhp);
> 
> +               if (__is_kvfree_rcu_offset((unsigned long) f)) {
> +                       trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
> +                       kvfree((void *) rhp - (unsigned long) f);
> +               } else {
> +                       trace_rcu_invoke_callback(rcu_state.name, rhp);
> +                       debug_rcu_head_callback(rhp);
> +                       WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> +                       f(rhp);
> +               }
>                 rcu_lock_release(&rcu_callback_map);

Right, so that's the first Possible solution, but without the #ifdef. So
there's an overhead of checking __is_kvfree_rcu_offset() even if the
batching is done in slab and this function is never called with an offset.

After coming up with Possible solution 2, I've started liking that idea
more, as RCU could then forget about the __is_kvfree_rcu_offset()
"callbacks" completely, and the performant case of TREE_RCU + batching
would be unaffected.

I'm speculating that if CONFIG_SLOB had not existed in the past,
__is_kvfree_rcu_offset() would never have existed in the first place? SLAB
and SLUB can both determine the start of an object from a pointer to the
middle of it, while SLOB couldn't.
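
For illustration, the arithmetic is roughly this (a sketch only; the
helper name is made up, and internal helpers like virt_to_slab() and
slab_address() are used loosely):

<snip>
/*
 * Illustrative sketch, not an existing API: given a pointer anywhere
 * inside a slab-allocated object, recover the start of that object
 * from the slab metadata. A real version would also have to handle
 * large kmalloc and vmalloc addresses.
 */
static void *obj_start_from_middle(const void *ptr)
{
	struct slab *slab = virt_to_slab(ptr);		/* which slab */
	struct kmem_cache *s = slab->slab_cache;	/* its cache */
	void *base = slab_address(slab);		/* first object */
	unsigned int idx = ((void *)ptr - base) / s->size;

	return base + idx * s->size;			/* object start */
}
<snip>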

>                 /*
> <snip>
> 
> Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in the slab_common.c
> should be avoided, i.e. if we can, we should eliminate a dependency on
> TREE_RCU or TINY_RCU in a slab. As much as possible.
> 
> So, it requires a more closer look for sure :)

That requires solving Problem 1 above, but the question is whether it's
worth the trouble. Systems running TINY_RCU are unlikely to benefit from
the batching?

But sure, there's also the possibility to hide these dependencies in
Kconfig, so the slab code would only consider a single option (for example
#ifdef CONFIG_KVFREE_RCU_BATCHING) that would be set automatically
depending on TREE_RCU and !SLUB_TINY.
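
Something like this, purely as a sketch of that Kconfig idea (the symbol
is just the example name from above, not an existing option):

<snip>
# Hypothetical sketch only.
config KVFREE_RCU_BATCHING
	def_bool y
	depends on TREE_RCU && !SLUB_TINY
<snip>

The slab code would then test only CONFIG_KVFREE_RCU_BATCHING and never
TREE_RCU or TINY_RCU directly.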

> --
> Uladzislau Rezki



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-21 13:49                   ` Vlastimil Babka
@ 2025-01-21 14:14                     ` Uladzislau Rezki
  2025-01-21 20:32                       ` Paul E. McKenney
  0 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2025-01-21 14:14 UTC (permalink / raw)
  To: Vlastimil Babka, paulmck
  Cc: Uladzislau Rezki, paulmck, linux-mm, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
> On 1/21/25 2:33 PM, Uladzislau Rezki wrote:
> > On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
> >> On 12/16/24 17:46, Paul E. McKenney wrote:
> >>> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> >>>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> >>>>> On 12/16/24 16:41, Uladzislau Rezki wrote:
> >>>>>> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> >>>>>>> On 12/16/24 12:03, Uladzislau Rezki wrote:
> >>>>>>>> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> >>>>>>>>
> >>>>>>>>> Also how about a followup patch moving the rcu-tiny implementation of
> >>>>>>>>> kvfree_call_rcu()?
> >>>>>>>>>
> >>>>>>>> As, Paul already noted, it would make sense. Or just remove a tiny
> >>>>>>>> implementation.
> >>>>>>>
> >>>>>>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> >>>>>>> implementation with all the batching etc or would that be unnecessary overhead?
> >>>>>>>
> >>>>>> Yes, it is for a really small systems with low amount of memory. I see
> >>>>>> only one overhead it is about driving objects in pages. For a small
> >>>>>> system it can be critical because we allocate.
> >>>>>>
> >>>>>> From the other hand, for a tiny variant we can modify the normal variant
> >>>>>> by bypassing batching logic, thus do not consume memory(for Tiny case)
> >>>>>> i.e. merge it to a normal kvfree_rcu() path.
> >>>>>
> >>>>> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> >>>>> case (less memory usage on low memory system, tradeoff for worse performance).
> >>>>>
> >>>> Yep, i also was thinking about that without saying it :)
> >>>
> >>> Works for me as well!
> >>
> >> Hi, so I tried looking at this. First I just moved the code to slab as seen
> >> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
> >> not a show-stopper here.
> >>
> >> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
> >> to control whether we use the full blown batching implementation or the
> >> simple call_rcu() implmentation, and realized it's not straightforward and
> >> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
> >> internals :)
> >>
> >> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
> >>
> >> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
> >> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
> >> there are other facilities the batching implementation needs that only
> >> exists in the TREE_RCU implementation
> >>
> >> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
> >> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
> >> memory systems are fine with the simple implementation.
> >>
> >> Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY
> >>
> >> AFAICS I can't just make the simple implementation do call_rcu() on
> >> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
> >> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
> >> but no such equivalent exists in TREE_RCU. Am I right?
> >>
> >> Possible solution: teach TREE_RCU callback invocation to handle
> >> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
> >> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
> >> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
> >> but RCU has to special case it still.
> >>
> >> Possible solution 2: instead of the special offset handling, SLUB provides a
> >> callback function, which will determine pointer to the object from the
> >> pointer to a middle of it without knowing the rcu_head offset.
> >> Downside: this will have some overhead, but SLUB_TINY is not meant to be
> >> performant anyway so we might not care.
> >> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
> >>
> >> Thoughts?
> >>
> > For the call_rcu() and to be able to reclaim over it we need to patch the
> > tree.c(please note TINY already works):
> > 
> > <snip>
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index b1f883fcd918..ab24229dfa73 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >                 debug_rcu_head_unqueue(rhp);
> > 
> >                 rcu_lock_acquire(&rcu_callback_map);
> > -               trace_rcu_invoke_callback(rcu_state.name, rhp);
> > 
> >                 f = rhp->func;
> > -               debug_rcu_head_callback(rhp);
> > -               WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > -               f(rhp);
> > 
> > +               if (__is_kvfree_rcu_offset((unsigned long) f)) {
> > +                       trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
> > +                       kvfree((void *) rhp - (unsigned long) f);
> > +               } else {
> > +                       trace_rcu_invoke_callback(rcu_state.name, rhp);
> > +                       debug_rcu_head_callback(rhp);
> > +                       WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > +                       f(rhp);
> > +               }
> >                 rcu_lock_release(&rcu_callback_map);
> 
> Right so that's the first Possible solution, but without the #ifdef. So
> there's an overhead of checking __is_kvfree_rcu_offset() even if the
> batching is done in slab and this function is never called with an offset.
>
Or is it fulfilling missing functionality? TREE is broken in that sense,
whereas TINY handles it without any issues.

It can be used for the SLUB_TINY option: just call_rcu() instead of the
batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
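
To make that concrete, a minimal sketch of the non-batching path (close
to what the TINY implementation already does today; debug-objects
handling and the headless-case checks are simplified):

<snip>
/*
 * Sketch only: the simple, non-batching path. The rcu_head offset is
 * still encoded as the "callback" here, which is why the callback
 * invocation side has to recognize it (the tree.c hunk above).
 */
static inline void kvfree_call_rcu(struct rcu_head *head, void *ptr)
{
	if (!head) {
		/* Headless object: wait for a grace period synchronously. */
		synchronize_rcu();
		kvfree(ptr);
		return;
	}
	call_rcu(head, (rcu_callback_t)((void *)head - ptr));
}

static inline void kvfree_rcu_barrier(void)
{
	/* No batching layer to drain, so an RCU barrier is enough. */
	rcu_barrier();
}
<snip>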

>
> After coming up with Possible solution 2, I've started liking the idea
> more as RCU could then forget about the __is_kvfree_rcu_offset()
> "callbacks" completely, and the performant case of TREE_RCU + batching
> would be unaffected.
> 
I doubt it is a performance issue :)

>
> I'm speculating perhaps if there was not CONFIG_SLOB in the past, the
> __is_kvfree_rcu_offset() would never exist in the first place? SLAB and
> SLUB both can determine start of the object from a pointer to the middle
> of it, while SLOB couldn't.
> 
We just needed to reclaim over RCU, so I do not know. Paul probably
knows more than me :)

> >                 /*
> > <snip>
> > 
> > Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in the slab_common.c
> > should be avoided, i.e. if we can, we should eliminate a dependency on
> > TREE_RCU or TINY_RCU in a slab. As much as possible.
> > 
> > So, it requires a more closer look for sure :)
> 
> That requires solving Problem 1 above, but question is if it's worth the
> trouble. Systems running TINY_RCU are unlikely to benefit from the batching?
> 
> But sure there's also possibility to hide these dependencies in KConfig,
> so the slab code would only consider a single (for example) #ifdef
> CONFIG_KVFREE_RCU_BATCHING that would be set automatically depending on
> TREE_RCU and !SLUB_TINY.
> 
It is for small systems. We can use TINY or !SMP. AFAIR we covered this:
a single-CPU system should not go with batching:

#if defined(CONFIG_SLUB_TINY) || !defined(CONFIG_SMP) || defined(CONFIG_TINY_RCU)

or:

config TINY_RCU
	bool
	default y if !PREEMPT_RCU && !SMP
+	select SLUB_TINY


Paul, more input?

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-21 14:14                     ` Uladzislau Rezki
@ 2025-01-21 20:32                       ` Paul E. McKenney
  2025-01-22 15:04                         ` Joel Fernandes
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2025-01-21 20:32 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Vlastimil Babka, linux-mm, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Tue, Jan 21, 2025 at 03:14:16PM +0100, Uladzislau Rezki wrote:
> On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
> > On 1/21/25 2:33 PM, Uladzislau Rezki wrote:
> > > On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
> > >> On 12/16/24 17:46, Paul E. McKenney wrote:
> > >>> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> > >>>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> > >>>>> On 12/16/24 16:41, Uladzislau Rezki wrote:
> > >>>>>> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> > >>>>>>> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > >>>>>>>> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> > >>>>>>>>
> > >>>>>>>>> Also how about a followup patch moving the rcu-tiny implementation of
> > >>>>>>>>> kvfree_call_rcu()?
> > >>>>>>>>>
> > >>>>>>>> As, Paul already noted, it would make sense. Or just remove a tiny
> > >>>>>>>> implementation.
> > >>>>>>>
> > >>>>>>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> > >>>>>>> implementation with all the batching etc or would that be unnecessary overhead?
> > >>>>>>>
> > >>>>>> Yes, it is for a really small systems with low amount of memory. I see
> > >>>>>> only one overhead it is about driving objects in pages. For a small
> > >>>>>> system it can be critical because we allocate.
> > >>>>>>
> > >>>>>> From the other hand, for a tiny variant we can modify the normal variant
> > >>>>>> by bypassing batching logic, thus do not consume memory(for Tiny case)
> > >>>>>> i.e. merge it to a normal kvfree_rcu() path.
> > >>>>>
> > >>>>> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> > >>>>> case (less memory usage on low memory system, tradeoff for worse performance).
> > >>>>>
> > >>>> Yep, i also was thinking about that without saying it :)
> > >>>
> > >>> Works for me as well!
> > >>
> > >> Hi, so I tried looking at this. First I just moved the code to slab as seen
> > >> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
> > >> not a show-stopper here.
> > >>
> > >> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
> > >> to control whether we use the full blown batching implementation or the
> > >> simple call_rcu() implmentation, and realized it's not straightforward and
> > >> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
> > >> internals :)
> > >>
> > >> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
> > >>
> > >> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
> > >> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
> > >> there are other facilities the batching implementation needs that only
> > >> exists in the TREE_RCU implementation
> > >>
> > >> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
> > >> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
> > >> memory systems are fine with the simple implementation.
> > >>
> > >> Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY
> > >>
> > >> AFAICS I can't just make the simple implementation do call_rcu() on
> > >> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
> > >> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
> > >> but no such equivalent exists in TREE_RCU. Am I right?
> > >>
> > >> Possible solution: teach TREE_RCU callback invocation to handle
> > >> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
> > >> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
> > >> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
> > >> but RCU has to special case it still.
> > >>
> > >> Possible solution 2: instead of the special offset handling, SLUB provides a
> > >> callback function, which will determine pointer to the object from the
> > >> pointer to a middle of it without knowing the rcu_head offset.
> > >> Downside: this will have some overhead, but SLUB_TINY is not meant to be
> > >> performant anyway so we might not care.
> > >> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
> > >>
> > >> Thoughts?
> > >>
> > > For the call_rcu() and to be able to reclaim over it we need to patch the
> > > tree.c(please note TINY already works):
> > > 
> > > <snip>
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b1f883fcd918..ab24229dfa73 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
> > >                 debug_rcu_head_unqueue(rhp);
> > > 
> > >                 rcu_lock_acquire(&rcu_callback_map);
> > > -               trace_rcu_invoke_callback(rcu_state.name, rhp);
> > > 
> > >                 f = rhp->func;
> > > -               debug_rcu_head_callback(rhp);
> > > -               WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > > -               f(rhp);
> > > 
> > > +               if (__is_kvfree_rcu_offset((unsigned long) f)) {
> > > +                       trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
> > > +                       kvfree((void *) rhp - (unsigned long) f);
> > > +               } else {
> > > +                       trace_rcu_invoke_callback(rcu_state.name, rhp);
> > > +                       debug_rcu_head_callback(rhp);
> > > +                       WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > > +                       f(rhp);
> > > +               }
> > >                 rcu_lock_release(&rcu_callback_map);
> > 
> > Right so that's the first Possible solution, but without the #ifdef. So
> > there's an overhead of checking __is_kvfree_rcu_offset() even if the
> > batching is done in slab and this function is never called with an offset.
> >
> Or fulfilling a missing functionality? TREE is broken in that sense
> whereas a TINY handles it without any issues. 
> 
> It can be called for SLUB_TINY option, just call_rcu() instead of
> batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().

Would this make sense?

		if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {

Just to be repetitive, other alternatives include:

1.	Take advantage of SLOB being no longer with us.

2.	Get rid of Tiny RCU's special casing of kfree_rcu(), and then
	eliminate the above "if" statement in favor of its "else" clause.

3.	Make Tiny RCU implement a trivial version of kfree_rcu() that
	passes a list through RCU.

I don't have strong feelings, and am happy to defer to your guys'
decision.

> > After coming up with Possible solution 2, I've started liking the idea
> > more as RCU could then forget about the __is_kvfree_rcu_offset()
> > "callbacks" completely, and the performant case of TREE_RCU + batching
> > would be unaffected.
> > 
> I doubt it is a performance issue :)

Me neither, especially with IS_ENABLED().

> > I'm speculating perhaps if there was not CONFIG_SLOB in the past, the
> > __is_kvfree_rcu_offset() would never exist in the first place? SLAB and
> > SLUB both can determine start of the object from a pointer to the middle
> > of it, while SLOB couldn't.
> > 
> We needed just to reclaim over RCU. So, i do not know. Paul probably
> knows more then me :)

In the absence of SLOB, yes, I would hope that I would have thought of
determining the start of the object from a pointer to the middle of it.
Or that someone would have pointed that out during review.  But I honestly
do not remember.  ;-)

> > >                 /*
> > > <snip>
> > > 
> > > Mixing up CONFIG_SLUB_TINY with CONFIG_TINY_RCU in the slab_common.c
> > > should be avoided, i.e. if we can, we should eliminate a dependency on
> > > TREE_RCU or TINY_RCU in a slab. As much as possible.
> > > 
> > > So, it requires a more closer look for sure :)
> > 
> > That requires solving Problem 1 above, but question is if it's worth the
> > trouble. Systems running TINY_RCU are unlikely to benefit from the batching?
> > 
> > But sure there's also possibility to hide these dependencies in KConfig,
> > so the slab code would only consider a single (for example) #ifdef
> > CONFIG_KVFREE_RCU_BATCHING that would be set automatically depending on
> > TREE_RCU and !SLUB_TINY.
> > 
> It is for small systems. We can use TINY or !SMP. We covered this AFAIR
> that a single CPU system should not go with batching:
> 
> #ifdef SLUB_TINY || !SMP || TINY_RCU
> 
> or:
> 
> config TINY_RCU
> 	bool
> 	default y if !PREEMPT_RCU && !SMP
> +	select SLUB_TINY
> 
> 
> Paul, more input?

I will say that Tiny RCU used to get much more focus from its users
10-15 years ago than it does now.  So one approach is to implement
the simplest option, and add any needed complexity back in when and if
people complain.  ;-)

							Thanx, Paul


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-21 20:32                       ` Paul E. McKenney
@ 2025-01-22 15:04                         ` Joel Fernandes
  2025-01-22 16:43                           ` Vlastimil Babka
  0 siblings, 1 reply; 31+ messages in thread
From: Joel Fernandes @ 2025-01-22 15:04 UTC (permalink / raw)
  To: paulmck
  Cc: Uladzislau Rezki, Vlastimil Babka, linux-mm, Andrew Morton, RCU,
	LKML, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Tue, Jan 21, 2025 at 3:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Jan 21, 2025 at 03:14:16PM +0100, Uladzislau Rezki wrote:
> > On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
> > > On 1/21/25 2:33 PM, Uladzislau Rezki wrote:
> > > > On Mon, Jan 20, 2025 at 11:06:13PM +0100, Vlastimil Babka wrote:
> > > >> On 12/16/24 17:46, Paul E. McKenney wrote:
> > > >>> On Mon, Dec 16, 2024 at 04:55:06PM +0100, Uladzislau Rezki wrote:
> > > >>>> On Mon, Dec 16, 2024 at 04:44:41PM +0100, Vlastimil Babka wrote:
> > > >>>>> On 12/16/24 16:41, Uladzislau Rezki wrote:
> > > >>>>>> On Mon, Dec 16, 2024 at 03:20:44PM +0100, Vlastimil Babka wrote:
> > > >>>>>>> On 12/16/24 12:03, Uladzislau Rezki wrote:
> > > >>>>>>>> On Sun, Dec 15, 2024 at 06:30:02PM +0100, Vlastimil Babka wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Also how about a followup patch moving the rcu-tiny implementation of
> > > >>>>>>>>> kvfree_call_rcu()?
> > > >>>>>>>>>
> > > >>>>>>>> As, Paul already noted, it would make sense. Or just remove a tiny
> > > >>>>>>>> implementation.
> > > >>>>>>>
> > > >>>>>>> AFAICS tiny rcu is for !SMP systems. Do they benefit from the "full"
> > > >>>>>>> implementation with all the batching etc or would that be unnecessary overhead?
> > > >>>>>>>
> > > >>>>>> Yes, it is for a really small systems with low amount of memory. I see
> > > >>>>>> only one overhead it is about driving objects in pages. For a small
> > > >>>>>> system it can be critical because we allocate.
> > > >>>>>>
> > > >>>>>> From the other hand, for a tiny variant we can modify the normal variant
> > > >>>>>> by bypassing batching logic, thus do not consume memory(for Tiny case)
> > > >>>>>> i.e. merge it to a normal kvfree_rcu() path.
> > > >>>>>
> > > >>>>> Maybe we could change it to use CONFIG_SLUB_TINY as that has similar use
> > > >>>>> case (less memory usage on low memory system, tradeoff for worse performance).
> > > >>>>>
> > > >>>> Yep, i also was thinking about that without saying it :)
> > > >>>
> > > >>> Works for me as well!
> > > >>
> > > >> Hi, so I tried looking at this. First I just moved the code to slab as seen
> > > >> in the top-most commit here [1]. Hope the non-inlined __kvfree_call_rcu() is
> > > >> not a show-stopper here.
> > > >>
> > > >> Then I wanted to switch the #ifdefs from CONFIG_TINY_RCU to CONFIG_SLUB_TINY
> > > >> to control whether we use the full blown batching implementation or the
> > > >> simple call_rcu() implmentation, and realized it's not straightforward and
> > > >> reveals there are still some subtle dependencies of kvfree_rcu() on RCU
> > > >> internals :)
> > > >>
> > > >> Problem 1: !CONFIG_SLUB_TINY with CONFIG_TINY_RCU
> > > >>
> > > >> AFAICS the batching implementation includes kfree_rcu_scheduler_running()
> > > >> which is called from rcu_set_runtime_mode() but only on TREE_RCU. Perhaps
> > > >> there are other facilities the batching implementation needs that only
> > > >> exists in the TREE_RCU implementation
> > > >>
> > > >> Possible solution: batching implementation depends on both !CONFIG_SLUB_TINY
> > > >> and !CONFIG_TINY_RCU. I think it makes sense as both !SMP systems and small
> > > >> memory systems are fine with the simple implementation.
> > > >>
> > > >> Problem 2: CONFIG_TREE_RCU with !CONFIG_SLUB_TINY
> > > >>
> > > >> AFAICS I can't just make the simple implementation do call_rcu() on
> > > >> CONFIG_TREE_RCU, because call_rcu() no longer knows how to handle the fake
> > > >> callback (__is_kvfree_rcu_offset()) - I see how rcu_reclaim_tiny() does that
> > > >> but no such equivalent exists in TREE_RCU. Am I right?
> > > >>
> > > >> Possible solution: teach TREE_RCU callback invocation to handle
> > > >> __is_kvfree_rcu_offset() again, perhaps hide that branch behind #ifndef
> > > >> CONFIG_SLUB_TINY to avoid overhead if the batching implementation is used.
> > > >> Downside: we visibly demonstrate how kvfree_rcu() is not purely a slab thing
> > > >> but RCU has to special case it still.
> > > >>
> > > >> Possible solution 2: instead of the special offset handling, SLUB provides a
> > > >> callback function, which will determine pointer to the object from the
> > > >> pointer to a middle of it without knowing the rcu_head offset.
> > > >> Downside: this will have some overhead, but SLUB_TINY is not meant to be
> > > >> performant anyway so we might not care.
> > > >> Upside: we can remove __is_kvfree_rcu_offset() from TINY_RCU as well
> > > >>
> > > >> Thoughts?
> > > >>
> > > > For the call_rcu() and to be able to reclaim over it we need to patch the
> > > > tree.c(please note TINY already works):
> > > >
> > > > <snip>
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b1f883fcd918..ab24229dfa73 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2559,13 +2559,19 @@ static void rcu_do_batch(struct rcu_data *rdp)
> > > >                 debug_rcu_head_unqueue(rhp);
> > > >
> > > >                 rcu_lock_acquire(&rcu_callback_map);
> > > > -               trace_rcu_invoke_callback(rcu_state.name, rhp);
> > > >
> > > >                 f = rhp->func;
> > > > -               debug_rcu_head_callback(rhp);
> > > > -               WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > > > -               f(rhp);
> > > >
> > > > +               if (__is_kvfree_rcu_offset((unsigned long) f)) {
> > > > +                       trace_rcu_invoke_kvfree_callback("", rhp, (unsigned long) f);
> > > > +                       kvfree((void *) rhp - (unsigned long) f);
> > > > +               } else {
> > > > +                       trace_rcu_invoke_callback(rcu_state.name, rhp);
> > > > +                       debug_rcu_head_callback(rhp);
> > > > +                       WRITE_ONCE(rhp->func, (rcu_callback_t)0L);
> > > > +                       f(rhp);
> > > > +               }
> > > >                 rcu_lock_release(&rcu_callback_map);
> > >
> > > Right so that's the first Possible solution, but without the #ifdef. So
> > > there's an overhead of checking __is_kvfree_rcu_offset() even if the
> > > batching is done in slab and this function is never called with an offset.
> > >
> > Or fulfilling a missing functionality? TREE is broken in that sense
> > whereas a TINY handles it without any issues.
> >
> > It can be called for SLUB_TINY option, just call_rcu() instead of
> > batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
>
> Would this make sense?
>
>                 if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {
>
> Just to be repetitive, other alternatives include:
>
> 1.      Take advantage of SLOB being no longer with us.
>
> 2.      Get rid of Tiny RCU's special casing of kfree_rcu(), and then
>         eliminate the above "if" statement in favor of its "else" clause.
>
> 3.      Make Tiny RCU implement a trivial version of kfree_rcu() that
>         passes a list through RCU.
>
> I don't have strong feelings, and am happy to defer to your guys'
> decision.

If I may chime in with an opinion, I think the cleanest approach would
be to not special-case the func pointer and instead provide a callback
from the SLAB layer which does the kfree, then get rid of
__is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
the overhead of a function call, but I highly doubt that it is going
to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
is a kfree slow path. I feel that, in the long run, this will also be
more maintainable.

Or is there a reason other than the theoretical function call overhead
why this may not work?
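
For concreteness, such a callback could look roughly like this (all
names are illustrative and the vmalloc/large-kmalloc handling is
simplified, so this is a sketch rather than a proposed interface):

<snip>
/*
 * Illustrative sketch of a slab-provided RCU callback: recover the
 * object from the address of its embedded rcu_head, then kvfree() it.
 */
static void slab_kvfree_rcu_cb(struct rcu_head *head)
{
	void *obj = (void *)head;

	if (is_vmalloc_addr(obj)) {
		struct vm_struct *vm = find_vm_area(obj);

		obj = vm->addr;			/* start of the vmalloc area */
	} else {
		struct slab *slab = virt_to_slab(obj);

		/* Object start from an interior pointer, as slab can do. */
		obj = nearest_obj(slab->slab_cache, slab, obj);
	}
	kvfree(obj);
}

/* The simple kvfree_call_rcu() would then just do:
 *	call_rcu(head, slab_kvfree_rcu_cb);
 */
<snip>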

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-22 15:04                         ` Joel Fernandes
@ 2025-01-22 16:43                           ` Vlastimil Babka
  2025-01-22 16:47                             ` Joel Fernandes
  2025-01-22 17:42                             ` Uladzislau Rezki
  0 siblings, 2 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-01-22 16:43 UTC (permalink / raw)
  To: Joel Fernandes, paulmck
  Cc: Uladzislau Rezki, linux-mm, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On 1/22/25 16:04, Joel Fernandes wrote:
> On Tue, Jan 21, 2025 at 3:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>>
>> On Tue, Jan 21, 2025 at 03:14:16PM +0100, Uladzislau Rezki wrote:
>> > On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
>> > > Right so that's the first Possible solution, but without the #ifdef. So
>> > > there's an overhead of checking __is_kvfree_rcu_offset() even if the
>> > > batching is done in slab and this function is never called with an offset.
>> > >
>> > Or fulfilling a missing functionality? TREE is broken in that sense
>> > whereas a TINY handles it without any issues.
>> >
>> > It can be called for SLUB_TINY option, just call_rcu() instead of
>> > batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
>>
>> Would this make sense?
>>
>>                 if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {
>>
>> Just to be repetitive, other alternatives include:
>>
>> 1.      Take advantage of SLOB being no longer with us.
>>
>> 2.      Get rid of Tiny RCU's special casing of kfree_rcu(), and then
>>         eliminate the above "if" statement in favor of its "else" clause.
>>
>> 3.      Make Tiny RCU implement a trivial version of kfree_rcu() that
>>         passes a list through RCU.
>>
>> I don't have strong feelings, and am happy to defer to your guys'
>> decision.
> 
> If I may chime in with an opinion, I think the cleanest approach would
> be to not special-case the func pointer and instead provide a callback
> from the SLAB layer which does the kfree. Then get rid of

Right.

> __is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
> the overhead of function calling, but I highly doubt that it is going
> to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
> is a kfree slow-path.  I feel in the long run, this will also be more
> maintainable.
> 
> Or is there a reason other than the theoretical function call overhead
> why this may not work?

My concern was about the overhead of calculating the pointer to the
object's starting address, but it's just some arithmetic, so it should be
negligible. So I'm prototyping this approach now. Thanks all.

> thanks,
> 
>  - Joel



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-22 16:43                           ` Vlastimil Babka
@ 2025-01-22 16:47                             ` Joel Fernandes
  2025-01-22 17:42                             ` Uladzislau Rezki
  1 sibling, 0 replies; 31+ messages in thread
From: Joel Fernandes @ 2025-01-22 16:47 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: paulmck, Uladzislau Rezki, linux-mm, Andrew Morton, RCU, LKML,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Roman Gushchin, Hyeonggon Yoo, Oleksiy Avramchenko

On Wed, Jan 22, 2025 at 11:43 AM Vlastimil Babka <vbabka@suse.cz> wrote:
[...]
> > __is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
> > the overhead of function calling, but I highly doubt that it is going
> > to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
> > is a kfree slow-path.  I feel in the long run, this will also be more
> > maintainable.
> >
> > Or is there a reason other than the theoretical function call overhead
> > why this may not work?
>
> My concern was about the overhead of calculating the pointer to the object
> starting address, but it's just some arithmetics, so it should be
> negligible. So I'm prototyping this approach now. Thanks all.

Ah, that's a valid point. Looking forward to reviewing the patch and
hope it works out!

thanks,

 - Joel


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 0/5] Move kvfree_rcu() into SLAB (v2)
  2025-01-22 16:43                           ` Vlastimil Babka
  2025-01-22 16:47                             ` Joel Fernandes
@ 2025-01-22 17:42                             ` Uladzislau Rezki
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2025-01-22 17:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Joel Fernandes, paulmck, Uladzislau Rezki, linux-mm,
	Andrew Morton, RCU, LKML, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	Oleksiy Avramchenko

On Wed, Jan 22, 2025 at 05:43:06PM +0100, Vlastimil Babka wrote:
> On 1/22/25 16:04, Joel Fernandes wrote:
> > On Tue, Jan 21, 2025 at 3:32 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>
> >> On Tue, Jan 21, 2025 at 03:14:16PM +0100, Uladzislau Rezki wrote:
> >> > On Tue, Jan 21, 2025 at 02:49:13PM +0100, Vlastimil Babka wrote:
> >> > > Right so that's the first Possible solution, but without the #ifdef. So
> >> > > there's an overhead of checking __is_kvfree_rcu_offset() even if the
> >> > > batching is done in slab and this function is never called with an offset.
> >> > >
> >> > Or fulfilling a missing functionality? TREE is broken in that sense
> >> > whereas a TINY handles it without any issues.
> >> >
> >> > It can be called for SLUB_TINY option, just call_rcu() instead of
> >> > batching layer. And yes, kvfree_rcu_barrier() switches to rcu_barrier().
> >>
> >> Would this make sense?
> >>
> >>                 if (IS_ENABLED(CONFIG_TINY_RCU) && __is_kvfree_rcu_offset((unsigned long) f)) {
> >>
> >> Just to be repetitive, other alternatives include:
> >>
> >> 1.      Take advantage of SLOB being no longer with us.
> >>
> >> 2.      Get rid of Tiny RCU's special casing of kfree_rcu(), and then
> >>         eliminate the above "if" statement in favor of its "else" clause.
> >>
> >> 3.      Make Tiny RCU implement a trivial version of kfree_rcu() that
> >>         passes a list through RCU.
> >>
> >> I don't have strong feelings, and am happy to defer to your guys'
> >> decision.
> > 
> > If I may chime in with an opinion, I think the cleanest approach would
> > be to not special-case the func pointer and instead provide a callback
> > from the SLAB layer which does the kfree. Then get rid of
> 
> Right.
> 
> > __is_kvfree_rcu_offset() and its usage from Tiny. Granted, there is
> > the overhead of function calling, but I highly doubt that it is going
> > to be a bottleneck, considering that the __is_kvfree_rcu_offset() path
> > is a kfree slow-path.  I feel in the long run, this will also be more
> > maintainable.
> > 
> > Or is there a reason other than the theoretical function call overhead
> > why this may not work?
> 
> My concern was about the overhead of calculating the pointer to the object
> starting address, but it's just some arithmetics, so it should be
> negligible. So I'm prototyping this approach now. Thanks all.
> 
You are welcome :)

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread
