* [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr
@ 2023-10-07 13:51 Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu() Hou Tao
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

Hi,

The patchset aims to fix problems found during the review of the per-cpu
kptr patch set [0]. Patch #1 introduces alloc_size_percpu() for the
dynamic per-cpu area. Patches #2 and #3 use alloc_size_percpu() to check
whether unit_size matches the size of the underlying per-cpu area and to
select a matching bpf_mem_cache. Patch #4 is a preparatory change, and
patch #5 fixes the freeing of per-cpu kptrs when they are released
through map destruction. The last patch adds test cases for these
problems.

Please see the individual patches for details. Comments are always
welcome.

[0]: https://lore.kernel.org/bpf/20230827152729.1995219-1-yonghong.song@linux.dev

Hou Tao (6):
  mm/percpu.c: introduce alloc_size_percpu()
  bpf: Re-enable unit_size checking for global per-cpu allocator
  bpf: Use alloc_size_percpu() in bpf_mem_free{_rcu}()
  bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h
  bpf: Use bpf_global_percpu_ma for per-cpu kptr in
    __bpf_obj_drop_impl()
  selftests/bpf: Add more test cases for bpf memory allocator

 include/linux/bpf_mem_alloc.h                 |   1 +
 include/linux/percpu.h                        |   1 +
 kernel/bpf/helpers.c                          |  25 ++-
 kernel/bpf/internal.h                         |  11 ++
 kernel/bpf/memalloc.c                         |  41 ++--
 kernel/bpf/syscall.c                          |   8 +-
 mm/percpu.c                                   |  29 +++
 .../selftests/bpf/prog_tests/test_bpf_ma.c    |  20 +-
 .../testing/selftests/bpf/progs/test_bpf_ma.c | 180 +++++++++++++++++-
 9 files changed, 282 insertions(+), 34 deletions(-)
 create mode 100644 kernel/bpf/internal.h

-- 
2.29.2




* [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu()
  2023-10-07 13:51 [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr Hou Tao
@ 2023-10-07 13:51 ` Hou Tao
  2023-10-07 14:04   ` Andrew Morton
  2023-10-08 22:32   ` Dennis Zhou
  2023-10-07 13:51 ` [PATCH bpf-next 2/6] bpf: Re-enable unit_size checking for global per-cpu allocator Hou Tao
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

Introduce alloc_size_percpu() to get the size of a dynamic per-cpu
area. It will be used by the bpf memory allocator in the following
patches. The bpf memory allocator maintains multiple caches for
multiple area sizes, and it needs the size of the dynamic per-cpu area
to select the corresponding cache when a bpf program frees the area.
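
A minimal usage sketch (illustrative only, not part of the patch; the
returned size is a multiple of PCPU_MIN_ALLOC_SIZE, so it may be larger
than the size that was originally requested):

	void __percpu *ptr = __alloc_percpu(24, 8);
	size_t size;

	if (ptr) {
		/* size of the dynamic area backing @ptr, >= 24 here */
		size = alloc_size_percpu(ptr);
		free_percpu(ptr);
	}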

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/percpu.h |  1 +
 mm/percpu.c            | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 68fac2e7cbe6..d140d9d79567 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -132,6 +132,7 @@ extern void __init setup_per_cpu_areas(void);
 extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
 extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
 extern void free_percpu(void __percpu *__pdata);
+extern size_t alloc_size_percpu(void __percpu *__pdata);
 
 DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
 
diff --git a/mm/percpu.c b/mm/percpu.c
index 7b40b3963f10..f541cfc3cb2d 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -2244,6 +2244,35 @@ static void pcpu_balance_workfn(struct work_struct *work)
 	mutex_unlock(&pcpu_alloc_mutex);
 }
 
+/**
+ * alloc_size_percpu - the size of the dynamic percpu area
+ * @ptr: pointer to the dynamic percpu area
+ *
+ * Return the size of the dynamic percpu area @ptr.
+ *
+ * RETURNS:
+ * The size of the dynamic percpu area.
+ *
+ * CONTEXT:
+ * Can be called from atomic context.
+ */
+size_t alloc_size_percpu(void __percpu *ptr)
+{
+	struct pcpu_chunk *chunk;
+	int bit_off, end;
+	void *addr;
+
+	if (!ptr)
+		return 0;
+
+	addr = __pcpu_ptr_to_addr(ptr);
+	/* No pcpu_lock here: ptr has not been freed, so chunk is still alive */
+	chunk = pcpu_chunk_addr_search(addr);
+	bit_off = (addr - chunk->base_addr) / PCPU_MIN_ALLOC_SIZE;
+	end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk), bit_off + 1);
+	return (end - bit_off) * PCPU_MIN_ALLOC_SIZE;
+}
+
 /**
  * free_percpu - free percpu area
  * @ptr: pointer to area to free
-- 
2.29.2




* [PATCH bpf-next 2/6] bpf: Re-enable unit_size checking for global per-cpu allocator
  2023-10-07 13:51 [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu() Hou Tao
@ 2023-10-07 13:51 ` Hou Tao
  2023-10-09 16:51   ` Alexei Starovoitov
  2023-10-07 13:51 ` [PATCH bpf-next 3/6] bpf: Use alloc_size_percpu() in bpf_mem_free{_rcu}() Hou Tao
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

With alloc_size_percpu() in place, check whether or not the size of the
dynamic per-cpu area matches unit_size, re-enabling the check for the
global per-cpu allocator.
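
For reference, the check relies on the layout of a free object in a
per-cpu bpf_mem_cache (a sketch; memalloc.c works on raw pointers and
does not define such a struct, but percpu_size is set to LLIST_NODE_SZ
plus room for the per-cpu pointer):

	struct percpu_free_obj_layout {		/* illustrative only */
		struct llist_node node;		/* LLIST_NODE_SZ bytes */
		void __percpu *pptr;		/* saved per-cpu pointer,
						 * hence ((void **)first)[1]
						 * in the diff below
						 */
	};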

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/memalloc.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index 6cf61ea55c27..af9ff0755959 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -497,21 +497,17 @@ static int check_obj_size(struct bpf_mem_cache *c, unsigned int idx)
 	struct llist_node *first;
 	unsigned int obj_size;
 
-	/* For per-cpu allocator, the size of free objects in free list doesn't
-	 * match with unit_size and now there is no way to get the size of
-	 * per-cpu pointer saved in free object, so just skip the checking.
-	 */
-	if (c->percpu_size)
-		return 0;
-
 	first = c->free_llist.first;
 	if (!first)
 		return 0;
 
-	obj_size = ksize(first);
+	if (c->percpu_size)
+		obj_size = alloc_size_percpu(((void **)first)[1]);
+	else
+		obj_size = ksize(first);
 	if (obj_size != c->unit_size) {
-		WARN_ONCE(1, "bpf_mem_cache[%u]: unexpected object size %u, expect %u\n",
-			  idx, obj_size, c->unit_size);
+		WARN_ONCE(1, "bpf_mem_cache[%u]: percpu %d, unexpected object size %u, expect %u\n",
+			  idx, c->percpu_size, obj_size, c->unit_size);
 		return -EINVAL;
 	}
 	return 0;
@@ -979,7 +975,14 @@ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
 	return !ret ? NULL : ret + LLIST_NODE_SZ;
 }
 
-/* Most of the logic is taken from setup_kmalloc_cache_index_table() */
+/* The alignment of a dynamic per-cpu area is 8, so c->unit_size and the
+ * actual size of the dynamic per-cpu area will always match and there
+ * is no need to adjust size_index for per-cpu allocation. However, for
+ * simplicity of the implementation, use a unified size_index for both
+ * kmalloc and per-cpu allocation.
+ *
+ * Most of the logic is taken from setup_kmalloc_cache_index_table().
+ */
 static __init int bpf_mem_cache_adjust_size(void)
 {
 	unsigned int size, index;
-- 
2.29.2




* [PATCH bpf-next 3/6] bpf: Use alloc_size_percpu() in bpf_mem_free{_rcu}()
  2023-10-07 13:51 [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu() Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 2/6] bpf: Re-enable unit_size checking for global per-cpu allocator Hou Tao
@ 2023-10-07 13:51 ` Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h Hou Tao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

For bpf_global_percpu_ma, the pointer passed to bpf_mem_free_rcu() is
allocated by kmalloc() and its size is fixed (16 bytes on x86-64). So no
matter which cache allocated the dynamic per-cpu area, cache[2] will
always be used to free the per-cpu area.
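
Before this patch, the free path boiled down to the following (a sketch,
assuming a 64-bit kernel where the kmalloc()-ed stub holding the
llist_node and the per-cpu pointer is 16 bytes):

	idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
	/* ksize() returns 16 for every per-cpu object, so idx always
	 * picks cache[2], whatever the size of the per-cpu area is
	 */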

Fix the imbalance by checking whether or not the bpf memory allocator
is per-cpu, and by using alloc_size_percpu() instead of ksize() to find
the correct cache for the per-cpu free.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 include/linux/bpf_mem_alloc.h |  1 +
 kernel/bpf/memalloc.c         | 16 ++++++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h
index d644bbb298af..bb1223b21308 100644
--- a/include/linux/bpf_mem_alloc.h
+++ b/include/linux/bpf_mem_alloc.h
@@ -11,6 +11,7 @@ struct bpf_mem_caches;
 struct bpf_mem_alloc {
 	struct bpf_mem_caches __percpu *caches;
 	struct bpf_mem_cache __percpu *cache;
+	bool percpu;
 	struct work_struct work;
 };
 
diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
index af9ff0755959..a17f650085ef 100644
--- a/kernel/bpf/memalloc.c
+++ b/kernel/bpf/memalloc.c
@@ -531,6 +531,7 @@ int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu)
 	/* room for llist_node and per-cpu pointer */
 	if (percpu)
 		percpu_size = LLIST_NODE_SZ + sizeof(void *);
+	ma->percpu = percpu;
 
 	if (size) {
 		pc = __alloc_percpu_gfp(sizeof(*pc), 8, GFP_KERNEL);
@@ -880,6 +881,17 @@ void notrace *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size)
 	return !ret ? NULL : ret + LLIST_NODE_SZ;
 }
 
+static notrace int bpf_mem_free_idx(void *ptr, bool percpu)
+{
+	size_t size;
+
+	if (percpu)
+		size = alloc_size_percpu(*((void **)ptr));
+	else
+		size = ksize(ptr - LLIST_NODE_SZ);
+	return bpf_mem_cache_idx(size);
+}
+
 void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
 {
 	int idx;
@@ -887,7 +899,7 @@ void notrace bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr)
 	if (!ptr)
 		return;
 
-	idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
+	idx = bpf_mem_free_idx(ptr, ma->percpu);
 	if (idx < 0)
 		return;
 
@@ -901,7 +913,7 @@ void notrace bpf_mem_free_rcu(struct bpf_mem_alloc *ma, void *ptr)
 	if (!ptr)
 		return;
 
-	idx = bpf_mem_cache_idx(ksize(ptr - LLIST_NODE_SZ));
+	idx = bpf_mem_free_idx(ptr, ma->percpu);
 	if (idx < 0)
 		return;
 
-- 
2.29.2




* [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h
  2023-10-07 13:51 [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr Hou Tao
                   ` (2 preceding siblings ...)
  2023-10-07 13:51 ` [PATCH bpf-next 3/6] bpf: Use alloc_size_percpu() in bpf_mem_free{_rcu}() Hou Tao
@ 2023-10-07 13:51 ` Hou Tao
  2023-10-09 16:28   ` Stanislav Fomichev
  2023-10-09 16:56   ` Alexei Starovoitov
  2023-10-07 13:51 ` [PATCH bpf-next 5/6] bpf: Use bpf_global_percpu_ma for per-cpu kptr in __bpf_obj_drop_impl() Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 6/6] selftests/bpf: Add more test cases for bpf memory allocator Hou Tao
  5 siblings, 2 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

Both syscall.c and helpers.c declare __bpf_obj_drop_impl(), so just
move the declaration to a common header file.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/helpers.c  |  3 +--
 kernel/bpf/internal.h | 11 +++++++++++
 kernel/bpf/syscall.c  |  4 ++--
 3 files changed, 14 insertions(+), 4 deletions(-)
 create mode 100644 kernel/bpf/internal.h

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index dd1c69ee3375..07f49f8831c0 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -24,6 +24,7 @@
 #include <linux/bpf_mem_alloc.h>
 #include <linux/kasan.h>
 
+#include "internal.h"
 #include "../../lib/kstrtox.h"
 
 /* If kernel subsystem is allowing eBPF programs to call this function,
@@ -1808,8 +1809,6 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	}
 }
 
-void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
-
 void bpf_list_head_free(const struct btf_field *field, void *list_head,
 			struct bpf_spin_lock *spin_lock)
 {
diff --git a/kernel/bpf/internal.h b/kernel/bpf/internal.h
new file mode 100644
index 000000000000..e233ea83eb0a
--- /dev/null
+++ b/kernel/bpf/internal.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023. Huawei Technologies Co., Ltd
+ */
+#ifndef __BPF_INTERNAL_H_
+#define __BPF_INTERNAL_H_
+
+struct btf_record;
+
+void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
+
+#endif /* __BPF_INTERNAL_H_ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6b5280f14a53..7de4b9f97c8f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -39,6 +39,8 @@
 
 #include <net/tcx.h>
 
+#include "internal.h"
+
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS)
@@ -626,8 +628,6 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
 	bpf_timer_cancel_and_free(obj + rec->timer_off);
 }
 
-extern void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
-
 void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 {
 	const struct btf_field *fields;
-- 
2.29.2




* [PATCH bpf-next 5/6] bpf: Use bpf_global_percpu_ma for per-cpu kptr in __bpf_obj_drop_impl()
  2023-10-07 13:51 [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr Hou Tao
                   ` (3 preceding siblings ...)
  2023-10-07 13:51 ` [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h Hou Tao
@ 2023-10-07 13:51 ` Hou Tao
  2023-10-07 13:51 ` [PATCH bpf-next 6/6] selftests/bpf: Add more test cases for bpf memory allocator Hou Tao
  5 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

The following warning was reported when running "./test_progs -t
test_bpf_ma/percpu_free_through_map_free":

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 68 at kernel/bpf/memalloc.c:342
  CPU: 1 PID: 68 Comm: kworker/u16:2 Not tainted 6.6.0-rc2+ #222
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: events_unbound bpf_map_free_deferred
  RIP: 0010:bpf_mem_refill+0x21c/0x2a0
  ......
  Call Trace:
   <IRQ>
   ? bpf_mem_refill+0x21c/0x2a0
   irq_work_single+0x27/0x70
   irq_work_run_list+0x2a/0x40
   irq_work_run+0x18/0x40
   __sysvec_irq_work+0x1c/0xc0
   sysvec_irq_work+0x73/0x90
   </IRQ>
   <TASK>
   asm_sysvec_irq_work+0x1b/0x20
  RIP: 0010:unit_free+0x50/0x80
   ......
   bpf_mem_free+0x46/0x60
   __bpf_obj_drop_impl+0x40/0x90
   bpf_obj_free_fields+0x17d/0x1a0
   array_map_free+0x6b/0x170
   bpf_map_free_deferred+0x54/0xa0
   process_scheduled_works+0xba/0x370
   worker_thread+0x16d/0x2e0
   kthread+0x105/0x140
   ret_from_fork+0x39/0x60
   ret_from_fork_asm+0x1b/0x30
   </TASK>
  ---[ end trace 0000000000000000 ]---

The reason is simple: __bpf_obj_drop_impl() does not know that the
field being freed is a per-cpu pointer, so it uses bpf_global_ma to
free the pointer. Because bpf_global_ma is not a per-cpu allocator,
ksize() is used to select the corresponding cache. The bpf_mem_cache
with a 16-byte unit_size will always be selected to do the unmatched
free, and it will eventually trigger the warning in free_bulk().
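
In outline, the pre-patch call chain during map destruction is (a
sketch matching the trace above):

	bpf_map_free_deferred()
	  bpf_obj_free_fields()		/* finds a BPF_KPTR_PERCPU field */
	    __bpf_obj_drop_impl()	/* has no percpu information */
	      bpf_mem_free(&bpf_global_ma, p)	/* wrong allocator: p came
						 * from bpf_global_percpu_ma
						 */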

Because per-cpu kptr doesn't support list or rb-tree now, fix the
problem by only checking whether or not the type of the kptr is per-cpu
in bpf_obj_free_fields(), and by using bpf_global_percpu_ma to free
these kptrs.

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 kernel/bpf/helpers.c  | 22 ++++++++++++++--------
 kernel/bpf/internal.h |  2 +-
 kernel/bpf/syscall.c  |  4 ++--
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 07f49f8831c0..078217c921e8 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1840,7 +1840,7 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 		 * bpf_list_head which needs to be freed.
 		 */
 		migrate_disable();
-		__bpf_obj_drop_impl(obj, field->graph_root.value_rec);
+		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
 		migrate_enable();
 	}
 }
@@ -1879,7 +1879,7 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
 
 
 		migrate_disable();
-		__bpf_obj_drop_impl(obj, field->graph_root.value_rec);
+		__bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
 		migrate_enable();
 	}
 }
@@ -1911,8 +1911,10 @@ __bpf_kfunc void *bpf_percpu_obj_new_impl(u64 local_type_id__k, void *meta__ign)
 }
 
 /* Must be called under migrate_disable(), as required by bpf_mem_free */
-void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
+void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu)
 {
+	struct bpf_mem_alloc *ma;
+
 	if (rec && rec->refcount_off >= 0 &&
 	    !refcount_dec_and_test((refcount_t *)(p + rec->refcount_off))) {
 		/* Object is refcounted and refcount_dec didn't result in 0
@@ -1924,10 +1926,14 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
 	if (rec)
 		bpf_obj_free_fields(rec, p);
 
+	if (percpu)
+		ma = &bpf_global_percpu_ma;
+	else
+		ma = &bpf_global_ma;
 	if (rec && rec->refcount_off >= 0)
-		bpf_mem_free_rcu(&bpf_global_ma, p);
+		bpf_mem_free_rcu(ma, p);
 	else
-		bpf_mem_free(&bpf_global_ma, p);
+		bpf_mem_free(ma, p);
 }
 
 __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
@@ -1935,7 +1941,7 @@ __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
 	struct btf_struct_meta *meta = meta__ign;
 	void *p = p__alloc;
 
-	__bpf_obj_drop_impl(p, meta ? meta->record : NULL);
+	__bpf_obj_drop_impl(p, meta ? meta->record : NULL, false);
 }
 
 __bpf_kfunc void bpf_percpu_obj_drop_impl(void *p__alloc, void *meta__ign)
@@ -1979,7 +1985,7 @@ static int __bpf_list_add(struct bpf_list_node_kern *node,
 	 */
 	if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
 		/* Only called from BPF prog, no need to migrate_disable */
-		__bpf_obj_drop_impl((void *)n - off, rec);
+		__bpf_obj_drop_impl((void *)n - off, rec, false);
 		return -EINVAL;
 	}
 
@@ -2078,7 +2084,7 @@ static int __bpf_rbtree_add(struct bpf_rb_root *root,
 	 */
 	if (cmpxchg(&node->owner, NULL, BPF_PTR_POISON)) {
 		/* Only called from BPF prog, no need to migrate_disable */
-		__bpf_obj_drop_impl((void *)n - off, rec);
+		__bpf_obj_drop_impl((void *)n - off, rec, false);
 		return -EINVAL;
 	}
 
diff --git a/kernel/bpf/internal.h b/kernel/bpf/internal.h
index e233ea83eb0a..4c3cfdd6e6a2 100644
--- a/kernel/bpf/internal.h
+++ b/kernel/bpf/internal.h
@@ -6,6 +6,6 @@
 
 struct btf_record;
 
-void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
+void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);
 
 #endif /* __BPF_INTERNAL_H_ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7de4b9f97c8f..8dfc5d39c91d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -662,8 +662,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
 									   field->kptr.btf_id);
 				migrate_disable();
 				__bpf_obj_drop_impl(xchgd_field, pointee_struct_meta ?
-								 pointee_struct_meta->record :
-								 NULL);
+								 pointee_struct_meta->record : NULL,
+								 fields[i].type == BPF_KPTR_PERCPU);
 				migrate_enable();
 			} else {
 				field->kptr.dtor(xchgd_field);
-- 
2.29.2




* [PATCH bpf-next 6/6] selftests/bpf: Add more test cases for bpf memory allocator
  2023-10-07 13:51 [PATCH bpf-next 0/6] bpf: Fixes for per-cpu kptr Hou Tao
                   ` (4 preceding siblings ...)
  2023-10-07 13:51 ` [PATCH bpf-next 5/6] bpf: Use bpf_global_percpu_ma for per-cpu kptr in __bpf_obj_drop_impl() Hou Tao
@ 2023-10-07 13:51 ` Hou Tao
  5 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-07 13:51 UTC (permalink / raw)
  To: bpf, linux-mm
  Cc: Martin KaFai Lau, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

From: Hou Tao <houtao1@huawei.com>

Add the following three test cases for the bpf memory allocator (an
example invocation is shown after the list):
1) Do allocation in a bpf program and free through map free
2) Do batch per-cpu allocation and per-cpu free in a bpf program
3) Do per-cpu allocation in a bpf program and free through map free
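
Once the selftests are built, the subtests can be run individually
through test_progs, e.g. (the same invocation quoted in patch #5):

	cd tools/testing/selftests/bpf
	make
	./test_progs -t test_bpf_ma/percpu_free_through_map_free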

Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 .../selftests/bpf/prog_tests/test_bpf_ma.c    |  20 +-
 .../testing/selftests/bpf/progs/test_bpf_ma.c | 180 +++++++++++++++++-
 2 files changed, 193 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_bpf_ma.c b/tools/testing/selftests/bpf/prog_tests/test_bpf_ma.c
index 0cca4e8ae38e..d3491a84b3b9 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_bpf_ma.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_bpf_ma.c
@@ -9,9 +9,10 @@
 
 #include "test_bpf_ma.skel.h"
 
-void test_test_bpf_ma(void)
+static void do_bpf_ma_test(const char *name)
 {
 	struct test_bpf_ma *skel;
+	struct bpf_program *prog;
 	struct btf *btf;
 	int i, err;
 
@@ -34,6 +35,11 @@ void test_test_bpf_ma(void)
 		skel->rodata->data_btf_ids[i] = id;
 	}
 
+	prog = bpf_object__find_program_by_name(skel->obj, name);
+	if (!ASSERT_OK_PTR(prog, "invalid prog name"))
+		goto out;
+	bpf_program__set_autoload(prog, true);
+
 	err = test_bpf_ma__load(skel);
 	if (!ASSERT_OK(err, "load"))
 		goto out;
@@ -48,3 +54,15 @@ void test_test_bpf_ma(void)
 out:
 	test_bpf_ma__destroy(skel);
 }
+
+void test_test_bpf_ma(void)
+{
+	if (test__start_subtest("batch_alloc_free"))
+		do_bpf_ma_test("test_batch_alloc_free");
+	if (test__start_subtest("free_through_map_free"))
+		do_bpf_ma_test("test_free_through_map_free");
+	if (test__start_subtest("batch_percpu_alloc_free"))
+		do_bpf_ma_test("test_batch_percpu_alloc_free");
+	if (test__start_subtest("percpu_free_through_map_free"))
+		do_bpf_ma_test("test_percpu_free_through_map_free");
+}
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_ma.c b/tools/testing/selftests/bpf/progs/test_bpf_ma.c
index ecde41ae0fc8..b685a4aba6bd 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_ma.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_ma.c
@@ -37,10 +37,20 @@ int pid = 0;
 		__type(key, int); \
 		__type(value, struct map_value_##_size); \
 		__uint(max_entries, 128); \
-	} array_##_size SEC(".maps");
+	} array_##_size SEC(".maps")
 
-static __always_inline void batch_alloc_free(struct bpf_map *map, unsigned int batch,
-					     unsigned int idx)
+#define DEFINE_ARRAY_WITH_PERCPU_KPTR(_size) \
+	struct map_value_percpu_##_size { \
+		struct bin_data_##_size __percpu_kptr * data; \
+	}; \
+	struct { \
+		__uint(type, BPF_MAP_TYPE_ARRAY); \
+		__type(key, int); \
+		__type(value, struct map_value_percpu_##_size); \
+		__uint(max_entries, 128); \
+	} array_percpu_##_size SEC(".maps")
+
+static __always_inline void batch_alloc(struct bpf_map *map, unsigned int batch, unsigned int idx)
 {
 	struct generic_map_value *value;
 	unsigned int i, key;
@@ -65,6 +75,14 @@ static __always_inline void batch_alloc_free(struct bpf_map *map, unsigned int b
 			return;
 		}
 	}
+}
+
+static __always_inline void batch_free(struct bpf_map *map, unsigned int batch, unsigned int idx)
+{
+	struct generic_map_value *value;
+	unsigned int i, key;
+	void *old;
+
 	for (i = 0; i < batch; i++) {
 		key = i;
 		value = bpf_map_lookup_elem(map, &key);
@@ -81,8 +99,72 @@ static __always_inline void batch_alloc_free(struct bpf_map *map, unsigned int b
 	}
 }
 
+static __always_inline void batch_percpu_alloc(struct bpf_map *map, unsigned int batch,
+					       unsigned int idx)
+{
+	struct generic_map_value *value;
+	unsigned int i, key;
+	void *old, *new;
+
+	for (i = 0; i < batch; i++) {
+		key = i;
+		value = bpf_map_lookup_elem(map, &key);
+		if (!value) {
+			err = 1;
+			return;
+		}
+		/* per-cpu allocator may not be able to refill in time */
+		new = bpf_percpu_obj_new_impl(data_btf_ids[idx], NULL);
+		if (!new)
+			continue;
+
+		old = bpf_kptr_xchg(&value->data, new);
+		if (old) {
+			bpf_percpu_obj_drop(old);
+			err = 2;
+			return;
+		}
+	}
+}
+
+static __always_inline void batch_percpu_free(struct bpf_map *map, unsigned int batch,
+					      unsigned int idx)
+{
+	struct generic_map_value *value;
+	unsigned int i, key;
+	void *old;
+
+	for (i = 0; i < batch; i++) {
+		key = i;
+		value = bpf_map_lookup_elem(map, &key);
+		if (!value) {
+			err = 3;
+			return;
+		}
+		old = bpf_kptr_xchg(&value->data, NULL);
+		if (!old)
+			continue;
+		bpf_percpu_obj_drop(old);
+	}
+}
+
+#define CALL_BATCH_ALLOC(size, batch, idx) \
+	batch_alloc((struct bpf_map *)(&array_##size), batch, idx)
+
 #define CALL_BATCH_ALLOC_FREE(size, batch, idx) \
-	batch_alloc_free((struct bpf_map *)(&array_##size), batch, idx)
+	do { \
+		batch_alloc((struct bpf_map *)(&array_##size), batch, idx); \
+		batch_free((struct bpf_map *)(&array_##size), batch, idx); \
+	} while (0)
+
+#define CALL_BATCH_PERCPU_ALLOC(size, batch, idx) \
+	batch_percpu_alloc((struct bpf_map *)(&array_percpu_##size), batch, idx)
+
+#define CALL_BATCH_PERCPU_ALLOC_FREE(size, batch, idx) \
+	do { \
+		batch_percpu_alloc((struct bpf_map *)(&array_percpu_##size), batch, idx); \
+		batch_percpu_free((struct bpf_map *)(&array_percpu_##size), batch, idx); \
+	} while (0)
 
 DEFINE_ARRAY_WITH_KPTR(8);
 DEFINE_ARRAY_WITH_KPTR(16);
@@ -97,8 +179,21 @@ DEFINE_ARRAY_WITH_KPTR(1024);
 DEFINE_ARRAY_WITH_KPTR(2048);
 DEFINE_ARRAY_WITH_KPTR(4096);
 
-SEC("fentry/" SYS_PREFIX "sys_nanosleep")
-int test_bpf_mem_alloc_free(void *ctx)
+/* per-cpu kptr doesn't support bin_data_8 which is a zero-sized array */
+DEFINE_ARRAY_WITH_PERCPU_KPTR(16);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(32);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(64);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(96);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(128);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(192);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(256);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(512);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(1024);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(2048);
+DEFINE_ARRAY_WITH_PERCPU_KPTR(4096);
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int test_batch_alloc_free(void *ctx)
 {
 	if ((u32)bpf_get_current_pid_tgid() != pid)
 		return 0;
@@ -121,3 +216,76 @@ int test_bpf_mem_alloc_free(void *ctx)
 
 	return 0;
 }
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int test_free_through_map_free(void *ctx)
+{
+	if ((u32)bpf_get_current_pid_tgid() != pid)
+		return 0;
+
+	/* Alloc 128 8-byte objects in batch to trigger refilling,
+	 * then free these objects through map free.
+	 */
+	CALL_BATCH_ALLOC(8, 128, 0);
+	CALL_BATCH_ALLOC(16, 128, 1);
+	CALL_BATCH_ALLOC(32, 128, 2);
+	CALL_BATCH_ALLOC(64, 128, 3);
+	CALL_BATCH_ALLOC(96, 128, 4);
+	CALL_BATCH_ALLOC(128, 128, 5);
+	CALL_BATCH_ALLOC(192, 128, 6);
+	CALL_BATCH_ALLOC(256, 128, 7);
+	CALL_BATCH_ALLOC(512, 64, 8);
+	CALL_BATCH_ALLOC(1024, 32, 9);
+	CALL_BATCH_ALLOC(2048, 16, 10);
+	CALL_BATCH_ALLOC(4096, 8, 11);
+
+	return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int test_batch_percpu_alloc_free(void *ctx)
+{
+	if ((u32)bpf_get_current_pid_tgid() != pid)
+		return 0;
+
+	/* Alloc 128 16-byte per-cpu objects in batch to trigger refilling,
+	 * then free 128 16-byte per-cpu objects in batch to trigger freeing.
+	 */
+	CALL_BATCH_PERCPU_ALLOC_FREE(16, 128, 1);
+	CALL_BATCH_PERCPU_ALLOC_FREE(32, 128, 2);
+	CALL_BATCH_PERCPU_ALLOC_FREE(64, 128, 3);
+	CALL_BATCH_PERCPU_ALLOC_FREE(96, 128, 4);
+	CALL_BATCH_PERCPU_ALLOC_FREE(128, 128, 5);
+	CALL_BATCH_PERCPU_ALLOC_FREE(192, 128, 6);
+	CALL_BATCH_PERCPU_ALLOC_FREE(256, 128, 7);
+	CALL_BATCH_PERCPU_ALLOC_FREE(512, 64, 8);
+	CALL_BATCH_PERCPU_ALLOC_FREE(1024, 32, 9);
+	CALL_BATCH_PERCPU_ALLOC_FREE(2048, 16, 10);
+	CALL_BATCH_PERCPU_ALLOC_FREE(4096, 8, 11);
+
+	return 0;
+}
+
+SEC("?fentry/" SYS_PREFIX "sys_nanosleep")
+int test_percpu_free_through_map_free(void *ctx)
+{
+	if ((u32)bpf_get_current_pid_tgid() != pid)
+		return 0;
+
+	/* Alloc 128 16-byte per-cpu objects in batch to trigger refilling,
+	 * then free these objects through map free.
+	 */
+	CALL_BATCH_PERCPU_ALLOC(16, 128, 1);
+	CALL_BATCH_PERCPU_ALLOC(32, 128, 2);
+	CALL_BATCH_PERCPU_ALLOC(64, 128, 3);
+	CALL_BATCH_PERCPU_ALLOC(96, 128, 4);
+	CALL_BATCH_PERCPU_ALLOC(128, 128, 5);
+	CALL_BATCH_PERCPU_ALLOC(192, 128, 6);
+	CALL_BATCH_PERCPU_ALLOC(256, 128, 7);
+	CALL_BATCH_PERCPU_ALLOC(512, 64, 8);
+	CALL_BATCH_PERCPU_ALLOC(1024, 32, 9);
+	CALL_BATCH_PERCPU_ALLOC(2048, 16, 10);
+	CALL_BATCH_PERCPU_ALLOC(4096, 8, 11);
+
+	return 0;
+}
-- 
2.29.2




* Re: [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu()
  2023-10-07 13:51 ` [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu() Hou Tao
@ 2023-10-07 14:04   ` Andrew Morton
  2023-10-08  2:47     ` Hou Tao
  2023-10-08 22:32   ` Dennis Zhou
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2023-10-07 14:04 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, linux-mm, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, Dennis Zhou, Tejun Heo,
	Christoph Lameter

On Sat,  7 Oct 2023 21:51:01 +0800 Hou Tao <houtao@huaweicloud.com> wrote:

> From: Hou Tao <houtao1@huawei.com>
> 
> Introduce alloc_size_percpu() to get the size of the dynamic per-cpu
> area. It will be used by bpf memory allocator in the following patches.
> BPF memory allocator maintains multiple per-cpu area caches for multiple
> area sizes and it needs the size of dynamic per-cpu area to select the
> corresponding cache when bpf program frees the dynamic per-cpu area.
> 
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2244,6 +2244,35 @@ static void pcpu_balance_workfn(struct work_struct *work)
>  	mutex_unlock(&pcpu_alloc_mutex);
>  }
>  
> +/**
> + * alloc_size_percpu - the size of the dynamic percpu area
> + * @ptr: pointer to the dynamic percpu area
> + *
> + * Return the size of the dynamic percpu area @ptr.
> + *
> + * RETURNS:
> + * The size of the dynamic percpu area.
> + *
> + * CONTEXT:
> + * Can be called from atomic context.
> + */
> +size_t alloc_size_percpu(void __percpu *ptr)
> +{
> +	struct pcpu_chunk *chunk;
> +	int bit_off, end;

It's minor, but I'd suggest unsigned long for both.

> +	void *addr;
> +
> +	if (!ptr)
> +		return 0;
> +
> +	addr = __pcpu_ptr_to_addr(ptr);
> +	/* No pcpu_lock here: ptr has not been freed, so chunk is still alive */
> +	chunk = pcpu_chunk_addr_search(addr);
> +	bit_off = (addr - chunk->base_addr) / PCPU_MIN_ALLOC_SIZE;

void* - void* is a ptrdiff_t, which is long or int.

> +	end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk), bit_off + 1);

find_next_bit takes an unsigned long

> +	return (end - bit_off) * PCPU_MIN_ALLOC_SIZE;

And then we don't need to worry about signedness issues.
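
I.e., a sketch of the suggested change (only the declaration differs,
the rest of the function stays as is):

	-	int bit_off, end;
	+	unsigned long bit_off, end;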

> +}
> +




* Re: [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu()
  2023-10-07 14:04   ` Andrew Morton
@ 2023-10-08  2:47     ` Hou Tao
  0 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-08  2:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: bpf, linux-mm, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, Dennis Zhou, Tejun Heo,
	Christoph Lameter

Hi,

On 10/7/2023 10:04 PM, Andrew Morton wrote:
> On Sat,  7 Oct 2023 21:51:01 +0800 Hou Tao <houtao@huaweicloud.com> wrote:
>
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Introduce alloc_size_percpu() to get the size of the dynamic per-cpu
>> area. It will be used by bpf memory allocator in the following patches.
>> BPF memory allocator maintains multiple per-cpu area caches for multiple
>> area sizes and it needs the size of dynamic per-cpu area to select the
>> corresponding cache when bpf program frees the dynamic per-cpu area.
>>
>> --- a/mm/percpu.c
>> +++ b/mm/percpu.c
>> @@ -2244,6 +2244,35 @@ static void pcpu_balance_workfn(struct work_struct *work)
>>  	mutex_unlock(&pcpu_alloc_mutex);
>>  }
>>  
>> +/**
>> + * alloc_size_percpu - the size of the dynamic percpu area
>> + * @ptr: pointer to the dynamic percpu area
>> + *
>> + * Return the size of the dynamic percpu area @ptr.
>> + *
>> + * RETURNS:
>> + * The size of the dynamic percpu area.
>> + *
>> + * CONTEXT:
>> + * Can be called from atomic context.
>> + */
>> +size_t alloc_size_percpu(void __percpu *ptr)
>> +{
>> +	struct pcpu_chunk *chunk;
>> +	int bit_off, end;
> It's minor, but I'd suggest unsigned long for both.

Thanks for this and all following suggestions. Will do in v2.
>
>> +	void *addr;
>> +
>> +	if (!ptr)
>> +		return 0;
>> +
>> +	addr = __pcpu_ptr_to_addr(ptr);
>> +	/* No pcpu_lock here: ptr has not been freed, so chunk is still alive */
>> +	chunk = pcpu_chunk_addr_search(addr);
>> +	bit_off = (addr - chunk->base_addr) / PCPU_MIN_ALLOC_SIZE;
> void* - void* is a ptrdiff_t, which is long or int.
>
>> +	end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk), bit_off + 1);
> find_next_bit takes an unsigned long
>
>> +	return (end - bit_off) * PCPU_MIN_ALLOC_SIZE;
> And then we don't need to worry about signedness issues.
>
>> +}
>> +




* Re: [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu()
  2023-10-07 13:51 ` [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu() Hou Tao
  2023-10-07 14:04   ` Andrew Morton
@ 2023-10-08 22:32   ` Dennis Zhou
  2023-10-11  6:30     ` Hou Tao
  1 sibling, 1 reply; 16+ messages in thread
From: Dennis Zhou @ 2023-10-08 22:32 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, linux-mm, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, Tejun Heo, Christoph Lameter,
	Andrew Morton

Hello,

On Sat, Oct 07, 2023 at 09:51:01PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Introduce alloc_size_percpu() to get the size of the dynamic per-cpu
> area. It will be used by bpf memory allocator in the following patches.
> BPF memory allocator maintains multiple per-cpu area caches for multiple
> area sizes and it needs the size of dynamic per-cpu area to select the
> corresponding cache when bpf program frees the dynamic per-cpu area.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  include/linux/percpu.h |  1 +
>  mm/percpu.c            | 29 +++++++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 68fac2e7cbe6..d140d9d79567 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -132,6 +132,7 @@ extern void __init setup_per_cpu_areas(void);
>  extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
>  extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
>  extern void free_percpu(void __percpu *__pdata);
> +extern size_t alloc_size_percpu(void __percpu *__pdata);
>  
>  DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
>  
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 7b40b3963f10..f541cfc3cb2d 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -2244,6 +2244,35 @@ static void pcpu_balance_workfn(struct work_struct *work)
>  	mutex_unlock(&pcpu_alloc_mutex);
>  }
>  
> +/**
> + * alloc_size_percpu - the size of the dynamic percpu area

Can we name this pcpu_alloc_size()? A few other functions are
exposed under pcpu_*, so it's a bit easier to keep track of.

> + * @ptr: pointer to the dynamic percpu area
> + *
> + * Return the size of the dynamic percpu area @ptr.
> + *
> + * RETURNS:
> + * The size of the dynamic percpu area.
> + *
> + * CONTEXT:
> + * Can be called from atomic context.
> + */
> +size_t alloc_size_percpu(void __percpu *ptr)
> +{
> +	struct pcpu_chunk *chunk;
> +	int bit_off, end;
> +	void *addr;
> +
> +	if (!ptr)
> +		return 0;
> +
> +	addr = __pcpu_ptr_to_addr(ptr);
> +	/* No pcpu_lock here: ptr has not been freed, so chunk is still alive */

Now that percpu variables are floating around more commonly, I think we
(or I) need to add more validation guards so it's easier to debug
bogus/stale pointers. Potentially behind a static_key or Kconfig option,
so that we take the lock and `test_bit()`.
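
Something like the following sketch, perhaps (the static key is an
invented name, just to illustrate the idea):

	if (static_branch_unlikely(&pcpu_debug_ptr)) {	/* hypothetical key */
		unsigned long flags;

		spin_lock_irqsave(&pcpu_lock, flags);
		/* a live allocation must begin at a set bit in bound_map */
		WARN_ON(!test_bit(bit_off, chunk->bound_map));
		spin_unlock_irqrestore(&pcpu_lock, flags);
	}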

> +	chunk = pcpu_chunk_addr_search(addr);
> +	bit_off = (addr - chunk->base_addr) / PCPU_MIN_ALLOC_SIZE;
> +	end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk), bit_off + 1);

Nit: can you please reflow `bit_off + 1` to the next line? I know we
dropped the line-length requirement, but percpu.c still follows it
almost completely.

> +	return (end - bit_off) * PCPU_MIN_ALLOC_SIZE;
> +}
> +
>  /**
>   * free_percpu - free percpu area
>   * @ptr: pointer to area to free
> -- 
> 2.29.2
> 

Thanks,
Dennis



* Re: [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h
  2023-10-07 13:51 ` [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h Hou Tao
@ 2023-10-09 16:28   ` Stanislav Fomichev
  2023-10-11  6:31     ` Hou Tao
  2023-10-09 16:56   ` Alexei Starovoitov
  1 sibling, 1 reply; 16+ messages in thread
From: Stanislav Fomichev @ 2023-10-09 16:28 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, linux-mm, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

On 10/07, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> both syscall.c and helpers.c have the declaration of
> __bpf_obj_drop_impl(), so just move it to a common header file.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c  |  3 +--
>  kernel/bpf/internal.h | 11 +++++++++++
>  kernel/bpf/syscall.c  |  4 ++--
>  3 files changed, 14 insertions(+), 4 deletions(-)
>  create mode 100644 kernel/bpf/internal.h
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index dd1c69ee3375..07f49f8831c0 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -24,6 +24,7 @@
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/kasan.h>
>  
> +#include "internal.h"
>  #include "../../lib/kstrtox.h"
>  
>  /* If kernel subsystem is allowing eBPF programs to call this function,
> @@ -1808,8 +1809,6 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>  	}
>  }
>  
> -void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
> -
>  void bpf_list_head_free(const struct btf_field *field, void *list_head,
>  			struct bpf_spin_lock *spin_lock)
>  {
> diff --git a/kernel/bpf/internal.h b/kernel/bpf/internal.h
> new file mode 100644
> index 000000000000..e233ea83eb0a
> --- /dev/null
> +++ b/kernel/bpf/internal.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2023. Huawei Technologies Co., Ltd
> + */

I don't think copyright works this way: you can't move the code and
claim authorship.

In general, git tracks authors and contributors, so I'm not sure
why we still keep putting these explicit notices.



* Re: [PATCH bpf-next 2/6] bpf: Re-enable unit_size checking for global per-cpu allocator
  2023-10-07 13:51 ` [PATCH bpf-next 2/6] bpf: Re-enable unit_size checking for global per-cpu allocator Hou Tao
@ 2023-10-09 16:51   ` Alexei Starovoitov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2023-10-09 16:51 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, linux-mm, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

On Sat, Oct 07, 2023 at 09:51:02PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> With alloc_size_percpu() in place, check whether or not the size of
> the dynamic per-cpu area is matched with unit_size.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/memalloc.c | 25 ++++++++++++++-----------
>  1 file changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> index 6cf61ea55c27..af9ff0755959 100644
> --- a/kernel/bpf/memalloc.c
> +++ b/kernel/bpf/memalloc.c
> @@ -497,21 +497,17 @@ static int check_obj_size(struct bpf_mem_cache *c, unsigned int idx)
>  	struct llist_node *first;
>  	unsigned int obj_size;
>  
> -	/* For per-cpu allocator, the size of free objects in free list doesn't
> -	 * match with unit_size and now there is no way to get the size of
> -	 * per-cpu pointer saved in free object, so just skip the checking.
> -	 */
> -	if (c->percpu_size)
> -		return 0;
> -
>  	first = c->free_llist.first;
>  	if (!first)
>  		return 0;
>  
> -	obj_size = ksize(first);
> +	if (c->percpu_size)
> +		obj_size = alloc_size_percpu(((void **)first)[1]);
> +	else
> +		obj_size = ksize(first);
>  	if (obj_size != c->unit_size) {
> -		WARN_ONCE(1, "bpf_mem_cache[%u]: unexpected object size %u, expect %u\n",
> -			  idx, obj_size, c->unit_size);
> +		WARN_ONCE(1, "bpf_mem_cache[%u]: percpu %d, unexpected object size %u, expect %u\n",
> +			  idx, c->percpu_size, obj_size, c->unit_size);
>  		return -EINVAL;
>  	}
>  	return 0;
> @@ -979,7 +975,14 @@ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags)
>  	return !ret ? NULL : ret + LLIST_NODE_SZ;
>  }
>  
> -/* Most of the logic is taken from setup_kmalloc_cache_index_table() */
> +/* The alignment of dynamic per-cpu area is 8 and c->unit_size and the
> + * actual size of dynamic per-cpu area will always be matched, so there is
> + * no need to adjust size_index for per-cpu allocation. However for the
> + * simplicity of the implementation, use an unified size_index for both
> + * kmalloc and per-cpu allocation.
> + *
> + * Most of the logic is taken from setup_kmalloc_cache_index_table().

Since this logic is removed in bpf tree you probably need to wait for
bpf tree to get merged into bpf-next before respinning to avoid conflicts.



* Re: [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h
  2023-10-07 13:51 ` [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h Hou Tao
  2023-10-09 16:28   ` Stanislav Fomichev
@ 2023-10-09 16:56   ` Alexei Starovoitov
  2023-10-11  6:40     ` Hou Tao
  1 sibling, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2023-10-09 16:56 UTC (permalink / raw)
  To: Hou Tao
  Cc: bpf, linux-mm, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

On Sat, Oct 07, 2023 at 09:51:04PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> both syscall.c and helpers.c have the declaration of
> __bpf_obj_drop_impl(), so just move it to a common header file.
> 
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  kernel/bpf/helpers.c  |  3 +--
>  kernel/bpf/internal.h | 11 +++++++++++
>  kernel/bpf/syscall.c  |  4 ++--
>  3 files changed, 14 insertions(+), 4 deletions(-)
>  create mode 100644 kernel/bpf/internal.h
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index dd1c69ee3375..07f49f8831c0 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -24,6 +24,7 @@
>  #include <linux/bpf_mem_alloc.h>
>  #include <linux/kasan.h>
>  
> +#include "internal.h"

Pls use one of the existing headers. No need for a new one.



* Re: [PATCH bpf-next 1/6] mm/percpu.c: introduce alloc_size_percpu()
  2023-10-08 22:32   ` Dennis Zhou
@ 2023-10-11  6:30     ` Hou Tao
  0 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-11  6:30 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: bpf, linux-mm, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Stanislav Fomichev, Jiri Olsa,
	John Fastabend, houtao1, Tejun Heo, Christoph Lameter,
	Andrew Morton

Hi,

On 10/9/2023 6:32 AM, Dennis Zhou wrote:
> Hello,
>
> On Sat, Oct 07, 2023 at 09:51:01PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Introduce alloc_size_percpu() to get the size of the dynamic per-cpu
>> area. It will be used by bpf memory allocator in the following patches.
>> BPF memory allocator maintains multiple per-cpu area caches for multiple
>> area sizes and it needs the size of dynamic per-cpu area to select the
>> corresponding cache when bpf program frees the dynamic per-cpu area.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  include/linux/percpu.h |  1 +
>>  mm/percpu.c            | 29 +++++++++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
>> index 68fac2e7cbe6..d140d9d79567 100644
>> --- a/include/linux/percpu.h
>> +++ b/include/linux/percpu.h
>> @@ -132,6 +132,7 @@ extern void __init setup_per_cpu_areas(void);
>>  extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp) __alloc_size(1);
>>  extern void __percpu *__alloc_percpu(size_t size, size_t align) __alloc_size(1);
>>  extern void free_percpu(void __percpu *__pdata);
>> +extern size_t alloc_size_percpu(void __percpu *__pdata);
>>  
>>  DEFINE_FREE(free_percpu, void __percpu *, free_percpu(_T))
>>  
>> diff --git a/mm/percpu.c b/mm/percpu.c
>> index 7b40b3963f10..f541cfc3cb2d 100644
>> --- a/mm/percpu.c
>> +++ b/mm/percpu.c
>> @@ -2244,6 +2244,35 @@ static void pcpu_balance_workfn(struct work_struct *work)
>>  	mutex_unlock(&pcpu_alloc_mutex);
>>  }
>>  
>> +/**
>> + * alloc_size_percpu - the size of the dynamic percpu area
> Can we name this pcpu_alloc_size(). A few other functions are
> exposed under pcpu_* so it's a bit easier to keep track of.

Will do in v2.
>
>> + * @ptr: pointer to the dynamic percpu area
>> + *
>> + * Return the size of the dynamic percpu area @ptr.
>> + *
>> + * RETURNS:
>> + * The size of the dynamic percpu area.
>> + *
>> + * CONTEXT:
>> + * Can be called from atomic context.
>> + */
>> +size_t alloc_size_percpu(void __percpu *ptr)
>> +{
>> +	struct pcpu_chunk *chunk;
>> +	int bit_off, end;
>> +	void *addr;
>> +
>> +	if (!ptr)
>> +		return 0;
>> +
>> +	addr = __pcpu_ptr_to_addr(ptr);
>> +	/* No pcpu_lock here: ptr has not been freed, so chunk is still alive */
> Now that percpu variables are floating around more commonly, I think we
> or I need to add more validation guards so it's easier to
> debug bogus/stale pointers. Potentially like a static_key or Kconfig so
> that we take the lock and `test_bit()`.

Before the patch, it seems that free_percpu() is the only user of
__pcpu_ptr_to_addr(ptr). I can move the invocation of both
__pcpu_ptr_to_addr() and pcpu_chunk_addr_search() into a common helper
if you are OK with it. And I think it will ease the work of adding
validation guards on the per-cpu pointer.
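
A minimal sketch of such a helper (the name is only a placeholder):

	static struct pcpu_chunk *pcpu_addr_to_chunk(void __percpu *ptr,
						     void **addrp)
	{
		void *addr = __pcpu_ptr_to_addr(ptr);

		*addrp = addr;
		return pcpu_chunk_addr_search(addr);
	}
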
>
>> +	chunk = pcpu_chunk_addr_search(addr);
>> +	bit_off = (addr - chunk->base_addr) / PCPU_MIN_ALLOC_SIZE;
>> +	end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk), bit_off + 1);
> Nit: can you please reflow `bit_off + 1` to the next line. I know we
> dropped the line requirement, but percpu.c almost completely still
> follows it.

Will do in v2.
>
>> +	return (end - bit_off) * PCPU_MIN_ALLOC_SIZE;
>> +}
>> +
>>  /**
>>   * free_percpu - free percpu area
>>   * @ptr: pointer to area to free
>> -- 
>> 2.29.2
>>
> Thanks,
> Dennis




* Re: [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h
  2023-10-09 16:28   ` Stanislav Fomichev
@ 2023-10-11  6:31     ` Hou Tao
  0 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-11  6:31 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: bpf, linux-mm, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Hao Luo, Yonghong Song,
	Daniel Borkmann, KP Singh, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

Hi,

On 10/10/2023 12:28 AM, Stanislav Fomichev wrote:
> On 10/07, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> both syscall.c and helpers.c have the declaration of
>> __bpf_obj_drop_impl(), so just move it to a common header file.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/helpers.c  |  3 +--
>>  kernel/bpf/internal.h | 11 +++++++++++
>>  kernel/bpf/syscall.c  |  4 ++--
>>  3 files changed, 14 insertions(+), 4 deletions(-)
>>  create mode 100644 kernel/bpf/internal.h
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index dd1c69ee3375..07f49f8831c0 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/bpf_mem_alloc.h>
>>  #include <linux/kasan.h>
>>  
>> +#include "internal.h"
>>  #include "../../lib/kstrtox.h"
>>  
>>  /* If kernel subsystem is allowing eBPF programs to call this function,
>> @@ -1808,8 +1809,6 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>>  	}
>>  }
>>  
>> -void __bpf_obj_drop_impl(void *p, const struct btf_record *rec);
>> -
>>  void bpf_list_head_free(const struct btf_field *field, void *list_head,
>>  			struct bpf_spin_lock *spin_lock)
>>  {
>> diff --git a/kernel/bpf/internal.h b/kernel/bpf/internal.h
>> new file mode 100644
>> index 000000000000..e233ea83eb0a
>> --- /dev/null
>> +++ b/kernel/bpf/internal.h
>> @@ -0,0 +1,11 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright (C) 2023. Huawei Technologies Co., Ltd
>> + */
> Don't think copyright works this way? You can't move the code and
> claim authorship.
>
> In general, git tracks authors and contributors, so not sure
> why we still keep putting these explicit notices..
My bad. Thanks for the reminder. Will fix in v2.




* Re: [PATCH bpf-next 4/6] bpf: Move the declaration of __bpf_obj_drop_impl() to internal.h
  2023-10-09 16:56   ` Alexei Starovoitov
@ 2023-10-11  6:40     ` Hou Tao
  0 siblings, 0 replies; 16+ messages in thread
From: Hou Tao @ 2023-10-11  6:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, linux-mm, Martin KaFai Lau, Andrii Nakryiko, Song Liu,
	Hao Luo, Yonghong Song, Daniel Borkmann, KP Singh,
	Stanislav Fomichev, Jiri Olsa, John Fastabend, houtao1,
	Dennis Zhou, Tejun Heo, Christoph Lameter, Andrew Morton

Hi,

On 10/10/2023 12:56 AM, Alexei Starovoitov wrote:
> On Sat, Oct 07, 2023 at 09:51:04PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> both syscall.c and helpers.c have the declaration of
>> __bpf_obj_drop_impl(), so just move it to a common header file.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>>  kernel/bpf/helpers.c  |  3 +--
>>  kernel/bpf/internal.h | 11 +++++++++++
>>  kernel/bpf/syscall.c  |  4 ++--
>>  3 files changed, 14 insertions(+), 4 deletions(-)
>>  create mode 100644 kernel/bpf/internal.h
>>
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index dd1c69ee3375..07f49f8831c0 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -24,6 +24,7 @@
>>  #include <linux/bpf_mem_alloc.h>
>>  #include <linux/kasan.h>
>>  
>> +#include "internal.h"
> Pls use one of the existing headers. No need for new one.

Do you mean one of the header files in include/linux, right? Because
there is no suitable header file under kernel/bpf. The best fit is
include/linux/bpf.h, but after modifying bpf.h, multiple files need to
be rebuilt (e.g. under net/, kernel/, drivers/net/, block/ and fs/).
Maybe it is time to break bpf.h into multiple independent files?



