* [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2)
@ 2024-09-27 18:41 Namhyung Kim
2024-09-27 18:41 ` [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Namhyung Kim @ 2024-09-27 18:41 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
Hello,
I'm proposing a new iterator and a kfunc for the slab memory allocator
to get information about each kmem_cache, like in /proc/slabinfo or
/sys/kernel/slab, in a more flexible way.
v2 changes)
* rename it to "kmem_cache_iter"
* fix a build issue
* add Acked-by's from Roman and Vlastimil (Thanks!)
* add error codes in the test for debugging
v1: https://lore.kernel.org/lkml/20240925223023.735947-1-namhyung@kernel.org/
My use case is the `perf lock contention` tool which shows contended
locks, but many of them are not global locks and don't have symbols. If
it can translate the address of a lock in a slab object to the name of
the slab cache, it'd be much more useful.
The iterator is not aware of type information in the slab yet, but I
was told there's work underway to associate a BTF ID with it. That
would definitely help my use case. We'd probably need another kfunc to
get the start address of the object, or the offset within the object
from an address, once the type info is available. But I want to start
with a simple thing first.
The kmem_cache_iter iterates kmem_cache objects under slab_mutex and
will be useful for userspace to prepare some work for specific slabs,
like setting up filters in advance. The bpf_get_kmem_cache() kfunc
returns a pointer to the kmem_cache from the address of a lock. The
test code reads from the iterator and makes sure it finds the slab
cache of the task_struct for the current task.
The code is available at 'bpf/slab-iter-v2' branch in
https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Thanks,
Namhyung
Namhyung Kim (3):
bpf: Add kmem_cache iterator
mm/bpf: Add bpf_get_kmem_cache() kfunc
selftests/bpf: Add a test for kmem_cache_iter
include/linux/btf_ids.h | 1 +
kernel/bpf/Makefile | 1 +
kernel/bpf/helpers.c | 1 +
kernel/bpf/kmem_cache_iter.c | 131 ++++++++++++++++++
mm/slab_common.c | 16 +++
.../bpf/prog_tests/kmem_cache_iter.c | 64 +++++++++
tools/testing/selftests/bpf/progs/bpf_iter.h | 7 +
.../selftests/bpf/progs/kmem_cache_iter.c | 66 +++++++++
8 files changed, 287 insertions(+)
create mode 100644 kernel/bpf/kmem_cache_iter.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c
--
2.46.1.824.gd892dcdcdd-goog
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator
2024-09-27 18:41 [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Namhyung Kim
@ 2024-09-27 18:41 ` Namhyung Kim
2024-09-29 17:04 ` Alexei Starovoitov
2024-09-27 18:41 ` [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
` (2 subsequent siblings)
3 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-27 18:41 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
The new "kmem_cache" iterator traverses the list of slab caches and
calls attached BPF programs for each entry. The BPF program should
check whether the argument (ctx->s) is NULL before using it.
The iteration is done with slab_mutex held, but it breaks out and
returns to userspace if the BPF program emits more data to the seq
buffer than the buffer size given by the user. IOW the whole iteration
is protected by slab_mutex as long as nothing is emitted.
It includes the internal "mm/slab.h" header to access kmem_cache,
slab_caches and slab_mutex. I hope that's ok with the mm folks.
Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
include/linux/btf_ids.h | 1 +
kernel/bpf/Makefile | 1 +
kernel/bpf/kmem_cache_iter.c | 131 +++++++++++++++++++++++++++++++++++
3 files changed, 133 insertions(+)
create mode 100644 kernel/bpf/kmem_cache_iter.c
diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index c0e3e1426a82f5c4..139bdececdcfaefb 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -283,5 +283,6 @@ extern u32 btf_tracing_ids[];
extern u32 bpf_cgroup_btf_id[];
extern u32 bpf_local_storage_map_btf_id[];
extern u32 btf_bpf_map_id[];
+extern u32 bpf_kmem_cache_btf_id[];
#endif
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 9b9c151b5c826b31..105328f0b9c04e37 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -52,3 +52,4 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
+obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o
diff --git a/kernel/bpf/kmem_cache_iter.c b/kernel/bpf/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..5f7436b52f2e6b06
--- /dev/null
+++ b/kernel/bpf/kmem_cache_iter.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Google */
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/slab.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+
+#include "../../mm/slab.h" /* kmem_cache, slab_caches and slab_mutex */
+
+struct bpf_iter__kmem_cache {
+ __bpf_md_ptr(struct bpf_iter_meta *, meta);
+ __bpf_md_ptr(struct kmem_cache *, s);
+};
+
+static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
+{
+ loff_t cnt = 0;
+ struct kmem_cache *s = NULL;
+
+ mutex_lock(&slab_mutex);
+
+ /*
+ * Find an entry at the given position in the slab_caches list instead
+ * of keeping a reference (of the last visited entry, if any) out of
+ * slab_mutex. It might miss something if one is deleted in the middle
+ * while it releases the lock. But it should be rare and there's not
+ * much we can do about it.
+ */
+ list_for_each_entry(s, &slab_caches, list) {
+ if (cnt == *pos)
+ break;
+
+ cnt++;
+ }
+
+ if (cnt != *pos)
+ return NULL;
+
+ ++*pos;
+ return s;
+}
+
+static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
+{
+ struct bpf_iter_meta meta;
+ struct bpf_iter__kmem_cache ctx = {
+ .meta = &meta,
+ .s = v,
+ };
+ struct bpf_prog *prog;
+
+ meta.seq = seq;
+ prog = bpf_iter_get_info(&meta, true);
+ if (prog)
+ bpf_iter_run_prog(prog, &ctx);
+
+ mutex_unlock(&slab_mutex);
+}
+
+static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ struct kmem_cache *s = v;
+
+ ++*pos;
+
+ if (list_last_entry(&slab_caches, struct kmem_cache, list) == s)
+ return NULL;
+
+ return list_next_entry(s, list);
+}
+
+static int kmem_cache_iter_seq_show(struct seq_file *seq, void *v)
+{
+ struct bpf_iter_meta meta;
+ struct bpf_iter__kmem_cache ctx = {
+ .meta = &meta,
+ .s = v,
+ };
+ struct bpf_prog *prog;
+ int ret = 0;
+
+ meta.seq = seq;
+ prog = bpf_iter_get_info(&meta, false);
+ if (prog)
+ ret = bpf_iter_run_prog(prog, &ctx);
+
+ return ret;
+}
+
+static const struct seq_operations kmem_cache_iter_seq_ops = {
+ .start = kmem_cache_iter_seq_start,
+ .next = kmem_cache_iter_seq_next,
+ .stop = kmem_cache_iter_seq_stop,
+ .show = kmem_cache_iter_seq_show,
+};
+
+BTF_ID_LIST_GLOBAL_SINGLE(bpf_kmem_cache_btf_id, struct, kmem_cache)
+
+static const struct bpf_iter_seq_info kmem_cache_iter_seq_info = {
+ .seq_ops = &kmem_cache_iter_seq_ops,
+};
+
+static void bpf_iter_kmem_cache_show_fdinfo(const struct bpf_iter_aux_info *aux,
+ struct seq_file *seq)
+{
+ seq_puts(seq, "kmem_cache iter\n");
+}
+
+DEFINE_BPF_ITER_FUNC(kmem_cache, struct bpf_iter_meta *meta,
+ struct kmem_cache *s)
+
+static struct bpf_iter_reg bpf_kmem_cache_reg_info = {
+ .target = "kmem_cache",
+ .feature = BPF_ITER_RESCHED,
+ .show_fdinfo = bpf_iter_kmem_cache_show_fdinfo,
+ .ctx_arg_info_size = 1,
+ .ctx_arg_info = {
+ { offsetof(struct bpf_iter__kmem_cache, s),
+ PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
+ },
+ .seq_info = &kmem_cache_iter_seq_info,
+};
+
+static int __init bpf_kmem_cache_iter_init(void)
+{
+ bpf_kmem_cache_reg_info.ctx_arg_info[0].btf_id = bpf_kmem_cache_btf_id[0];
+ return bpf_iter_reg_target(&bpf_kmem_cache_reg_info);
+}
+
+late_initcall(bpf_kmem_cache_iter_init);
--
2.46.1.824.gd892dcdcdd-goog
* [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
2024-09-27 18:41 [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Namhyung Kim
2024-09-27 18:41 ` [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
@ 2024-09-27 18:41 ` Namhyung Kim
2024-09-29 17:05 ` Alexei Starovoitov
2024-09-27 18:41 ` [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
2024-09-29 17:00 ` [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Alexei Starovoitov
3 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-27 18:41 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
bpf_get_kmem_cache() gets slab cache information from a virtual
address, like virt_to_cache(). If the address is a pointer to a slab
object, it returns a valid kmem_cache pointer; otherwise NULL is
returned.
It doesn't grab a reference count of the kmem_cache, so the caller is
responsible for managing the access. The intended use case for now is to
symbolize locks in slab objects from the lock contention tracepoints.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
kernel/bpf/helpers.c | 1 +
mm/slab_common.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1a43d06eab286c26..bbc5800ec3afc899 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7443244656150325..e648b05a635b94bf 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1322,6 +1322,22 @@ size_t ksize(const void *objp)
}
EXPORT_SYMBOL(ksize);
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/btf.h>
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
+{
+ struct slab *slab;
+
+ slab = virt_to_slab((void *)(long)addr);
+ return slab ? slab->slab_cache : NULL;
+}
+
+__bpf_kfunc_end_defs();
+#endif /* CONFIG_BPF_SYSCALL */
+
/* Tracepoints definitions. */
EXPORT_TRACEPOINT_SYMBOL(kmalloc);
EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
--
2.46.1.824.gd892dcdcdd-goog
* [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-27 18:41 [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Namhyung Kim
2024-09-27 18:41 ` [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
2024-09-27 18:41 ` [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
@ 2024-09-27 18:41 ` Namhyung Kim
2024-09-29 6:13 ` Namhyung Kim
2024-09-29 17:00 ` [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Alexei Starovoitov
3 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-27 18:41 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
The test traverses all slab caches using the kmem_cache_iter and
checks whether the current task's pointer comes from the "task_struct"
slab cache.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
.../bpf/prog_tests/kmem_cache_iter.c | 64 ++++++++++++++++++
tools/testing/selftests/bpf/progs/bpf_iter.h | 7 ++
.../selftests/bpf/progs/kmem_cache_iter.c | 66 +++++++++++++++++++
3 files changed, 137 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c
diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..814bcc453e9f3ccd
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Google */
+
+#include <test_progs.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "kmem_cache_iter.skel.h"
+
+static void test_kmem_cache_iter_check_task(struct kmem_cache_iter *skel)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .flags = BPF_F_TEST_RUN_ON_CPU,
+ );
+ int prog_fd = bpf_program__fd(skel->progs.check_task_struct);
+
+ /* get task_struct and check if it's from a slab cache */
+ bpf_prog_test_run_opts(prog_fd, &opts);
+
+ /* the BPF program should set 'found' variable */
+ ASSERT_EQ(skel->bss->found, 1, "found task_struct");
+}
+
+void test_kmem_cache_iter(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+ struct kmem_cache_iter *skel = NULL;
+ union bpf_iter_link_info linfo = {};
+ struct bpf_link *link;
+ char buf[1024];
+ int iter_fd;
+
+ skel = kmem_cache_iter__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "kmem_cache_iter__open_and_load"))
+ return;
+
+ opts.link_info = &linfo;
+ opts.link_info_len = sizeof(linfo);
+
+ link = bpf_program__attach_iter(skel->progs.slab_info_collector, &opts);
+ if (!ASSERT_OK_PTR(link, "attach_iter"))
+ goto destroy;
+
+ iter_fd = bpf_iter_create(bpf_link__fd(link));
+ if (!ASSERT_GE(iter_fd, 0, "iter_create"))
+ goto free_link;
+
+ memset(buf, 0, sizeof(buf));
+ while (read(iter_fd, buf, sizeof(buf)) > 0) {
+ /* read out all contents */
+ printf("%s", buf);
+ }
+
+ /* next reads should return 0 */
+ ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read");
+
+ test_kmem_cache_iter_check_task(skel);
+
+ close(iter_fd);
+
+free_link:
+ bpf_link__destroy(link);
+destroy:
+ kmem_cache_iter__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
index c41ee80533ca219a..3305dc3a74b32481 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter.h
+++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
@@ -24,6 +24,7 @@
#define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
#define BTF_F_ZERO BTF_F_ZERO___not_used
#define bpf_iter__ksym bpf_iter__ksym___not_used
+#define bpf_iter__kmem_cache bpf_iter__kmem_cache___not_used
#include "vmlinux.h"
#undef bpf_iter_meta
#undef bpf_iter__bpf_map
@@ -48,6 +49,7 @@
#undef BTF_F_PTR_RAW
#undef BTF_F_ZERO
#undef bpf_iter__ksym
+#undef bpf_iter__kmem_cache
struct bpf_iter_meta {
struct seq_file *seq;
@@ -165,3 +167,8 @@ struct bpf_iter__ksym {
struct bpf_iter_meta *meta;
struct kallsym_iter *ksym;
};
+
+struct bpf_iter__kmem_cache {
+ struct bpf_iter_meta *meta;
+ struct kmem_cache *s;
+} __attribute__((preserve_access_index));
diff --git a/tools/testing/selftests/bpf/progs/kmem_cache_iter.c b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..3f6ec15a1bf6344c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Google */
+
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define SLAB_NAME_MAX 256
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(key_size, sizeof(void *));
+ __uint(value_size, SLAB_NAME_MAX);
+ __uint(max_entries, 1024);
+} slab_hash SEC(".maps");
+
+extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
+
+/* result, will be checked by userspace */
+int found;
+
+SEC("iter/kmem_cache")
+int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
+{
+ struct seq_file *seq = ctx->meta->seq;
+ struct kmem_cache *s = ctx->s;
+
+ if (s) {
+ char name[SLAB_NAME_MAX];
+
+ /*
+ * Print the name and size to make sure the iterator implements the
+ * seq interface properly. It's also useful for debugging.
+ */
+ BPF_SEQ_PRINTF(seq, "%s: %u\n", s->name, s->object_size);
+
+ bpf_probe_read_kernel_str(name, sizeof(name), s->name);
+ bpf_map_update_elem(&slab_hash, &s, name, BPF_NOEXIST);
+ }
+
+ return 0;
+}
+
+SEC("raw_tp/bpf_test_finish")
+int BPF_PROG(check_task_struct)
+{
+ __u64 curr = bpf_get_current_task();
+ struct kmem_cache *s;
+ char *name;
+
+ s = bpf_get_kmem_cache(curr);
+ if (s == NULL) {
+ found = -1;
+ return 0;
+ }
+
+ name = bpf_map_lookup_elem(&slab_hash, &s);
+ if (name && !bpf_strncmp(name, 11, "task_struct"))
+ found = 1;
+ else
+ found = -2;
+
+ return 0;
+}
--
2.46.1.824.gd892dcdcdd-goog
* Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-27 18:41 ` [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
@ 2024-09-29 6:13 ` Namhyung Kim
2024-09-29 14:27 ` Hyeonggon Yoo
0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-29 6:13 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Fri, Sep 27, 2024 at 11:41:33AM -0700, Namhyung Kim wrote:
> The test traverses all slab caches using the kmem_cache_iter and check
> if current task's pointer is from "task_struct" slab cache.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> .../bpf/prog_tests/kmem_cache_iter.c | 64 ++++++++++++++++++
> tools/testing/selftests/bpf/progs/bpf_iter.h | 7 ++
> .../selftests/bpf/progs/kmem_cache_iter.c | 66 +++++++++++++++++++
> 3 files changed, 137 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> new file mode 100644
> index 0000000000000000..814bcc453e9f3ccd
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> @@ -0,0 +1,64 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Google */
> +
> +#include <test_progs.h>
> +#include <bpf/libbpf.h>
> +#include <bpf/btf.h>
> +#include "kmem_cache_iter.skel.h"
> +
> +static void test_kmem_cache_iter_check_task(struct kmem_cache_iter *skel)
> +{
> + LIBBPF_OPTS(bpf_test_run_opts, opts,
> + .flags = BPF_F_TEST_RUN_ON_CPU,
> + );
> + int prog_fd = bpf_program__fd(skel->progs.check_task_struct);
> +
> > + /* get task_struct and check if it's from a slab cache */
> + bpf_prog_test_run_opts(prog_fd, &opts);
> +
> + /* the BPF program should set 'found' variable */
> + ASSERT_EQ(skel->bss->found, 1, "found task_struct");
Hmm.. I'm seeing a failure with found being -1, which means ...
> +}
> +
> +void test_kmem_cache_iter(void)
> +{
> + DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> + struct kmem_cache_iter *skel = NULL;
> + union bpf_iter_link_info linfo = {};
> + struct bpf_link *link;
> + char buf[1024];
> + int iter_fd;
> +
> + skel = kmem_cache_iter__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "kmem_cache_iter__open_and_load"))
> + return;
> +
> + opts.link_info = &linfo;
> + opts.link_info_len = sizeof(linfo);
> +
> + link = bpf_program__attach_iter(skel->progs.slab_info_collector, &opts);
> + if (!ASSERT_OK_PTR(link, "attach_iter"))
> + goto destroy;
> +
> + iter_fd = bpf_iter_create(bpf_link__fd(link));
> + if (!ASSERT_GE(iter_fd, 0, "iter_create"))
> + goto free_link;
> +
> + memset(buf, 0, sizeof(buf));
> > + while (read(iter_fd, buf, sizeof(buf)) > 0) {
> + /* read out all contents */
> + printf("%s", buf);
> + }
> +
> + /* next reads should return 0 */
> + ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read");
> +
> + test_kmem_cache_iter_check_task(skel);
> +
> + close(iter_fd);
> +
> +free_link:
> + bpf_link__destroy(link);
> +destroy:
> + kmem_cache_iter__destroy(skel);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
> index c41ee80533ca219a..3305dc3a74b32481 100644
> --- a/tools/testing/selftests/bpf/progs/bpf_iter.h
> +++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
> @@ -24,6 +24,7 @@
> #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
> #define BTF_F_ZERO BTF_F_ZERO___not_used
> #define bpf_iter__ksym bpf_iter__ksym___not_used
> +#define bpf_iter__kmem_cache bpf_iter__kmem_cache___not_used
> #include "vmlinux.h"
> #undef bpf_iter_meta
> #undef bpf_iter__bpf_map
> @@ -48,6 +49,7 @@
> #undef BTF_F_PTR_RAW
> #undef BTF_F_ZERO
> #undef bpf_iter__ksym
> +#undef bpf_iter__kmem_cache
>
> struct bpf_iter_meta {
> struct seq_file *seq;
> @@ -165,3 +167,8 @@ struct bpf_iter__ksym {
> struct bpf_iter_meta *meta;
> struct kallsym_iter *ksym;
> };
> +
> +struct bpf_iter__kmem_cache {
> + struct bpf_iter_meta *meta;
> + struct kmem_cache *s;
> +} __attribute__((preserve_access_index));
> diff --git a/tools/testing/selftests/bpf/progs/kmem_cache_iter.c b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> new file mode 100644
> index 0000000000000000..3f6ec15a1bf6344c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024 Google */
> +
> +#include "bpf_iter.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +#define SLAB_NAME_MAX 256
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_HASH);
> + __uint(key_size, sizeof(void *));
> + __uint(value_size, SLAB_NAME_MAX);
> + __uint(max_entries, 1024);
> +} slab_hash SEC(".maps");
> +
> +extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> +
> +/* result, will be checked by userspace */
> +int found;
> +
> +SEC("iter/kmem_cache")
> +int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
> +{
> + struct seq_file *seq = ctx->meta->seq;
> + struct kmem_cache *s = ctx->s;
> +
> + if (s) {
> + char name[SLAB_NAME_MAX];
> +
> + /*
> > + * Print the name and size to make sure the iterator implements the
> > + * seq interface properly. It's also useful for debugging.
> + */
> + BPF_SEQ_PRINTF(seq, "%s: %u\n", s->name, s->object_size);
> +
> + bpf_probe_read_kernel_str(name, sizeof(name), s->name);
> + bpf_map_update_elem(&slab_hash, &s, name, BPF_NOEXIST);
> + }
> +
> + return 0;
> +}
> +
> +SEC("raw_tp/bpf_test_finish")
> +int BPF_PROG(check_task_struct)
> +{
> + __u64 curr = bpf_get_current_task();
> + struct kmem_cache *s;
> + char *name;
> +
> + s = bpf_get_kmem_cache(curr);
> + if (s == NULL) {
> + found = -1;
> + return 0;
... it cannot find a kmem_cache for the current task. This program is
run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU, so I think
curr should point to a task_struct allocated from a slab cache.
Am I missing something?
Thanks,
Namhyung
> + }
> +
> + name = bpf_map_lookup_elem(&slab_hash, &s);
> + if (name && !bpf_strncmp(name, 11, "task_struct"))
> + found = 1;
> + else
> + found = -2;
> +
> + return 0;
> +}
> --
> 2.46.1.824.gd892dcdcdd-goog
>
* Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-29 6:13 ` Namhyung Kim
@ 2024-09-29 14:27 ` Hyeonggon Yoo
2024-09-30 2:18 ` Namhyung Kim
0 siblings, 1 reply; 17+ messages in thread
From: Hyeonggon Yoo @ 2024-09-29 14:27 UTC (permalink / raw)
To: Namhyung Kim
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
linux-mm, Arnaldo Carvalho de Melo
On Sun, Sep 29, 2024 at 3:13 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Sep 27, 2024 at 11:41:33AM -0700, Namhyung Kim wrote:
> > The test traverses all slab caches using the kmem_cache_iter and check
> > if current task's pointer is from "task_struct" slab cache.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > .../bpf/prog_tests/kmem_cache_iter.c | 64 ++++++++++++++++++
> > tools/testing/selftests/bpf/progs/bpf_iter.h | 7 ++
> > .../selftests/bpf/progs/kmem_cache_iter.c | 66 +++++++++++++++++++
> > 3 files changed, 137 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > new file mode 100644
> > index 0000000000000000..814bcc453e9f3ccd
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > @@ -0,0 +1,64 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024 Google */
> > +
> > +#include <test_progs.h>
> > +#include <bpf/libbpf.h>
> > +#include <bpf/btf.h>
> > +#include "kmem_cache_iter.skel.h"
> > +
> > +static void test_kmem_cache_iter_check_task(struct kmem_cache_iter *skel)
> > +{
> > + LIBBPF_OPTS(bpf_test_run_opts, opts,
> > + .flags = BPF_F_TEST_RUN_ON_CPU,
> > + );
> > + int prog_fd = bpf_program__fd(skel->progs.check_task_struct);
> > +
> > > + /* get task_struct and check if it's from a slab cache */
> > + bpf_prog_test_run_opts(prog_fd, &opts);
> > +
> > + /* the BPF program should set 'found' variable */
> > + ASSERT_EQ(skel->bss->found, 1, "found task_struct");
>
> Hmm.. I'm seeing a failure with found being -1, which means ...
>
> > +}
> > +
> > +void test_kmem_cache_iter(void)
> > +{
> > + DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > + struct kmem_cache_iter *skel = NULL;
> > + union bpf_iter_link_info linfo = {};
> > + struct bpf_link *link;
> > + char buf[1024];
> > + int iter_fd;
> > +
> > + skel = kmem_cache_iter__open_and_load();
> > + if (!ASSERT_OK_PTR(skel, "kmem_cache_iter__open_and_load"))
> > + return;
> > +
> > + opts.link_info = &linfo;
> > + opts.link_info_len = sizeof(linfo);
> > +
> > + link = bpf_program__attach_iter(skel->progs.slab_info_collector, &opts);
> > + if (!ASSERT_OK_PTR(link, "attach_iter"))
> > + goto destroy;
> > +
> > + iter_fd = bpf_iter_create(bpf_link__fd(link));
> > + if (!ASSERT_GE(iter_fd, 0, "iter_create"))
> > + goto free_link;
> > +
> > + memset(buf, 0, sizeof(buf));
> > > + while (read(iter_fd, buf, sizeof(buf)) > 0) {
> > + /* read out all contents */
> > + printf("%s", buf);
> > + }
> > +
> > + /* next reads should return 0 */
> > + ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read");
> > +
> > + test_kmem_cache_iter_check_task(skel);
> > +
> > + close(iter_fd);
> > +
> > +free_link:
> > + bpf_link__destroy(link);
> > +destroy:
> > + kmem_cache_iter__destroy(skel);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
> > index c41ee80533ca219a..3305dc3a74b32481 100644
> > --- a/tools/testing/selftests/bpf/progs/bpf_iter.h
> > +++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
> > @@ -24,6 +24,7 @@
> > #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
> > #define BTF_F_ZERO BTF_F_ZERO___not_used
> > #define bpf_iter__ksym bpf_iter__ksym___not_used
> > +#define bpf_iter__kmem_cache bpf_iter__kmem_cache___not_used
> > #include "vmlinux.h"
> > #undef bpf_iter_meta
> > #undef bpf_iter__bpf_map
> > @@ -48,6 +49,7 @@
> > #undef BTF_F_PTR_RAW
> > #undef BTF_F_ZERO
> > #undef bpf_iter__ksym
> > +#undef bpf_iter__kmem_cache
> >
> > struct bpf_iter_meta {
> > struct seq_file *seq;
> > @@ -165,3 +167,8 @@ struct bpf_iter__ksym {
> > struct bpf_iter_meta *meta;
> > struct kallsym_iter *ksym;
> > };
> > +
> > +struct bpf_iter__kmem_cache {
> > + struct bpf_iter_meta *meta;
> > + struct kmem_cache *s;
> > +} __attribute__((preserve_access_index));
> > diff --git a/tools/testing/selftests/bpf/progs/kmem_cache_iter.c b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > new file mode 100644
> > index 0000000000000000..3f6ec15a1bf6344c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > @@ -0,0 +1,66 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024 Google */
> > +
> > +#include "bpf_iter.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +#define SLAB_NAME_MAX 256
> > +
> > +struct {
> > + __uint(type, BPF_MAP_TYPE_HASH);
> > + __uint(key_size, sizeof(void *));
> > + __uint(value_size, SLAB_NAME_MAX);
> > + __uint(max_entries, 1024);
> > +} slab_hash SEC(".maps");
> > +
> > +extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> > +
> > +/* result, will be checked by userspace */
> > +int found;
> > +
> > +SEC("iter/kmem_cache")
> > +int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
> > +{
> > + struct seq_file *seq = ctx->meta->seq;
> > + struct kmem_cache *s = ctx->s;
> > +
> > + if (s) {
> > + char name[SLAB_NAME_MAX];
> > +
> > + /*
> > > + * Print the name and size to make sure the iterator implements the
> > > + * seq interface properly. It's also useful for debugging.
> > + */
> > + BPF_SEQ_PRINTF(seq, "%s: %u\n", s->name, s->object_size);
> > +
> > + bpf_probe_read_kernel_str(name, sizeof(name), s->name);
> > + bpf_map_update_elem(&slab_hash, &s, name, BPF_NOEXIST);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +SEC("raw_tp/bpf_test_finish")
> > +int BPF_PROG(check_task_struct)
> > +{
> > + __u64 curr = bpf_get_current_task();
> > + struct kmem_cache *s;
> > + char *name;
> > +
> > + s = bpf_get_kmem_cache(curr);
> > + if (s == NULL) {
> > + found = -1;
> > + return 0;
>
> ... it cannot find a kmem_cache for the current task. This program is
> run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU, so I think
> curr should point to a task_struct allocated from a slab cache.
>
> Am I missing something?
Hi Namhyung,
Out of curiosity I've been investigating this issue on my machine and
running some experiments.
When the test fails, calling dump_page() for the page the task_struct
belongs to shows that the page does not have the PGTY_slab flag set,
which is why virt_to_slab(current) returns NULL.
Does the test always fail in your environment? On my machine, the test
sometimes passed and sometimes failed.
Maybe the value returned by the 'current' macro sometimes belongs to a
slab and sometimes does not. But that doesn't really make sense to me,
as IIUC task_struct descriptors are allocated from slab.
....Or maybe some code overwrote the page_type field of a slab?
Hmm, it seems we need more information to identify what's gone wrong.
Just FYI, adding the output of the following code snippet in
bpf_get_kmem_cache():
pr_info("current = %llx\n", (unsigned long long)current);
dump_page(virt_to_head_page(current), "virt_to_head_page()");
# When the test passes
[ 232.755028] current = ffff8ff5b9ebd200
[ 232.755031] page: refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x139eb8
[ 232.755033] head: order:3 mapcount:0 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[ 232.755035] memcg:ffff8ff5b3ee0c01
[ 232.755037] ksm flags:
0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 232.755040] page_type: f5(slab)
[ 232.755042] raw: 0017ffffc0000040 ffff8ff58028ab00 ffffdaba05b8fc00
dead000000000003
[ 232.755045] raw: 0000000000000000 0000000000030003 00000001f5000000
ffff8ff5b3ee0c01
[ 232.755047] head: 0017ffffc0000040 ffff8ff58028ab00
ffffdaba05b8fc00 dead000000000003
[ 232.755048] head: 0000000000000000 0000000000030003
00000001f5000000 ffff8ff5b3ee0c01
[ 232.755050] head: 0017ffffc0000003 ffffdaba04e7ae01
ffffffffffffffff 0000000000000000
[ 232.755052] head: 0000000000000008 0000000000000000
00000000ffffffff 0000000000000000
[ 232.755053] page dumped because: virt_to_head_page()
# When the test fails
[ 130.811626] current = ffffffff884110c0
[ 130.811628] page: refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x8a9411
[ 130.811632] flags:
0x17ffffc0002000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
[ 130.811636] raw: 0017ffffc0002000 ffffdaba22a50448 ffffdaba22a50448
0000000000000000
[ 130.811639] raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
[ 130.811641] page dumped because: virt_to_head_page()
Best,
Hyeonggon
>
> Thanks,
> Namhyung
>
> > + }
> > +
> > + name = bpf_map_lookup_elem(&slab_hash, &s);
> > + if (name && !bpf_strncmp(name, 11, "task_struct"))
> > + found = 1;
> > + else
> > + found = -2;
> > +
> > + return 0;
> > +}
> > --
> > 2.46.1.824.gd892dcdcdd-goog
> >
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2)
2024-09-27 18:41 [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Namhyung Kim
` (2 preceding siblings ...)
2024-09-27 18:41 ` [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
@ 2024-09-29 17:00 ` Alexei Starovoitov
2024-09-30 1:51 ` Namhyung Kim
3 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2024-09-29 17:00 UTC (permalink / raw)
To: Namhyung Kim
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> I'm proposing a new iterator and a kfunc for the slab memory allocator
> to get information of each kmem_cache like in /proc/slabinfo or
> /sys/kernel/slab in more flexible way.
>
> v2 changes)
The subject is confusing CI and human readers.
Please use [PATCH v3 bpf-next ..] in the future.
Also note that RFC patches are never going to be applied and they are
ignored by BPF CI.
If you want things to land then drop the RFC tag.
* Re: [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator
2024-09-27 18:41 ` [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
@ 2024-09-29 17:04 ` Alexei Starovoitov
2024-09-30 2:08 ` Namhyung Kim
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2024-09-29 17:04 UTC (permalink / raw)
To: Namhyung Kim
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
> +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> + loff_t cnt = 0;
> + struct kmem_cache *s = NULL;
> +
> + mutex_lock(&slab_mutex);
It would be better to find a way to iterate slabs without holding
the mutex for the duration of the loop.
Maybe use refcnt to hold the kmem_cache while bpf prog is looking at it?
* Re: [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
2024-09-27 18:41 ` [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
@ 2024-09-29 17:05 ` Alexei Starovoitov
2024-09-30 2:09 ` Namhyung Kim
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2024-09-29 17:05 UTC (permalink / raw)
To: Namhyung Kim
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +{
> + struct slab *slab;
> +
> + slab = virt_to_slab((void *)(long)addr);
> + return slab ? slab->slab_cache : NULL;
> +}
I think this needs more safety guards on 'addr'.
It needs to check the valid range of 'addr' before doing virt_to_slab.
* Re: [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2)
2024-09-29 17:00 ` [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Alexei Starovoitov
@ 2024-09-30 1:51 ` Namhyung Kim
0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2024-09-30 1:51 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
Hello Alexei,
On Sun, Sep 29, 2024 at 10:00:56AM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > I'm proposing a new iterator and a kfunc for the slab memory allocator
> > to get information of each kmem_cache like in /proc/slabinfo or
> > /sys/kernel/slab in more flexible way.
> >
> > v2 changes)
>
> The subject is confusing CI and human readers.
> Please use [PATCH v3 bpf-next ..] in the future.
>
> Also note that RFC patches are never going to be applied and they are
> ignored by BPF CI.
> If you want things to land then drop the RFC tag.
Ok, I'll change the subject line in the next version.
Thanks,
Namhyung
* Re: [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator
2024-09-29 17:04 ` Alexei Starovoitov
@ 2024-09-30 2:08 ` Namhyung Kim
2024-10-01 18:23 ` Alexei Starovoitov
0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-30 2:08 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Sun, Sep 29, 2024 at 10:04:00AM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> > +{
> > + loff_t cnt = 0;
> > + struct kmem_cache *s = NULL;
> > +
> > + mutex_lock(&slab_mutex);
>
> It would be better to find a way to iterate slabs without holding
> the mutex for the duration of the loop.
> Maybe use refcnt to hold the kmem_cache while bpf prog is looking at it?
Do you mean that you don't want to hold slab_mutex while the BPF
program is running? Maybe we can allocate an array of pointers to the
slab caches (with refcounts) at the beginning and iterate over them
instead, then call kmem_cache_destroy() for each entry at the end.
Is that ok with you?
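An untested sketch of what I have in mind (assuming the slab_caches
list and the refcount field of struct kmem_cache can be used this way):

```c
/* Untested sketch: snapshot the kmem_caches under slab_mutex once,
 * then let the iterator walk the array without holding the mutex.
 */
struct kmem_cache_iter_priv {
	struct kmem_cache **caches;
	int nr_caches;
};

static int kmem_cache_iter_snapshot(struct kmem_cache_iter_priv *p)
{
	struct kmem_cache *s;
	int nr = 0, i = 0;

	mutex_lock(&slab_mutex);
	list_for_each_entry(s, &slab_caches, list)
		nr++;

	p->caches = kcalloc(nr, sizeof(*p->caches), GFP_KERNEL);
	if (!p->caches) {
		mutex_unlock(&slab_mutex);
		return -ENOMEM;
	}

	list_for_each_entry(s, &slab_caches, list) {
		/* pin the cache so it survives until the iteration ends */
		s->refcount++;
		p->caches[i++] = s;
	}
	p->nr_caches = nr;
	mutex_unlock(&slab_mutex);
	return 0;
}

static void kmem_cache_iter_release(struct kmem_cache_iter_priv *p)
{
	int i;

	/* drop the pinned references; frees a cache if we held the last one */
	for (i = 0; i < p->nr_caches; i++)
		kmem_cache_destroy(p->caches[i]);
	kfree(p->caches);
}
```

Both list walks happen under the same slab_mutex hold, so the count and
the second pass should stay consistent.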
Thanks,
Namhyung
* Re: [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
2024-09-29 17:05 ` Alexei Starovoitov
@ 2024-09-30 2:09 ` Namhyung Kim
0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2024-09-30 2:09 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Sun, Sep 29, 2024 at 10:05:42AM -0700, Alexei Starovoitov wrote:
> On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > +{
> > + struct slab *slab;
> > +
> > + slab = virt_to_slab((void *)(long)addr);
> > + return slab ? slab->slab_cache : NULL;
> > +}
>
> I think this needs more safety guards on 'addr'.
> It needs to check the valid range of 'addr' before doing virt_to_slab.
Ok, I think we can use virt_addr_valid() for that.
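Something like this (untested):

```c
__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
{
	struct slab *slab;

	/* reject addresses outside the kernel's direct map before
	 * touching the underlying struct page
	 */
	if (!virt_addr_valid((void *)(long)addr))
		return NULL;

	slab = virt_to_slab((void *)(long)addr);
	return slab ? slab->slab_cache : NULL;
}
```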
Thanks,
Namhyung
* Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-29 14:27 ` Hyeonggon Yoo
@ 2024-09-30 2:18 ` Namhyung Kim
2024-09-30 3:24 ` Hyeonggon Yoo
0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-30 2:18 UTC (permalink / raw)
To: Hyeonggon Yoo
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
linux-mm, Arnaldo Carvalho de Melo
Hello Hyeonggon,
On Sun, Sep 29, 2024 at 11:27:25PM +0900, Hyeonggon Yoo wrote:
> On Sun, Sep 29, 2024 at 3:13 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Sep 27, 2024 at 11:41:33AM -0700, Namhyung Kim wrote:
> > > The test traverses all slab caches using the kmem_cache_iter and check
> > > if current task's pointer is from "task_struct" slab cache.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > > .../bpf/prog_tests/kmem_cache_iter.c | 64 ++++++++++++++++++
> > > tools/testing/selftests/bpf/progs/bpf_iter.h | 7 ++
> > > .../selftests/bpf/progs/kmem_cache_iter.c | 66 +++++++++++++++++++
> > > 3 files changed, 137 insertions(+)
> > > create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > > create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > > new file mode 100644
> > > index 0000000000000000..814bcc453e9f3ccd
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > > @@ -0,0 +1,64 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/* Copyright (c) 2024 Google */
> > > +
> > > +#include <test_progs.h>
> > > +#include <bpf/libbpf.h>
> > > +#include <bpf/btf.h>
> > > +#include "kmem_cache_iter.skel.h"
> > > +
> > > +static void test_kmem_cache_iter_check_task(struct kmem_cache_iter *skel)
> > > +{
> > > + LIBBPF_OPTS(bpf_test_run_opts, opts,
> > > + .flags = BPF_F_TEST_RUN_ON_CPU,
> > > + );
> > > + int prog_fd = bpf_program__fd(skel->progs.check_task_struct);
> > > +
> > > + /* get task_struct and check it if's from a slab cache */
> > > + bpf_prog_test_run_opts(prog_fd, &opts);
> > > +
> > > + /* the BPF program should set 'found' variable */
> > > + ASSERT_EQ(skel->bss->found, 1, "found task_struct");
> >
> > Hmm.. I'm seeing a failure with found being -1, which means ...
> >
> > > +}
> > > +
> > > +void test_kmem_cache_iter(void)
> > > +{
> > > + DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > > + struct kmem_cache_iter *skel = NULL;
> > > + union bpf_iter_link_info linfo = {};
> > > + struct bpf_link *link;
> > > + char buf[1024];
> > > + int iter_fd;
> > > +
> > > + skel = kmem_cache_iter__open_and_load();
> > > + if (!ASSERT_OK_PTR(skel, "kmem_cache_iter__open_and_load"))
> > > + return;
> > > +
> > > + opts.link_info = &linfo;
> > > + opts.link_info_len = sizeof(linfo);
> > > +
> > > + link = bpf_program__attach_iter(skel->progs.slab_info_collector, &opts);
> > > + if (!ASSERT_OK_PTR(link, "attach_iter"))
> > > + goto destroy;
> > > +
> > > + iter_fd = bpf_iter_create(bpf_link__fd(link));
> > > + if (!ASSERT_GE(iter_fd, 0, "iter_create"))
> > > + goto free_link;
> > > +
> > > + memset(buf, 0, sizeof(buf));
> > > + while (read(iter_fd, buf, sizeof(buf)) > 0) {
> > > + /* read out all contents */
> > > + printf("%s", buf);
> > > + }
> > > +
> > > + /* next reads should return 0 */
> > > + ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read");
> > > +
> > > + test_kmem_cache_iter_check_task(skel);
> > > +
> > > + close(iter_fd);
> > > +
> > > +free_link:
> > > + bpf_link__destroy(link);
> > > +destroy:
> > > + kmem_cache_iter__destroy(skel);
> > > +}
> > > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
> > > index c41ee80533ca219a..3305dc3a74b32481 100644
> > > --- a/tools/testing/selftests/bpf/progs/bpf_iter.h
> > > +++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
> > > @@ -24,6 +24,7 @@
> > > #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
> > > #define BTF_F_ZERO BTF_F_ZERO___not_used
> > > #define bpf_iter__ksym bpf_iter__ksym___not_used
> > > +#define bpf_iter__kmem_cache bpf_iter__kmem_cache___not_used
> > > #include "vmlinux.h"
> > > #undef bpf_iter_meta
> > > #undef bpf_iter__bpf_map
> > > @@ -48,6 +49,7 @@
> > > #undef BTF_F_PTR_RAW
> > > #undef BTF_F_ZERO
> > > #undef bpf_iter__ksym
> > > +#undef bpf_iter__kmem_cache
> > >
> > > struct bpf_iter_meta {
> > > struct seq_file *seq;
> > > @@ -165,3 +167,8 @@ struct bpf_iter__ksym {
> > > struct bpf_iter_meta *meta;
> > > struct kallsym_iter *ksym;
> > > };
> > > +
> > > +struct bpf_iter__kmem_cache {
> > > + struct bpf_iter_meta *meta;
> > > + struct kmem_cache *s;
> > > +} __attribute__((preserve_access_index));
> > > diff --git a/tools/testing/selftests/bpf/progs/kmem_cache_iter.c b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > > new file mode 100644
> > > index 0000000000000000..3f6ec15a1bf6344c
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > > @@ -0,0 +1,66 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/* Copyright (c) 2024 Google */
> > > +
> > > +#include "bpf_iter.h"
> > > +#include <bpf/bpf_helpers.h>
> > > +#include <bpf/bpf_tracing.h>
> > > +
> > > +char _license[] SEC("license") = "GPL";
> > > +
> > > +#define SLAB_NAME_MAX 256
> > > +
> > > +struct {
> > > + __uint(type, BPF_MAP_TYPE_HASH);
> > > + __uint(key_size, sizeof(void *));
> > > + __uint(value_size, SLAB_NAME_MAX);
> > > + __uint(max_entries, 1024);
> > > +} slab_hash SEC(".maps");
> > > +
> > > +extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> > > +
> > > +/* result, will be checked by userspace */
> > > +int found;
> > > +
> > > +SEC("iter/kmem_cache")
> > > +int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
> > > +{
> > > + struct seq_file *seq = ctx->meta->seq;
> > > + struct kmem_cache *s = ctx->s;
> > > +
> > > + if (s) {
> > > + char name[SLAB_NAME_MAX];
> > > +
> > > + /*
> > > + * To make sure if the slab_iter implements the seq interface
> > > + * properly and it's also useful for debugging.
> > > + */
> > > + BPF_SEQ_PRINTF(seq, "%s: %u\n", s->name, s->object_size);
> > > +
> > > + bpf_probe_read_kernel_str(name, sizeof(name), s->name);
> > > + bpf_map_update_elem(&slab_hash, &s, name, BPF_NOEXIST);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +SEC("raw_tp/bpf_test_finish")
> > > +int BPF_PROG(check_task_struct)
> > > +{
> > > + __u64 curr = bpf_get_current_task();
> > > + struct kmem_cache *s;
> > > + char *name;
> > > +
> > > + s = bpf_get_kmem_cache(curr);
> > > + if (s == NULL) {
> > > + found = -1;
> > > + return 0;
> >
> > ... it cannot find a kmem_cache for the current task. This program is
> > run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU. So I think
> > the curr should point a task_struct in a slab cache.
> >
> > Am I missing something?
>
> Hi Namhyung,
>
> Out of curiosity I've been investigating this issue on my machine and
> running some experiments.
Thanks a lot for looking at this!
>
> When the test fails, calling dump_page() for the page the task_struct
> belongs to,
> shows that the page does not have the PGTY_slab flag set which is why
> virt_to_slab(current) returns NULL.
>
> Does the test always fail in your environment? On my machine, the
> test sometimes passed and sometimes failed.
I'm using vmtest.sh and it mostly succeeds. I thought I couldn't
reproduce it locally, but I do also see the failure sometimes. I'll take
a deeper look.
>
> Maybe the value returned by the 'current' macro sometimes belongs to a
> slab and sometimes does not.
> But that doesn't really make sense to me as, IIUC, task_struct
> descriptors are allocated from a slab.
AFAIK the notable exception is init_task, which lives in the kernel
data section. I'm not sure if the test is run by PID 1.
>
> ...Or maybe some code overwrote the page_type field of a slab?
> Hmm, it seems we need more information to identify what's gone wrong.
I doubt it's the case, but who knows? :)
>
> Just FYI, adding the output of the following code snippet in
> bpf_get_kmem_cache():
>
> pr_info("current = %llx\n", (unsigned long long)current);
> dump_page(virt_to_head_page(current), "virt_to_head_page()");
Thanks, I'll try this in my test too.
Namhyung
>
> # When the test passes
> [ 232.755028] current = ffff8ff5b9ebd200
> [ 232.755031] page: refcount:1 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x139eb8
> [ 232.755033] head: order:3 mapcount:0 entire_mapcount:0
> nr_pages_mapped:0 pincount:0
> [ 232.755035] memcg:ffff8ff5b3ee0c01
> [ 232.755037] ksm flags:
> 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
> [ 232.755040] page_type: f5(slab)
> [ 232.755042] raw: 0017ffffc0000040 ffff8ff58028ab00 ffffdaba05b8fc00
> dead000000000003
> [ 232.755045] raw: 0000000000000000 0000000000030003 00000001f5000000
> ffff8ff5b3ee0c01
> [ 232.755047] head: 0017ffffc0000040 ffff8ff58028ab00
> ffffdaba05b8fc00 dead000000000003
> [ 232.755048] head: 0000000000000000 0000000000030003
> 00000001f5000000 ffff8ff5b3ee0c01
> [ 232.755050] head: 0017ffffc0000003 ffffdaba04e7ae01
> ffffffffffffffff 0000000000000000
> [ 232.755052] head: 0000000000000008 0000000000000000
> 00000000ffffffff 0000000000000000
> [ 232.755053] page dumped because: virt_to_head_page()
>
> # When the test fails
> [ 130.811626] current = ffffffff884110c0
> [ 130.811628] page: refcount:1 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0x8a9411
> [ 130.811632] flags:
> 0x17ffffc0002000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
> [ 130.811636] raw: 0017ffffc0002000 ffffdaba22a50448 ffffdaba22a50448
> 0000000000000000
> [ 130.811639] raw: 0000000000000000 0000000000000000 00000001ffffffff
> 0000000000000000
> [ 130.811641] page dumped because: virt_to_head_page()
>
> Best,
> Hyeonggon
>
> >
> > Thanks,
> > Namhyung
> >
> > > + }
> > > +
> > > + name = bpf_map_lookup_elem(&slab_hash, &s);
> > > + if (name && !bpf_strncmp(name, 11, "task_struct"))
> > > + found = 1;
> > > + else
> > > + found = -2;
> > > +
> > > + return 0;
> > > +}
> > > --
> > > 2.46.1.824.gd892dcdcdd-goog
> > >
* Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-30 2:18 ` Namhyung Kim
@ 2024-09-30 3:24 ` Hyeonggon Yoo
2024-09-30 4:33 ` Namhyung Kim
0 siblings, 1 reply; 17+ messages in thread
From: Hyeonggon Yoo @ 2024-09-30 3:24 UTC (permalink / raw)
To: Namhyung Kim
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
linux-mm, Arnaldo Carvalho de Melo
On Mon, Sep 30, 2024 at 11:18 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello Hyeonggon,
>
> On Sun, Sep 29, 2024 at 11:27:25PM +0900, Hyeonggon Yoo wrote:
> > On Sun, Sep 29, 2024 at 3:13 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, Sep 27, 2024 at 11:41:33AM -0700, Namhyung Kim wrote:
> > > > The test traverses all slab caches using the kmem_cache_iter and check
> > > > if current task's pointer is from "task_struct" slab cache.
> > > >
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > > .../bpf/prog_tests/kmem_cache_iter.c | 64 ++++++++++++++++++
> > > > tools/testing/selftests/bpf/progs/bpf_iter.h | 7 ++
> > > > .../selftests/bpf/progs/kmem_cache_iter.c | 66 +++++++++++++++++++
> > > > 3 files changed, 137 insertions(+)
> > > > create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > > > create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > > > new file mode 100644
> > > > index 0000000000000000..814bcc453e9f3ccd
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
> > > > @@ -0,0 +1,64 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/* Copyright (c) 2024 Google */
> > > > +
> > > > +#include <test_progs.h>
> > > > +#include <bpf/libbpf.h>
> > > > +#include <bpf/btf.h>
> > > > +#include "kmem_cache_iter.skel.h"
> > > > +
> > > > +static void test_kmem_cache_iter_check_task(struct kmem_cache_iter *skel)
> > > > +{
> > > > + LIBBPF_OPTS(bpf_test_run_opts, opts,
> > > > + .flags = BPF_F_TEST_RUN_ON_CPU,
> > > > + );
> > > > + int prog_fd = bpf_program__fd(skel->progs.check_task_struct);
> > > > +
> > > > + /* get task_struct and check it if's from a slab cache */
> > > > + bpf_prog_test_run_opts(prog_fd, &opts);
> > > > +
> > > > + /* the BPF program should set 'found' variable */
> > > > + ASSERT_EQ(skel->bss->found, 1, "found task_struct");
> > >
> > > Hmm.. I'm seeing a failure with found being -1, which means ...
> > >
> > > > +}
> > > > +
> > > > +void test_kmem_cache_iter(void)
> > > > +{
> > > > + DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
> > > > + struct kmem_cache_iter *skel = NULL;
> > > > + union bpf_iter_link_info linfo = {};
> > > > + struct bpf_link *link;
> > > > + char buf[1024];
> > > > + int iter_fd;
> > > > +
> > > > + skel = kmem_cache_iter__open_and_load();
> > > > + if (!ASSERT_OK_PTR(skel, "kmem_cache_iter__open_and_load"))
> > > > + return;
> > > > +
> > > > + opts.link_info = &linfo;
> > > > + opts.link_info_len = sizeof(linfo);
> > > > +
> > > > + link = bpf_program__attach_iter(skel->progs.slab_info_collector, &opts);
> > > > + if (!ASSERT_OK_PTR(link, "attach_iter"))
> > > > + goto destroy;
> > > > +
> > > > + iter_fd = bpf_iter_create(bpf_link__fd(link));
> > > > + if (!ASSERT_GE(iter_fd, 0, "iter_create"))
> > > > + goto free_link;
> > > > +
> > > > + memset(buf, 0, sizeof(buf));
> > > > + while (read(iter_fd, buf, sizeof(buf)) > 0) {
> > > > + /* read out all contents */
> > > > + printf("%s", buf);
> > > > + }
> > > > +
> > > > + /* next reads should return 0 */
> > > > + ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read");
> > > > +
> > > > + test_kmem_cache_iter_check_task(skel);
> > > > +
> > > > + close(iter_fd);
> > > > +
> > > > +free_link:
> > > > + bpf_link__destroy(link);
> > > > +destroy:
> > > > + kmem_cache_iter__destroy(skel);
> > > > +}
> > > > diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
> > > > index c41ee80533ca219a..3305dc3a74b32481 100644
> > > > --- a/tools/testing/selftests/bpf/progs/bpf_iter.h
> > > > +++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
> > > > @@ -24,6 +24,7 @@
> > > > #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
> > > > #define BTF_F_ZERO BTF_F_ZERO___not_used
> > > > #define bpf_iter__ksym bpf_iter__ksym___not_used
> > > > +#define bpf_iter__kmem_cache bpf_iter__kmem_cache___not_used
> > > > #include "vmlinux.h"
> > > > #undef bpf_iter_meta
> > > > #undef bpf_iter__bpf_map
> > > > @@ -48,6 +49,7 @@
> > > > #undef BTF_F_PTR_RAW
> > > > #undef BTF_F_ZERO
> > > > #undef bpf_iter__ksym
> > > > +#undef bpf_iter__kmem_cache
> > > >
> > > > struct bpf_iter_meta {
> > > > struct seq_file *seq;
> > > > @@ -165,3 +167,8 @@ struct bpf_iter__ksym {
> > > > struct bpf_iter_meta *meta;
> > > > struct kallsym_iter *ksym;
> > > > };
> > > > +
> > > > +struct bpf_iter__kmem_cache {
> > > > + struct bpf_iter_meta *meta;
> > > > + struct kmem_cache *s;
> > > > +} __attribute__((preserve_access_index));
> > > > diff --git a/tools/testing/selftests/bpf/progs/kmem_cache_iter.c b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > > > new file mode 100644
> > > > index 0000000000000000..3f6ec15a1bf6344c
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> > > > @@ -0,0 +1,66 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/* Copyright (c) 2024 Google */
> > > > +
> > > > +#include "bpf_iter.h"
> > > > +#include <bpf/bpf_helpers.h>
> > > > +#include <bpf/bpf_tracing.h>
> > > > +
> > > > +char _license[] SEC("license") = "GPL";
> > > > +
> > > > +#define SLAB_NAME_MAX 256
> > > > +
> > > > +struct {
> > > > + __uint(type, BPF_MAP_TYPE_HASH);
> > > > + __uint(key_size, sizeof(void *));
> > > > + __uint(value_size, SLAB_NAME_MAX);
> > > > + __uint(max_entries, 1024);
> > > > +} slab_hash SEC(".maps");
> > > > +
> > > > +extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> > > > +
> > > > +/* result, will be checked by userspace */
> > > > +int found;
> > > > +
> > > > +SEC("iter/kmem_cache")
> > > > +int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
> > > > +{
> > > > + struct seq_file *seq = ctx->meta->seq;
> > > > + struct kmem_cache *s = ctx->s;
> > > > +
> > > > + if (s) {
> > > > + char name[SLAB_NAME_MAX];
> > > > +
> > > > + /*
> > > > + * To make sure if the slab_iter implements the seq interface
> > > > + * properly and it's also useful for debugging.
> > > > + */
> > > > + BPF_SEQ_PRINTF(seq, "%s: %u\n", s->name, s->object_size);
> > > > +
> > > > + bpf_probe_read_kernel_str(name, sizeof(name), s->name);
> > > > + bpf_map_update_elem(&slab_hash, &s, name, BPF_NOEXIST);
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +SEC("raw_tp/bpf_test_finish")
> > > > +int BPF_PROG(check_task_struct)
> > > > +{
> > > > + __u64 curr = bpf_get_current_task();
> > > > + struct kmem_cache *s;
> > > > + char *name;
> > > > +
> > > > + s = bpf_get_kmem_cache(curr);
> > > > + if (s == NULL) {
> > > > + found = -1;
> > > > + return 0;
> > >
> > > ... it cannot find a kmem_cache for the current task. This program is
> > > run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU. So I think
> > > the curr should point a task_struct in a slab cache.
> > >
> > > Am I missing something?
> >
> > Hi Namhyung,
> >
> > Out of curiosity I've been investigating this issue on my machine and
> > running some experiments.
>
> Thanks a lot for looking at this!
>
> >
> > When the test fails, calling dump_page() for the page the task_struct
> > belongs to,
> > shows that the page does not have the PGTY_slab flag set which is why
> > virt_to_slab(current) returns NULL.
> >
> > Does the test always fail in your environment? On my machine, the
> > test sometimes passed and sometimes failed.
>
> I'm using vmtest.sh but it succeeded mostly. I thought I couldn't
> reproduce it locally, but I also see the failure sometimes. I'll take a
> deeper look.
>
> >
> > Maybe the value returned by the 'current' macro sometimes belongs to a
> > slab and sometimes does not.
> > But that doesn't really make sense to me as, IIUC, task_struct
> > descriptors are allocated from a slab.
>
> AFAIK the notable exception is init_task, which lives in the kernel
> data section. I'm not sure if the test is run by PID 1.
I checked that the test is running under PID 0 (swapper) when it fails and
non-0 PID when it succeeds. This makes sense as the task_struct for PID 0
should be in the kernel image area, not in a slab.
Phew, fortunately, it's not a bug! :)
Any plans on how to adjust the test program?
Best,
Hyeonggon
* Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-30 3:24 ` Hyeonggon Yoo
@ 2024-09-30 4:33 ` Namhyung Kim
2024-09-30 17:48 ` Namhyung Kim
0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2024-09-30 4:33 UTC (permalink / raw)
To: Hyeonggon Yoo
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
linux-mm, Arnaldo Carvalho de Melo
On Mon, Sep 30, 2024 at 12:24:52PM +0900, Hyeonggon Yoo wrote:
> On Mon, Sep 30, 2024 at 11:18 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello Hyeonggon,
> >
> > On Sun, Sep 29, 2024 at 11:27:25PM +0900, Hyeonggon Yoo wrote:
> > > On Sun, Sep 29, 2024 at 3:13 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > +SEC("raw_tp/bpf_test_finish")
> > > > > +int BPF_PROG(check_task_struct)
> > > > > +{
> > > > > + __u64 curr = bpf_get_current_task();
> > > > > + struct kmem_cache *s;
> > > > > + char *name;
> > > > > +
> > > > > + s = bpf_get_kmem_cache(curr);
> > > > > + if (s == NULL) {
> > > > > + found = -1;
> > > > > + return 0;
> > > >
> > > > ... it cannot find a kmem_cache for the current task. This program is
> > > > run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU. So I think
> > > > the curr should point a task_struct in a slab cache.
> > > >
> > > > Am I missing something?
> > >
> > > Hi Namhyung,
> > >
> > > Out of curiosity I've been investigating this issue on my machine and
> > > running some experiments.
> >
> > Thanks a lot for looking at this!
> >
> > >
> > > When the test fails, calling dump_page() for the page the task_struct
> > > belongs to,
> > > shows that the page does not have the PGTY_slab flag set which is why
> > > virt_to_slab(current) returns NULL.
> > >
> > > Does the test always fail in your environment? On my machine, the
> > > test sometimes passed and sometimes failed.
> >
> > I'm using vmtest.sh and it mostly succeeds. I thought I couldn't
> > reproduce it locally, but I also see the failure sometimes. I'll
> > take a deeper look.
> >
> > >
> > > Maybe sometimes the value returned by the 'current' macro belongs
> > > to a slab, but sometimes it does not. That doesn't really make
> > > sense to me, though, as IIUC task_struct descriptors are allocated
> > > from a slab.
> >
> > AFAIK the notable exception is init_task, which lives in the kernel
> > data section. I'm not sure if the test is run by PID 1.
>
> I checked that the test is running under PID 0 (swapper) when it fails and
> non-0 PID when it succeeds. This makes sense as the task_struct for PID 0
> should be in the kernel image area, not in a slab.
>
> Phew, fortunately, it's not a bug! :)
Thanks for the test, I've seen the same now.
>
> Any plans on how to adjust the test program?
I thought the test runs in a separate task. I'll think about how to
test this more reliably.
Thanks,
Namhyung
* Re: [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
2024-09-30 4:33 ` Namhyung Kim
@ 2024-09-30 17:48 ` Namhyung Kim
0 siblings, 0 replies; 17+ messages in thread
From: Namhyung Kim @ 2024-09-30 17:48 UTC (permalink / raw)
To: Hyeonggon Yoo
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
linux-mm, Arnaldo Carvalho de Melo
On Sun, Sep 29, 2024 at 09:33:05PM -0700, Namhyung Kim wrote:
> On Mon, Sep 30, 2024 at 12:24:52PM +0900, Hyeonggon Yoo wrote:
> > On Mon, Sep 30, 2024 at 11:18 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hello Hyeonggon,
> > >
> > > On Sun, Sep 29, 2024 at 11:27:25PM +0900, Hyeonggon Yoo wrote:
> > > > On Sun, Sep 29, 2024 at 3:13 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > +SEC("raw_tp/bpf_test_finish")
> > > > > > +int BPF_PROG(check_task_struct)
> > > > > > +{
> > > > > > + __u64 curr = bpf_get_current_task();
> > > > > > + struct kmem_cache *s;
> > > > > > + char *name;
> > > > > > +
> > > > > > + s = bpf_get_kmem_cache(curr);
> > > > > > + if (s == NULL) {
> > > > > > + found = -1;
> > > > > > + return 0;
> > > > >
> > > > > ... it cannot find a kmem_cache for the current task. This program is
> > > > > run by bpf_prog_test_run_opts() with BPF_F_TEST_RUN_ON_CPU. So I think
> > > > > > the curr should point to a task_struct in a slab cache.
> > > > >
> > > > > Am I missing something?
> > > >
> > > > Hi Namhyung,
> > > >
> > > > Out of curiosity I've been investigating this issue on my machine and
> > > > running some experiments.
> > >
> > > Thanks a lot for looking at this!
> > >
> > > >
> > > > When the test fails, calling dump_page() for the page the
> > > > task_struct belongs to shows that the page does not have the
> > > > PGTY_slab flag set, which is why virt_to_slab(current) returns
> > > > NULL.
> > > >
> > > > Does the test always fail in your environment? On my machine,
> > > > the test passed sometimes and failed other times.
> > >
> > > I'm using vmtest.sh and it mostly succeeds. I thought I couldn't
> > > reproduce it locally, but I also see the failure sometimes. I'll
> > > take a deeper look.
> > >
> > > >
> > > > Maybe sometimes the value returned by the 'current' macro
> > > > belongs to a slab, but sometimes it does not. That doesn't
> > > > really make sense to me, though, as IIUC task_struct descriptors
> > > > are allocated from a slab.
> > >
> > > AFAIK the notable exception is init_task, which lives in the
> > > kernel data section. I'm not sure if the test is run by PID 1.
> >
> > I checked that the test is running under PID 0 (swapper) when it fails and
> > non-0 PID when it succeeds. This makes sense as the task_struct for PID 0
> > should be in the kernel image area, not in a slab.
> >
> > Phew, fortunately, it's not a bug! :)
>
> Thanks for the test, I've seen the same now.
>
> >
> > Any plans on how to adjust the test program?
>
> I thought the test runs in a separate task. I'll think about how to
> test this more reliably.
Oh, I think BPF_F_TEST_RUN_ON_CPU was the problem, since it requires
running the test on the given CPU (cpu0 in this case). If cpu0 was
idle, it would fail like this. I think removing the flag will run the
test on the current CPU, so it won't hit the swapper task anymore.
Thanks,
Namhyung
* Re: [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator
2024-09-30 2:08 ` Namhyung Kim
@ 2024-10-01 18:23 ` Alexei Starovoitov
0 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2024-10-01 18:23 UTC (permalink / raw)
To: Namhyung Kim
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo
On Sun, Sep 29, 2024 at 7:08 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Sun, Sep 29, 2024 at 10:04:00AM -0700, Alexei Starovoitov wrote:
> > On Fri, Sep 27, 2024 at 11:41 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> > > +{
> > > + loff_t cnt = 0;
> > > + struct kmem_cache *s = NULL;
> > > +
> > > + mutex_lock(&slab_mutex);
> >
> > It would be better to find a way to iterate slabs without holding
> > the mutex for the duration of the loop.
> > Maybe use refcnt to hold the kmem_cache while bpf prog is looking at it?
>
> Do you mean that you want to not hold slab_mutex while the BPF
> program is running?
yes.
> Maybe we can allocate an array of pointers to the slab caches (with
> refcounts) at the beginning and iterate over them instead, then call
> kmem_cache_destroy() for each entry at the end. Is that OK with you?
That doesn't sound efficient.
Just grab a refcnt on the kmem_cache before running the prog?
Drop the refcnt, and grab the mutex again to do the next step.
kmem_cache_iter_seq_next() will be running with the mutex held, of course.
end of thread, other threads:[~2024-10-01 18:24 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-27 18:41 [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Namhyung Kim
2024-09-27 18:41 ` [RFC/PATCH bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
2024-09-29 17:04 ` Alexei Starovoitov
2024-09-30 2:08 ` Namhyung Kim
2024-10-01 18:23 ` Alexei Starovoitov
2024-09-27 18:41 ` [RFC/PATCH bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
2024-09-29 17:05 ` Alexei Starovoitov
2024-09-30 2:09 ` Namhyung Kim
2024-09-27 18:41 ` [RFC/PATCH bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
2024-09-29 6:13 ` Namhyung Kim
2024-09-29 14:27 ` Hyeonggon Yoo
2024-09-30 2:18 ` Namhyung Kim
2024-09-30 3:24 ` Hyeonggon Yoo
2024-09-30 4:33 ` Namhyung Kim
2024-09-30 17:48 ` Namhyung Kim
2024-09-29 17:00 ` [RFC/PATCH bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc (v2) Alexei Starovoitov
2024-09-30 1:51 ` Namhyung Kim