[PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc
@ 2024-10-02 18:09 Namhyung Kim
  2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-02 18:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

Hello,

I'm proposing a new iterator and a kfunc for the slab memory allocator
to get information of each kmem_cache like in /proc/slabinfo or
/sys/kernel/slab in more flexible way.

v4 changes)

 * skip kmem_cache_destroy() in kmem_cache_iter_seq_stop() if possible  (Vlastimil)
 * fix a bug in the kmem_cache_iter_seq_start() for the last entry

v3: https://lore.kernel.org/lkml/20241002065456.1580143-1-namhyung@kernel.org/

 * rework kmem_cache_iter not to hold slab_mutex when running BPF  (Alexei)
 * add virt_addr_valid() check  (Alexei)
 * fix random test failure by running test with the current task  (Hyeonggon)

v2: https://lore.kernel.org/lkml/20240927184133.968283-1-namhyung@kernel.org/

 * rename it to "kmem_cache_iter"
 * fix a build issue
 * add Acked-by's from Roman and Vlastimil (Thanks!)
 * add error codes in the test for debugging

v1: https://lore.kernel.org/lkml/20240925223023.735947-1-namhyung@kernel.org/

My use case is `perf lock contention` tool which shows contended locks
but many of them are not global locks and don't have symbols.  If it
can tranlate the address of the lock in a slab object to the name of
the slab, it'd be much more useful.

I'm not aware of type information in slab yet, but I was told there's
a work to associate BTF ID with it.  It'd be definitely helpful to my
use case.  Probably we need another kfunc to get the start address of
the object or the offset in the object from an address if the type
info is available.  But I want to start with a simple thing first.

The kmem_cache_iter iterates kmem_cache objects under slab_mutex and
will be useful for userspace to prepare some work for specific slabs
like setting up filters in advance.  And the bpf_get_kmem_cache()
kfunc will return a pointer to a slab from the address of a lock.  And
the test code is to read from the iterator and make sure it finds a
slab cache of the task_struct for the current task.

The code is available at 'bpf/slab-iter-v4' branch in
https://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung

Namhyung Kim (3):
  bpf: Add kmem_cache iterator
  mm/bpf: Add bpf_get_kmem_cache() kfunc
  selftests/bpf: Add a test for kmem_cache_iter

 include/linux/btf_ids.h                       |   1 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/helpers.c                          |   1 +
 kernel/bpf/kmem_cache_iter.c                  | 174 ++++++++++++++++++
 mm/slab_common.c                              |  19 ++
 .../bpf/prog_tests/kmem_cache_iter.c          |  64 +++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../selftests/bpf/progs/kmem_cache_iter.c     |  66 +++++++
 8 files changed, 333 insertions(+)
 create mode 100644 kernel/bpf/kmem_cache_iter.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c

base-commit: 9502a7de5a61bec3bda841a830560c5d6d40ecac
-- 
2.46.1.824.gd892dcdcdd-goog

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-02 18:09 [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc Namhyung Kim
@ 2024-10-02 18:09 ` Namhyung Kim
  2024-10-03  7:35   ` Vlastimil Babka
                     ` (2 more replies)
  2024-10-02 18:09 ` [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
  2024-10-02 18:09 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
  2 siblings, 3 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-02 18:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

The new "kmem_cache" iterator will traverse the list of slab caches
and call attached BPF programs for each entry.  It should check the
argument (ctx.s) if it's NULL before using it.

Now the iteration grabs the slab_mutex only if it traverse the list and
releases the mutex when it runs the BPF program.  The kmem_cache entry
is protected by a refcount during the execution.

It includes the internal "mm/slab.h" header to access kmem_cache,
slab_caches and slab_mutex.  Hope it's ok to mm folks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
I've removed the Acked-by's from Roman and Vlastimil since it's changed
not to hold the slab_mutex and to manage the refcount.  Please review
this change again!

 include/linux/btf_ids.h      |   1 +
 kernel/bpf/Makefile          |   1 +
 kernel/bpf/kmem_cache_iter.c | 174 +++++++++++++++++++++++++++++++++++
 3 files changed, 176 insertions(+)
 create mode 100644 kernel/bpf/kmem_cache_iter.c

diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h
index c0e3e1426a82f5c4..139bdececdcfaefb 100644
--- a/include/linux/btf_ids.h
+++ b/include/linux/btf_ids.h
@@ -283,5 +283,6 @@ extern u32 btf_tracing_ids[];
 extern u32 bpf_cgroup_btf_id[];
 extern u32 bpf_local_storage_map_btf_id[];
 extern u32 btf_bpf_map_id[];
+extern u32 bpf_kmem_cache_btf_id[];
 
 #endif
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 9b9c151b5c826b31..105328f0b9c04e37 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -52,3 +52,4 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
+obj-$(CONFIG_BPF_SYSCALL) += kmem_cache_iter.o
diff --git a/kernel/bpf/kmem_cache_iter.c b/kernel/bpf/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..e103d25175126ab0
--- /dev/null
+++ b/kernel/bpf/kmem_cache_iter.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024 Google */
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/slab.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+
+#include "../../mm/slab.h" /* kmem_cache, slab_caches and slab_mutex */
+
+struct bpf_iter__kmem_cache {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct kmem_cache *, s);
+};
+
+static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	loff_t cnt = 0;
+	bool found = false;
+	struct kmem_cache *s;
+
+	mutex_lock(&slab_mutex);
+
+	/*
+	 * Find an entry at the given position in the slab_caches list instead
+	 * of keeping a reference (of the last visited entry, if any) out of
+	 * slab_mutex. It might miss something if one is deleted in the middle
+	 * while it releases the lock.  But it should be rare and there's not
+	 * much we can do about it.
+	 */
+	list_for_each_entry(s, &slab_caches, list) {
+		if (cnt == *pos) {
+			/*
+			 * Make sure this entry remains in the list by getting
+			 * a new reference count.  Note that boot_cache entries
+			 * have a negative refcount, so don't touch them.
+			 */
+			if (s->refcount > 0)
+				s->refcount++;
+			found = true;
+			break;
+		}
+		cnt++;
+	}
+	mutex_unlock(&slab_mutex);
+
+	if (!found)
+		return NULL;
+
+	++*pos;
+	return s;
+}
+
+static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_meta meta;
+	struct bpf_iter__kmem_cache ctx = {
+		.meta = &meta,
+		.s = v,
+	};
+	struct bpf_prog *prog;
+	bool destroy = false;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, true);
+	if (prog)
+		bpf_iter_run_prog(prog, &ctx);
+
+	if (ctx.s == NULL)
+		return;
+
+	mutex_lock(&slab_mutex);
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (ctx.s->refcount > 1)
+		ctx.s->refcount--;
+	else if (ctx.s->refcount == 1)
+		destroy = true;
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(ctx.s);
+}
+
+static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct kmem_cache *s = v;
+	struct kmem_cache *next = NULL;
+	bool destroy = false;
+
+	++*pos;
+
+	mutex_lock(&slab_mutex);
+
+	if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
+		next = list_next_entry(s, list);
+		if (next->refcount > 0)
+			next->refcount++;
+	}
+
+	/* Skip kmem_cache_destroy() for active entries */
+	if (s->refcount > 1)
+		s->refcount--;
+	else if (s->refcount == 1)
+		destroy = true;
+
+	mutex_unlock(&slab_mutex);
+
+	if (destroy)
+		kmem_cache_destroy(s);
+
+	return next;
+}
+
+static int kmem_cache_iter_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_meta meta;
+	struct bpf_iter__kmem_cache ctx = {
+		.meta = &meta,
+		.s = v,
+	};
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, false);
+	if (prog)
+		ret = bpf_iter_run_prog(prog, &ctx);
+
+	return ret;
+}
+
+static const struct seq_operations kmem_cache_iter_seq_ops = {
+	.start  = kmem_cache_iter_seq_start,
+	.next   = kmem_cache_iter_seq_next,
+	.stop   = kmem_cache_iter_seq_stop,
+	.show   = kmem_cache_iter_seq_show,
+};
+
+BTF_ID_LIST_GLOBAL_SINGLE(bpf_kmem_cache_btf_id, struct, kmem_cache)
+
+static const struct bpf_iter_seq_info kmem_cache_iter_seq_info = {
+	.seq_ops		= &kmem_cache_iter_seq_ops,
+};
+
+static void bpf_iter_kmem_cache_show_fdinfo(const struct bpf_iter_aux_info *aux,
+					    struct seq_file *seq)
+{
+	seq_puts(seq, "kmem_cache iter\n");
+}
+
+DEFINE_BPF_ITER_FUNC(kmem_cache, struct bpf_iter_meta *meta,
+		     struct kmem_cache *s)
+
+static struct bpf_iter_reg bpf_kmem_cache_reg_info = {
+	.target			= "kmem_cache",
+	.feature		= BPF_ITER_RESCHED,
+	.show_fdinfo		= bpf_iter_kmem_cache_show_fdinfo,
+	.ctx_arg_info_size	= 1,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__kmem_cache, s),
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
+	},
+	.seq_info		= &kmem_cache_iter_seq_info,
+};
+
+static int __init bpf_kmem_cache_iter_init(void)
+{
+	bpf_kmem_cache_reg_info.ctx_arg_info[0].btf_id = bpf_kmem_cache_btf_id[0];
+	return bpf_iter_reg_target(&bpf_kmem_cache_reg_info);
+}
+
+late_initcall(bpf_kmem_cache_iter_init);
-- 
2.46.1.824.gd892dcdcdd-goog



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
@ 2024-10-03  7:35   ` Vlastimil Babka
  2024-10-04 20:33   ` Song Liu
  2024-10-04 20:45   ` Song Liu
  2 siblings, 0 replies; 26+ messages in thread
From: Vlastimil Babka @ 2024-10-03  7:35 UTC (permalink / raw)
  To: Namhyung Kim, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo,
	linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On 10/2/24 20:09, Namhyung Kim wrote:
> The new "kmem_cache" iterator will traverse the list of slab caches
> and call attached BPF programs for each entry.  It should check the
> argument (ctx.s) if it's NULL before using it.
> 
> Now the iteration grabs the slab_mutex only if it traverse the list and
> releases the mutex when it runs the BPF program.  The kmem_cache entry
> is protected by a refcount during the execution.
> 
> It includes the internal "mm/slab.h" header to access kmem_cache,
> slab_caches and slab_mutex.  Hope it's ok to mm folks.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
  2024-10-03  7:35   ` Vlastimil Babka
@ 2024-10-04 20:33   ` Song Liu
  2024-10-04 21:37     ` Namhyung Kim
  2024-10-04 20:45   ` Song Liu
  2 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-10-04 20:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
[...]
> +
> +       mutex_lock(&slab_mutex);
> +
> +       /*
> +        * Find an entry at the given position in the slab_caches list instead

Nit: style of multi-line comment: "/* Find ...".

> +        * of keeping a reference (of the last visited entry, if any) out of
> +        * slab_mutex. It might miss something if one is deleted in the middle
> +        * while it releases the lock.  But it should be rare and there's not
> +        * much we can do about it.
> +        */
> +       list_for_each_entry(s, &slab_caches, list) {
> +               if (cnt == *pos) {
> +                       /*
> +                        * Make sure this entry remains in the list by getting
> +                        * a new reference count.  Note that boot_cache entries
> +                        * have a negative refcount, so don't touch them.
> +                        */
> +                       if (s->refcount > 0)
> +                               s->refcount++;
> +                       found = true;
> +                       break;
> +               }
> +               cnt++;
> +       }
> +       mutex_unlock(&slab_mutex);
> +
> +       if (!found)
> +               return NULL;
> +
> +       ++*pos;
> +       return s;
> +}
> +
> +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
> +{
> +       struct bpf_iter_meta meta;
> +       struct bpf_iter__kmem_cache ctx = {
> +               .meta = &meta,
> +               .s = v,
> +       };
> +       struct bpf_prog *prog;
> +       bool destroy = false;
> +
> +       meta.seq = seq;
> +       prog = bpf_iter_get_info(&meta, true);
> +       if (prog)
> +               bpf_iter_run_prog(prog, &ctx);
> +
> +       if (ctx.s == NULL)
> +               return;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       /* Skip kmem_cache_destroy() for active entries */
> +       if (ctx.s->refcount > 1)
> +               ctx.s->refcount--;
> +       else if (ctx.s->refcount == 1)
> +               destroy = true;
> +
> +       mutex_unlock(&slab_mutex);
> +
> +       if (destroy)
> +               kmem_cache_destroy(ctx.s);
> +}
> +
> +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +       struct kmem_cache *s = v;
> +       struct kmem_cache *next = NULL;
> +       bool destroy = false;
> +
> +       ++*pos;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> +               next = list_next_entry(s, list);
> +               if (next->refcount > 0)
> +                       next->refcount++;

What if next->refcount <=0? Shall we find next of next?

> +       }
> +
> +       /* Skip kmem_cache_destroy() for active entries */
> +       if (s->refcount > 1)
> +               s->refcount--;
> +       else if (s->refcount == 1)
> +               destroy = true;
> +
> +       mutex_unlock(&slab_mutex);
> +
> +       if (destroy)
> +               kmem_cache_destroy(s);
> +
> +       return next;
> +}
[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-04 20:33   ` Song Liu
@ 2024-10-04 21:37     ` Namhyung Kim
  2024-10-04 21:46       ` Song Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2024-10-04 21:37 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

Hi Song,

On Fri, Oct 04, 2024 at 01:33:19PM -0700, Song Liu wrote:
> On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> [...]
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       /*
> > +        * Find an entry at the given position in the slab_caches list instead
> 
> Nit: style of multi-line comment: "/* Find ...".

Ok, will update.

> 
> > +        * of keeping a reference (of the last visited entry, if any) out of
> > +        * slab_mutex. It might miss something if one is deleted in the middle
> > +        * while it releases the lock.  But it should be rare and there's not
> > +        * much we can do about it.
> > +        */
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               if (cnt == *pos) {
> > +                       /*
> > +                        * Make sure this entry remains in the list by getting
> > +                        * a new reference count.  Note that boot_cache entries
> > +                        * have a negative refcount, so don't touch them.
> > +                        */
> > +                       if (s->refcount > 0)
> > +                               s->refcount++;
> > +                       found = true;
> > +                       break;
> > +               }
> > +               cnt++;
> > +       }
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (!found)
> > +               return NULL;
> > +
> > +       ++*pos;
> > +       return s;
> > +}
> > +
> > +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
> > +{
> > +       struct bpf_iter_meta meta;
> > +       struct bpf_iter__kmem_cache ctx = {
> > +               .meta = &meta,
> > +               .s = v,
> > +       };
> > +       struct bpf_prog *prog;
> > +       bool destroy = false;
> > +
> > +       meta.seq = seq;
> > +       prog = bpf_iter_get_info(&meta, true);
> > +       if (prog)
> > +               bpf_iter_run_prog(prog, &ctx);
> > +
> > +       if (ctx.s == NULL)
> > +               return;
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       /* Skip kmem_cache_destroy() for active entries */
> > +       if (ctx.s->refcount > 1)
> > +               ctx.s->refcount--;
> > +       else if (ctx.s->refcount == 1)
> > +               destroy = true;
> > +
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (destroy)
> > +               kmem_cache_destroy(ctx.s);
> > +}
> > +
> > +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > +{
> > +       struct kmem_cache *s = v;
> > +       struct kmem_cache *next = NULL;
> > +       bool destroy = false;
> > +
> > +       ++*pos;
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> > +               next = list_next_entry(s, list);
> > +               if (next->refcount > 0)
> > +                       next->refcount++;
> 
> What if next->refcount <=0? Shall we find next of next?

The slab_mutex should protect refcount == 0 case so it won't see that.
The negative refcount means it's a boot_cache and we shouldn't touch the
refcount.

Thanks,
Namhyung

> 
> > +       }
> > +
> > +       /* Skip kmem_cache_destroy() for active entries */
> > +       if (s->refcount > 1)
> > +               s->refcount--;
> > +       else if (s->refcount == 1)
> > +               destroy = true;
> > +
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (destroy)
> > +               kmem_cache_destroy(s);
> > +
> > +       return next;
> > +}
> [...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-04 21:37     ` Namhyung Kim
@ 2024-10-04 21:46       ` Song Liu
  2024-10-04 23:29         ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-10-04 21:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 4, 2024 at 2:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Song,
>
> On Fri, Oct 04, 2024 at 01:33:19PM -0700, Song Liu wrote:
[...]
> > > +
> > > +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > +{
> > > +       struct kmem_cache *s = v;
> > > +       struct kmem_cache *next = NULL;
> > > +       bool destroy = false;
> > > +
> > > +       ++*pos;
> > > +
> > > +       mutex_lock(&slab_mutex);
> > > +
> > > +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> > > +               next = list_next_entry(s, list);
> > > +               if (next->refcount > 0)
> > > +                       next->refcount++;
> >
> > What if next->refcount <=0? Shall we find next of next?
>
> The slab_mutex should protect refcount == 0 case so it won't see that.
> The negative refcount means it's a boot_cache and we shouldn't touch the
> refcount.

I see. Thanks for the explanation!

Please add a comment here, and maybe also add

  WARN_ON_ONCE(next ->refcount == 0).

Song


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-04 21:46       ` Song Liu
@ 2024-10-04 23:29         ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-04 23:29 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 04, 2024 at 02:46:43PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:37 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Song,
> >
> > On Fri, Oct 04, 2024 at 01:33:19PM -0700, Song Liu wrote:
> [...]
> > > > +
> > > > +static void *kmem_cache_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > > +{
> > > > +       struct kmem_cache *s = v;
> > > > +       struct kmem_cache *next = NULL;
> > > > +       bool destroy = false;
> > > > +
> > > > +       ++*pos;
> > > > +
> > > > +       mutex_lock(&slab_mutex);
> > > > +
> > > > +       if (list_last_entry(&slab_caches, struct kmem_cache, list) != s) {
> > > > +               next = list_next_entry(s, list);
> > > > +               if (next->refcount > 0)
> > > > +                       next->refcount++;
> > >
> > > What if next->refcount <=0? Shall we find next of next?
> >
> > The slab_mutex should protect refcount == 0 case so it won't see that.
> > The negative refcount means it's a boot_cache and we shouldn't touch the
> > refcount.
> 
> I see. Thanks for the explanation!
> 
> Please add a comment here, and maybe also add
> 
>   WARN_ON_ONCE(next ->refcount == 0).

Sure, thanks for your review!
Namhyung



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
  2024-10-03  7:35   ` Vlastimil Babka
  2024-10-04 20:33   ` Song Liu
@ 2024-10-04 20:45   ` Song Liu
  2024-10-04 21:42     ` Namhyung Kim
  2 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-10-04 20:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
[...]
> +
> +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +       loff_t cnt = 0;
> +       bool found = false;
> +       struct kmem_cache *s;
> +
> +       mutex_lock(&slab_mutex);
> +
> +       /*
> +        * Find an entry at the given position in the slab_caches list instead
> +        * of keeping a reference (of the last visited entry, if any) out of
> +        * slab_mutex. It might miss something if one is deleted in the middle
> +        * while it releases the lock.  But it should be rare and there's not
> +        * much we can do about it.
> +        */
> +       list_for_each_entry(s, &slab_caches, list) {
> +               if (cnt == *pos) {
> +                       /*
> +                        * Make sure this entry remains in the list by getting
> +                        * a new reference count.  Note that boot_cache entries
> +                        * have a negative refcount, so don't touch them.
> +                        */
> +                       if (s->refcount > 0)
> +                               s->refcount++;
> +                       found = true;
> +                       break;
> +               }
> +               cnt++;
> +       }
> +       mutex_unlock(&slab_mutex);
> +
> +       if (!found)
> +               return NULL;
> +
> +       ++*pos;

This should be

if (*pos == 0)
    ++*pos;

> +       return s;
> +}
> +
> +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
[...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator
  2024-10-04 20:45   ` Song Liu
@ 2024-10-04 21:42     ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-04 21:42 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 04, 2024 at 01:45:09PM -0700, Song Liu wrote:
> On Wed, Oct 2, 2024 at 11:09 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> [...]
> > +
> > +static void *kmem_cache_iter_seq_start(struct seq_file *seq, loff_t *pos)
> > +{
> > +       loff_t cnt = 0;
> > +       bool found = false;
> > +       struct kmem_cache *s;
> > +
> > +       mutex_lock(&slab_mutex);
> > +
> > +       /*
> > +        * Find an entry at the given position in the slab_caches list instead
> > +        * of keeping a reference (of the last visited entry, if any) out of
> > +        * slab_mutex. It might miss something if one is deleted in the middle
> > +        * while it releases the lock.  But it should be rare and there's not
> > +        * much we can do about it.
> > +        */
> > +       list_for_each_entry(s, &slab_caches, list) {
> > +               if (cnt == *pos) {
> > +                       /*
> > +                        * Make sure this entry remains in the list by getting
> > +                        * a new reference count.  Note that boot_cache entries
> > +                        * have a negative refcount, so don't touch them.
> > +                        */
> > +                       if (s->refcount > 0)
> > +                               s->refcount++;
> > +                       found = true;
> > +                       break;
> > +               }
> > +               cnt++;
> > +       }
> > +       mutex_unlock(&slab_mutex);
> > +
> > +       if (!found)
> > +               return NULL;
> > +
> > +       ++*pos;
> 
> This should be
> 
> if (*pos == 0)
>     ++*pos;

Oh, I thought there's check for seq->count after the seq->op->show()
for the ->start().  I need to check this logic again, thanks for
pointing this out.

Thanks,
Namhyung

> 
> > +       return s;
> > +}
> > +
> > +static void kmem_cache_iter_seq_stop(struct seq_file *seq, void *v)
> [...]


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-02 18:09 [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc Namhyung Kim
  2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
@ 2024-10-02 18:09 ` Namhyung Kim
  2024-10-04  5:31   ` Namhyung Kim
  2024-10-04 20:10   ` Song Liu
  2024-10-02 18:09 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
  2 siblings, 2 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-02 18:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

The bpf_get_kmem_cache() is to get a slab cache information from a
virtual address like virt_to_cache().  If the address is a pointer
to a slab object, it'd return a valid kmem_cache pointer, otherwise
NULL is returned.

It doesn't grab a reference count of the kmem_cache so the caller is
responsible to manage the access.  The intended use case for now is to
symbolize locks in slab objects from the lock contention tracepoints.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/bpf/helpers.c |  1 +
 mm/slab_common.c     | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4053f279ed4cc7ab..3709fb14288105c6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7443244656150325..5484e1cd812f698e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
 }
 EXPORT_SYMBOL(ksize);
 
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/btf.h>
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
+{
+	struct slab *slab;
+
+	if (!virt_addr_valid(addr))
+		return NULL;
+
+	slab = virt_to_slab((void *)(long)addr);
+	return slab ? slab->slab_cache : NULL;
+}
+
+__bpf_kfunc_end_defs();
+#endif /* CONFIG_BPF_SYSCALL */
+
 /* Tracepoints definitions. */
 EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
-- 
2.46.1.824.gd892dcdcdd-goog



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-02 18:09 ` [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
@ 2024-10-04  5:31   ` Namhyung Kim
  2024-10-04 20:10   ` Song Liu
  1 sibling, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-04  5:31 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Wed, Oct 02, 2024 at 11:09:55AM -0700, Namhyung Kim wrote:
> The bpf_get_kmem_cache() is to get a slab cache information from a
> virtual address like virt_to_cache().  If the address is a pointer
> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> NULL is returned.
> 
> It doesn't grab a reference count of the kmem_cache so the caller is
> responsible to manage the access.  The intended use case for now is to
> symbolize locks in slab objects from the lock contention tracepoints.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/bpf/helpers.c |  1 +
>  mm/slab_common.c     | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 4053f279ed4cc7ab..3709fb14288105c6 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
>  BTF_KFUNCS_END(common_btf_ids)
>  
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7443244656150325..5484e1cd812f698e 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
>  }
>  EXPORT_SYMBOL(ksize);
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +#include <linux/btf.h>
> +
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +{
> +	struct slab *slab;
> +
> +	if (!virt_addr_valid(addr))

Hmm.. 32-bit systems don't like this.  Is it ok to change the type of
the parameter (addr) to 'unsigned long'?  Or do you want to keep it as
u64 and add a cast here?

Thanks,
Namhyung


> +		return NULL;
> +
> +	slab = virt_to_slab((void *)(long)addr);
> +	return slab ? slab->slab_cache : NULL;
> +}
> +
> +__bpf_kfunc_end_defs();
> +#endif /* CONFIG_BPF_SYSCALL */
> +
>  /* Tracepoints definitions. */
>  EXPORT_TRACEPOINT_SYMBOL(kmalloc);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> -- 
> 2.46.1.824.gd892dcdcdd-goog
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-02 18:09 ` [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
  2024-10-04  5:31   ` Namhyung Kim
@ 2024-10-04 20:10   ` Song Liu
  2024-10-04 21:25     ` Roman Gushchin
  1 sibling, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-10-04 20:10 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The bpf_get_kmem_cache() is to get a slab cache information from a
> virtual address like virt_to_cache().  If the address is a pointer
> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> NULL is returned.
>
> It doesn't grab a reference count of the kmem_cache so the caller is
> responsible to manage the access.  The intended use case for now is to
> symbolize locks in slab objects from the lock contention tracepoints.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/bpf/helpers.c |  1 +
>  mm/slab_common.c     | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 4053f279ed4cc7ab..3709fb14288105c6 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
>  BTF_KFUNCS_END(common_btf_ids)
>
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7443244656150325..5484e1cd812f698e 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
>  }
>  EXPORT_SYMBOL(ksize);
>
> +#ifdef CONFIG_BPF_SYSCALL
> +#include <linux/btf.h>
> +
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +{
> +       struct slab *slab;
> +
> +       if (!virt_addr_valid(addr))
> +               return NULL;
> +
> +       slab = virt_to_slab((void *)(long)addr);
> +       return slab ? slab->slab_cache : NULL;
> +}

Do we need to hold a refcount to the slab_cache? Given
we make this kfunc available everywhere, including
sleepable contexts, I think it is necessary.

Thanks
Song

> +
> +__bpf_kfunc_end_defs();
> +#endif /* CONFIG_BPF_SYSCALL */
> +
>  /* Tracepoints definitions. */
>  EXPORT_TRACEPOINT_SYMBOL(kmalloc);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> --
> 2.46.1.824.gd892dcdcdd-goog
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 20:10   ` Song Liu
@ 2024-10-04 21:25     ` Roman Gushchin
  2024-10-04 21:36       ` Song Liu
  2024-10-07 12:57       ` Vlastimil Babka
  0 siblings, 2 replies; 26+ messages in thread
From: Roman Gushchin @ 2024-10-04 21:25 UTC (permalink / raw)
  To: Song Liu
  Cc: Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The bpf_get_kmem_cache() is to get a slab cache information from a
> > virtual address like virt_to_cache().  If the address is a pointer
> > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > NULL is returned.
> >
> > It doesn't grab a reference count of the kmem_cache so the caller is
> > responsible to manage the access.  The intended use case for now is to
> > symbolize locks in slab objects from the lock contention tracepoints.
> >
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  kernel/bpf/helpers.c |  1 +
> >  mm/slab_common.c     | 19 +++++++++++++++++++
> >  2 files changed, 20 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> >  BTF_KFUNCS_END(common_btf_ids)
> >
> >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 7443244656150325..5484e1cd812f698e 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> >  }
> >  EXPORT_SYMBOL(ksize);
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +#include <linux/btf.h>
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > +{
> > +       struct slab *slab;
> > +
> > +       if (!virt_addr_valid(addr))
> > +               return NULL;
> > +
> > +       slab = virt_to_slab((void *)(long)addr);
> > +       return slab ? slab->slab_cache : NULL;
> > +}
> 
> Do we need to hold a refcount to the slab_cache? Given
> we make this kfunc available everywhere, including
> sleepable contexts, I think it is necessary.

It's a really good question.

If the callee somehow owns the slab object, as in the example
provided in the series (current task), it's not necessarily.

If a user can pass a random address, you're right, we need to
grab the slab_cache's refcnt. But then we also can't guarantee
that the object still belongs to the same slab_cache, the
function becomes racy by the definition.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 21:25     ` Roman Gushchin
@ 2024-10-04 21:36       ` Song Liu
  2024-10-04 21:58         ` Namhyung Kim
  2024-10-07 12:57       ` Vlastimil Babka
  1 sibling, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-10-04 21:36 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > virtual address like virt_to_cache().  If the address is a pointer
> > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > NULL is returned.
> > >
> > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > responsible to manage the access.  The intended use case for now is to
> > > symbolize locks in slab objects from the lock contention tracepoints.
> > >
> > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  kernel/bpf/helpers.c |  1 +
> > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > >  2 files changed, 20 insertions(+)
> > >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > >  BTF_KFUNCS_END(common_btf_ids)
> > >
> > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > index 7443244656150325..5484e1cd812f698e 100644
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > >  }
> > >  EXPORT_SYMBOL(ksize);
> > >
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > +#include <linux/btf.h>
> > > +
> > > +__bpf_kfunc_start_defs();
> > > +
> > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > +{
> > > +       struct slab *slab;
> > > +
> > > +       if (!virt_addr_valid(addr))
> > > +               return NULL;
> > > +
> > > +       slab = virt_to_slab((void *)(long)addr);
> > > +       return slab ? slab->slab_cache : NULL;
> > > +}
> >
> > Do we need to hold a refcount to the slab_cache? Given
> > we make this kfunc available everywhere, including
> > sleepable contexts, I think it is necessary.
>
> It's a really good question.
>
> If the callee somehow owns the slab object, as in the example
> provided in the series (current task), it's not necessarily.
>
> If a user can pass a random address, you're right, we need to
> grab the slab_cache's refcnt. But then we also can't guarantee
> that the object still belongs to the same slab_cache, the
> function becomes racy by the definition.

To be safe, we can limit the kfunc to sleepable context only. Then
we can lock slab_mutex for virt_to_slab, and hold a refcount
to slab_cache. We will need a KF_RELEASE kfunc to release
the refcount later.

IIUC, this limitation (sleepable context only) shouldn't be a problem
for perf use case?

Thanks,
Song


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 21:36       ` Song Liu
@ 2024-10-04 21:58         ` Namhyung Kim
  2024-10-04 22:57           ` Song Liu
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2024-10-04 21:58 UTC (permalink / raw)
  To: Song Liu
  Cc: Roman Gushchin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > NULL is returned.
> > > >
> > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > responsible to manage the access.  The intended use case for now is to
> > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > >
> > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  kernel/bpf/helpers.c |  1 +
> > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > >  2 files changed, 20 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > >  BTF_KFUNCS_END(common_btf_ids)
> > > >
> > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > --- a/mm/slab_common.c
> > > > +++ b/mm/slab_common.c
> > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > >  }
> > > >  EXPORT_SYMBOL(ksize);
> > > >
> > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > +#include <linux/btf.h>
> > > > +
> > > > +__bpf_kfunc_start_defs();
> > > > +
> > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > +{
> > > > +       struct slab *slab;
> > > > +
> > > > +       if (!virt_addr_valid(addr))
> > > > +               return NULL;
> > > > +
> > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > +       return slab ? slab->slab_cache : NULL;
> > > > +}
> > >
> > > Do we need to hold a refcount to the slab_cache? Given
> > > we make this kfunc available everywhere, including
> > > sleepable contexts, I think it is necessary.
> >
> > It's a really good question.
> >
> > If the callee somehow owns the slab object, as in the example
> > provided in the series (current task), it's not necessarily.
> >
> > If a user can pass a random address, you're right, we need to
> > grab the slab_cache's refcnt. But then we also can't guarantee
> > that the object still belongs to the same slab_cache, the
> > function becomes racy by the definition.
> 
> To be safe, we can limit the kfunc to sleepable context only. Then
> we can lock slab_mutex for virt_to_slab, and hold a refcount
> to slab_cache. We will need a KF_RELEASE kfunc to release
> the refcount later.

Then it needs to call kmem_cache_destroy() for release which contains
rcu_barrier. :(

> 
> IIUC, this limitation (sleepable context only) shouldn't be a problem
> for perf use case?

No, it would be called from the lock contention path including
spinlocks. :(

Can we limit it to non-sleepable ctx and not to pass arbtrary address
somehow (or not to save the result pointer)?

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 21:58         ` Namhyung Kim
@ 2024-10-04 22:57           ` Song Liu
  2024-10-04 23:28             ` Namhyung Kim
  2024-10-04 23:44             ` Alexei Starovoitov
  0 siblings, 2 replies; 26+ messages in thread
From: Song Liu @ 2024-10-04 22:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Roman Gushchin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > > NULL is returned.
> > > > >
> > > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > > responsible to manage the access.  The intended use case for now is to
> > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > >
> > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > ---
> > > > >  kernel/bpf/helpers.c |  1 +
> > > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > > >  2 files changed, 20 insertions(+)
> > > > >
> > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > --- a/kernel/bpf/helpers.c
> > > > > +++ b/kernel/bpf/helpers.c
> > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > >  BTF_KFUNCS_END(common_btf_ids)
> > > > >
> > > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > --- a/mm/slab_common.c
> > > > > +++ b/mm/slab_common.c
> > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > >  }
> > > > >  EXPORT_SYMBOL(ksize);
> > > > >
> > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > +#include <linux/btf.h>
> > > > > +
> > > > > +__bpf_kfunc_start_defs();
> > > > > +
> > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > +{
> > > > > +       struct slab *slab;
> > > > > +
> > > > > +       if (!virt_addr_valid(addr))
> > > > > +               return NULL;
> > > > > +
> > > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > > +       return slab ? slab->slab_cache : NULL;
> > > > > +}
> > > >
> > > > Do we need to hold a refcount to the slab_cache? Given
> > > > we make this kfunc available everywhere, including
> > > > sleepable contexts, I think it is necessary.
> > >
> > > It's a really good question.
> > >
> > > If the callee somehow owns the slab object, as in the example
> > > provided in the series (current task), it's not necessarily.
> > >
> > > If a user can pass a random address, you're right, we need to
> > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > that the object still belongs to the same slab_cache, the
> > > function becomes racy by the definition.
> >
> > To be safe, we can limit the kfunc to sleepable context only. Then
> > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > to slab_cache. We will need a KF_RELEASE kfunc to release
> > the refcount later.
>
> Then it needs to call kmem_cache_destroy() for release which contains
> rcu_barrier. :(
>
> >
> > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > for perf use case?
>
> No, it would be called from the lock contention path including
> spinlocks. :(
>
> Can we limit it to non-sleepable ctx and not to pass arbtrary address
> somehow (or not to save the result pointer)?

I hacked something like the following. It is not ideal, because we are
taking spinlock_t pointer instead of void pointer. To use this with void
'pointer, we will need some verifier changes.

Thanks,
Song


diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
index 3709fb142881..7311a26ecb01 100644
--- i/kernel/bpf/helpers.c
+++ w/kernel/bpf/helpers.c
@@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
-BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
| KF_RCU_PROTECTED)
 BTF_KFUNCS_END(common_btf_ids)

 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git i/mm/slab_common.c w/mm/slab_common.c
index 5484e1cd812f..3e3e5f172f2e 100644
--- i/mm/slab_common.c
+++ w/mm/slab_common.c
@@ -1327,14 +1327,15 @@ EXPORT_SYMBOL(ksize);

 __bpf_kfunc_start_defs();

-__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
+__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr)
 {
        struct slab *slab;
+       unsigned long a = (unsigned long)addr;

-       if (!virt_addr_valid(addr))
+       if (!virt_addr_valid(a))
                return NULL;

-       slab = virt_to_slab((void *)(long)addr);
+       slab = virt_to_slab(addr);
        return slab ? slab->slab_cache : NULL;
 }

@@ -1346,4 +1347,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
 EXPORT_TRACEPOINT_SYMBOL(kfree);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
-
diff --git i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
index 3f6ec15a1bf6..8238155a5055 100644
--- i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
+++ w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
@@ -16,7 +16,7 @@ struct {
        __uint(max_entries, 1024);
 } slab_hash SEC(".maps");

-extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
+extern struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr) __ksym;

 /* result, will be checked by userspace */
 int found;
@@ -46,21 +46,23 @@ int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
 SEC("raw_tp/bpf_test_finish")
 int BPF_PROG(check_task_struct)
 {
-       __u64 curr = bpf_get_current_task();
+       struct task_struct *curr = bpf_get_current_task_btf();
        struct kmem_cache *s;
        char *name;

-       s = bpf_get_kmem_cache(curr);
+       s = bpf_get_kmem_cache(&curr->alloc_lock);
        if (s == NULL) {
                found = -1;
                return 0;
        }

+       bpf_rcu_read_lock();
        name = bpf_map_lookup_elem(&slab_hash, &s);
        if (name && !bpf_strncmp(name, 11, "task_struct"))
                found = 1;
        else
                found = -2;
+       bpf_rcu_read_unlock();

        return 0;
 }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 22:57           ` Song Liu
@ 2024-10-04 23:28             ` Namhyung Kim
  2024-10-04 23:44             ` Alexei Starovoitov
  1 sibling, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-04 23:28 UTC (permalink / raw)
  To: Song Liu
  Cc: Roman Gushchin, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Vlastimil Babka,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 04, 2024 at 03:57:26PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > > > NULL is returned.
> > > > > >
> > > > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > > > responsible to manage the access.  The intended use case for now is to
> > > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > > >
> > > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > >  kernel/bpf/helpers.c |  1 +
> > > > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > > > >  2 files changed, 20 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > > >  BTF_KFUNCS_END(common_btf_ids)
> > > > > >
> > > > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > > --- a/mm/slab_common.c
> > > > > > +++ b/mm/slab_common.c
> > > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(ksize);
> > > > > >
> > > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > > +#include <linux/btf.h>
> > > > > > +
> > > > > > +__bpf_kfunc_start_defs();
> > > > > > +
> > > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > > +{
> > > > > > +       struct slab *slab;
> > > > > > +
> > > > > > +       if (!virt_addr_valid(addr))
> > > > > > +               return NULL;
> > > > > > +
> > > > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > > > +       return slab ? slab->slab_cache : NULL;
> > > > > > +}
> > > > >
> > > > > Do we need to hold a refcount to the slab_cache? Given
> > > > > we make this kfunc available everywhere, including
> > > > > sleepable contexts, I think it is necessary.
> > > >
> > > > It's a really good question.
> > > >
> > > > If the callee somehow owns the slab object, as in the example
> > > > provided in the series (current task), it's not necessarily.
> > > >
> > > > If a user can pass a random address, you're right, we need to
> > > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > > that the object still belongs to the same slab_cache, the
> > > > function becomes racy by the definition.
> > >
> > > To be safe, we can limit the kfunc to sleepable context only. Then
> > > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > > to slab_cache. We will need a KF_RELEASE kfunc to release
> > > the refcount later.
> >
> > Then it needs to call kmem_cache_destroy() for release which contains
> > rcu_barrier. :(
> >
> > >
> > > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > > for perf use case?
> >
> > No, it would be called from the lock contention path including
> > spinlocks. :(
> >
> > Can we limit it to non-sleepable ctx and not to pass arbtrary address
> > somehow (or not to save the result pointer)?
> 
> I hacked something like the following. It is not ideal, because we are
> taking spinlock_t pointer instead of void pointer. To use this with void
> 'pointer, we will need some verifier changes.

Thanks a lot for doing this!!  I'll take a look at the verifier what
needs to be done.

Namhyung

> 
> 
> diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> index 3709fb142881..7311a26ecb01 100644
> --- i/kernel/bpf/helpers.c
> +++ w/kernel/bpf/helpers.c
> @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> | KF_RCU_PROTECTED)
>  BTF_KFUNCS_END(common_btf_ids)
> 
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git i/mm/slab_common.c w/mm/slab_common.c
> index 5484e1cd812f..3e3e5f172f2e 100644
> --- i/mm/slab_common.c
> +++ w/mm/slab_common.c
> @@ -1327,14 +1327,15 @@ EXPORT_SYMBOL(ksize);
> 
>  __bpf_kfunc_start_defs();
> 
> -__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr)
>  {
>         struct slab *slab;
> +       unsigned long a = (unsigned long)addr;
> 
> -       if (!virt_addr_valid(addr))
> +       if (!virt_addr_valid(a))
>                 return NULL;
> 
> -       slab = virt_to_slab((void *)(long)addr);
> +       slab = virt_to_slab(addr);
>         return slab ? slab->slab_cache : NULL;
>  }
> 
> @@ -1346,4 +1347,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmalloc);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
>  EXPORT_TRACEPOINT_SYMBOL(kfree);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
> -
> diff --git i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> index 3f6ec15a1bf6..8238155a5055 100644
> --- i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> +++ w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> @@ -16,7 +16,7 @@ struct {
>         __uint(max_entries, 1024);
>  } slab_hash SEC(".maps");
> 
> -extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> +extern struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr) __ksym;
> 
>  /* result, will be checked by userspace */
>  int found;
> @@ -46,21 +46,23 @@ int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
>  SEC("raw_tp/bpf_test_finish")
>  int BPF_PROG(check_task_struct)
>  {
> -       __u64 curr = bpf_get_current_task();
> +       struct task_struct *curr = bpf_get_current_task_btf();
>         struct kmem_cache *s;
>         char *name;
> 
> -       s = bpf_get_kmem_cache(curr);
> +       s = bpf_get_kmem_cache(&curr->alloc_lock);
>         if (s == NULL) {
>                 found = -1;
>                 return 0;
>         }
> 
> +       bpf_rcu_read_lock();
>         name = bpf_map_lookup_elem(&slab_hash, &s);
>         if (name && !bpf_strncmp(name, 11, "task_struct"))
>                 found = 1;
>         else
>                 found = -2;
> +       bpf_rcu_read_unlock();
> 
>         return 0;
>  }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 22:57           ` Song Liu
  2024-10-04 23:28             ` Namhyung Kim
@ 2024-10-04 23:44             ` Alexei Starovoitov
  2024-10-04 23:56               ` Song Liu
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2024-10-04 23:44 UTC (permalink / raw)
  To: Song Liu
  Cc: Namhyung Kim, Roman Gushchin, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Vlastimil Babka, Hyeonggon Yoo, linux-mm,
	Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 4, 2024 at 3:57 PM Song Liu <song@kernel.org> wrote:
>
> On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > > > NULL is returned.
> > > > > >
> > > > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > > > responsible to manage the access.  The intended use case for now is to
> > > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > > >
> > > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > >  kernel/bpf/helpers.c |  1 +
> > > > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > > > >  2 files changed, 20 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > > >  BTF_KFUNCS_END(common_btf_ids)
> > > > > >
> > > > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > > --- a/mm/slab_common.c
> > > > > > +++ b/mm/slab_common.c
> > > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(ksize);
> > > > > >
> > > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > > +#include <linux/btf.h>
> > > > > > +
> > > > > > +__bpf_kfunc_start_defs();
> > > > > > +
> > > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > > +{
> > > > > > +       struct slab *slab;
> > > > > > +
> > > > > > +       if (!virt_addr_valid(addr))
> > > > > > +               return NULL;
> > > > > > +
> > > > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > > > +       return slab ? slab->slab_cache : NULL;
> > > > > > +}
> > > > >
> > > > > Do we need to hold a refcount to the slab_cache? Given
> > > > > we make this kfunc available everywhere, including
> > > > > sleepable contexts, I think it is necessary.
> > > >
> > > > It's a really good question.
> > > >
> > > > If the callee somehow owns the slab object, as in the example
> > > > provided in the series (current task), it's not necessarily.
> > > >
> > > > If a user can pass a random address, you're right, we need to
> > > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > > that the object still belongs to the same slab_cache, the
> > > > function becomes racy by the definition.
> > >
> > > To be safe, we can limit the kfunc to sleepable context only. Then
> > > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > > to slab_cache. We will need a KF_RELEASE kfunc to release
> > > the refcount later.
> >
> > Then it needs to call kmem_cache_destroy() for release which contains
> > rcu_barrier. :(
> >
> > >
> > > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > > for perf use case?
> >
> > No, it would be called from the lock contention path including
> > spinlocks. :(
> >
> > Can we limit it to non-sleepable ctx and not to pass arbtrary address
> > somehow (or not to save the result pointer)?
>
> I hacked something like the following. It is not ideal, because we are
> taking spinlock_t pointer instead of void pointer. To use this with void
> 'pointer, we will need some verifier changes.
>
> Thanks,
> Song
>
>
> diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> index 3709fb142881..7311a26ecb01 100644
> --- i/kernel/bpf/helpers.c
> +++ w/kernel/bpf/helpers.c
> @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> | KF_RCU_PROTECTED)

I don't think KF_TRUSTED_ARGS approach would fit here.
Namhyung's use case is tracing. The 'addr' will be some potentially
arbitrary address from somewhere. The chance to see a trusted pointer
is probably very low in such a tracing use case.

The verifier change can mainly be the following:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7d9b38ffd220..e09eb108e956 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct
bpf_verifier_env *env, struct bpf_insn *insn,
                        regs[BPF_REG_0].type = PTR_TO_BTF_ID;
                        regs[BPF_REG_0].btf_id = ptr_type_id;

+                       if (meta.func_id ==
special_kfunc_list[KF_get_kmem_cache])
+                               regs[BPF_REG_0].type |= PTR_UNTRUSTED;
+
                        if (is_iter_next_kfunc(&meta)) {
                                struct bpf_reg_state *cur_iter;

The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
It will be readonly via ptr_to_btf_id logic.
s->flags;
s->size;
s->offset;
access will be allowed but the verifier will sanitize them
with an inlined version of probe_read_kernel.
Even KF_RET_NULL can be dropped.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 23:44             ` Alexei Starovoitov
@ 2024-10-04 23:56               ` Song Liu
  2024-10-06 19:00                 ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Song Liu @ 2024-10-04 23:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Namhyung Kim, Roman Gushchin, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Vlastimil Babka, Hyeonggon Yoo, linux-mm,
	Arnaldo Carvalho de Melo, Kees Cook

On Fri, Oct 4, 2024 at 4:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
[...]
> > diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> > index 3709fb142881..7311a26ecb01 100644
> > --- i/kernel/bpf/helpers.c
> > +++ w/kernel/bpf/helpers.c
> > @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> > | KF_RCU_PROTECTED)
>
> I don't think KF_TRUSTED_ARGS approach would fit here.
> Namhyung's use case is tracing. The 'addr' will be some potentially
> arbitrary address from somewhere. The chance to see a trusted pointer
> is probably very low in such a tracing use case.

I thought the primary use case was to trace lock contention, for
example, queued_spin_lock_slowpath(). Of course, a more
general solution is better.

>
> The verifier change can mainly be the following:
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7d9b38ffd220..e09eb108e956 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct
> bpf_verifier_env *env, struct bpf_insn *insn,
>                         regs[BPF_REG_0].type = PTR_TO_BTF_ID;
>                         regs[BPF_REG_0].btf_id = ptr_type_id;
>
> +                       if (meta.func_id ==
> special_kfunc_list[KF_get_kmem_cache])
> +                               regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> +
>                         if (is_iter_next_kfunc(&meta)) {
>                                 struct bpf_reg_state *cur_iter;

This is easier than I thought.

Thanks,
Song

> The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
> It will be readonly via ptr_to_btf_id logic.
> s->flags;
> s->size;
> s->offset;
> access will be allowed but the verifier will sanitize them
> with an inlined version of probe_read_kernel.
> Even KF_RET_NULL can be dropped.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 23:56               ` Song Liu
@ 2024-10-06 19:00                 ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-06 19:00 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Roman Gushchin, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Vlastimil Babka, Hyeonggon Yoo, linux-mm,
	Arnaldo Carvalho de Melo, Kees Cook

Hello,

On Fri, Oct 04, 2024 at 04:56:57PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 4:44 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> [...]
> > > diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> > > index 3709fb142881..7311a26ecb01 100644
> > > --- i/kernel/bpf/helpers.c
> > > +++ w/kernel/bpf/helpers.c
> > > @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> > > | KF_RCU_PROTECTED)
> >
> > I don't think KF_TRUSTED_ARGS approach would fit here.
> > Namhyung's use case is tracing. The 'addr' will be some potentially
> > arbitrary address from somewhere. The chance to see a trusted pointer
> > is probably very low in such a tracing use case.
> 
> I thought the primary use case was to trace lock contention, for
> example, queued_spin_lock_slowpath(). Of course, a more
> general solution is better.

Right, my intended use case is the lock contention profiling so probably
it's ok to limit it for trusted pointers if it helps.  But as Song said,
a general solution should be better. :)

> 
> >
> > The verifier change can mainly be the following:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 7d9b38ffd220..e09eb108e956 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct
> > bpf_verifier_env *env, struct bpf_insn *insn,
> >                         regs[BPF_REG_0].type = PTR_TO_BTF_ID;
> >                         regs[BPF_REG_0].btf_id = ptr_type_id;
> >
> > +                       if (meta.func_id == special_kfunc_list[KF_get_kmem_cache])
> > +                               regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> > +
> >                         if (is_iter_next_kfunc(&meta)) {
> >                                 struct bpf_reg_state *cur_iter;
> 
> This is easier than I thought.

Indeed!  Thanks for providing the code.

> 
> > The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
> > It will be readonly via ptr_to_btf_id logic.
> > s->flags;
> > s->size;
> > s->offset;
> > access will be allowed but the verifier will sanitize them
> > with an inlined version of probe_read_kernel.
> > Even KF_RET_NULL can be dropped.

Ok, I'll check this out.  By having PTR_UNTRUSTED, are the callers
still required to check NULL or is it handled by probe_read_kernel()?

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-04 21:25     ` Roman Gushchin
  2024-10-04 21:36       ` Song Liu
@ 2024-10-07 12:57       ` Vlastimil Babka
  2024-10-09  7:17         ` Namhyung Kim
  1 sibling, 1 reply; 26+ messages in thread
From: Vlastimil Babka @ 2024-10-07 12:57 UTC (permalink / raw)
  To: Roman Gushchin, Song Liu
  Cc: Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Hyeonggon Yoo,
	linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On 10/4/24 11:25 PM, Roman Gushchin wrote:
> On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
>> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
>>>
>>> The bpf_get_kmem_cache() is to get a slab cache information from a
>>> virtual address like virt_to_cache().  If the address is a pointer
>>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
>>> NULL is returned.
>>>
>>> It doesn't grab a reference count of the kmem_cache so the caller is
>>> responsible to manage the access.  The intended use case for now is to
>>> symbolize locks in slab objects from the lock contention tracepoints.
>>>
>>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
>>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>


So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
thought the perf use case was:

- at the beginning it iterates the kmem caches and stores anything of
possible interest in bpf maps or somewhere - hence we have the iterator
- during profiling, from object it gets to a cache, but doesn't need to
access the cache - just store the kmem_cache address in the perf record
- after profiling itself, use the information in the maps from the first
step together with cache pointers from the second step to calculate
whatever is necessary

So at no point it should be necessary to take refcount to a kmem_cache?

But maybe "bpf_get_kmem_cache()" is implemented here as too generic
given the above use case and it should be implemented in a way that the
pointer it returns cannot be used to access anything (which could be
unsafe), but only as a bpf map key - so it should return e.g. an
unsigned long instead?

>>> ---
>>>  kernel/bpf/helpers.c |  1 +
>>>  mm/slab_common.c     | 19 +++++++++++++++++++
>>>  2 files changed, 20 insertions(+)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index 4053f279ed4cc7ab..3709fb14288105c6 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>>>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>>>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>>>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
>>> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
>>>  BTF_KFUNCS_END(common_btf_ids)
>>>
>>>  static const struct btf_kfunc_id_set common_kfunc_set = {
>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>> index 7443244656150325..5484e1cd812f698e 100644
>>> --- a/mm/slab_common.c
>>> +++ b/mm/slab_common.c
>>> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
>>>  }
>>>  EXPORT_SYMBOL(ksize);
>>>
>>> +#ifdef CONFIG_BPF_SYSCALL
>>> +#include <linux/btf.h>
>>> +
>>> +__bpf_kfunc_start_defs();
>>> +
>>> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
>>> +{
>>> +       struct slab *slab;
>>> +
>>> +       if (!virt_addr_valid(addr))
>>> +               return NULL;
>>> +
>>> +       slab = virt_to_slab((void *)(long)addr);
>>> +       return slab ? slab->slab_cache : NULL;
>>> +}
>>
>> Do we need to hold a refcount to the slab_cache? Given
>> we make this kfunc available everywhere, including
>> sleepable contexts, I think it is necessary.
> 
> It's a really good question.
> 
> If the callee somehow owns the slab object, as in the example
> provided in the series (current task), it's not necessarily.
> 
> If a user can pass a random address, you're right, we need to
> grab the slab_cache's refcnt. But then we also can't guarantee
> that the object still belongs to the same slab_cache, the
> function becomes racy by the definition.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-07 12:57       ` Vlastimil Babka
@ 2024-10-09  7:17         ` Namhyung Kim
  2024-10-10 16:46           ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2024-10-09  7:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Roman Gushchin, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Hyeonggon Yoo,
	linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >>>
> >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> >>> virtual address like virt_to_cache().  If the address is a pointer
> >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> >>> NULL is returned.
> >>>
> >>> It doesn't grab a reference count of the kmem_cache so the caller is
> >>> responsible to manage the access.  The intended use case for now is to
> >>> symbolize locks in slab objects from the lock contention tracepoints.
> >>>
> >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> 
> 
> So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> thought the perf use case was:
> 
> - at the beginning it iterates the kmem caches and stores anything of
> possible interest in bpf maps or somewhere - hence we have the iterator
> - during profiling, from object it gets to a cache, but doesn't need to
> access the cache - just store the kmem_cache address in the perf record
> - after profiling itself, use the information in the maps from the first
> step together with cache pointers from the second step to calculate
> whatever is necessary

Correct.

> 
> So at no point it should be necessary to take refcount to a kmem_cache?
> 
> But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> given the above use case and it should be implemented in a way that the
> pointer it returns cannot be used to access anything (which could be
> unsafe), but only as a bpf map key - so it should return e.g. an
> unsigned long instead?

Yep, this should work for my use case.  Maybe we don't need the
iterator when bpf_get_kmem_cache() kfunc returns the valid pointer as
we can get the necessary info at the moment.  But I think it'd be less
efficient as more work need to be done at the event (lock contention).
It'd better setting up necessary info in a map before monitoring (using
the iterator), and just looking up the map with the kfunc while
monitoring the lock contention.

Thanks,
Namhyung

> 
> >>> ---
> >>>  kernel/bpf/helpers.c |  1 +
> >>>  mm/slab_common.c     | 19 +++++++++++++++++++
> >>>  2 files changed, 20 insertions(+)
> >>>
> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>> index 4053f279ed4cc7ab..3709fb14288105c6 100644
> >>> --- a/kernel/bpf/helpers.c
> >>> +++ b/kernel/bpf/helpers.c
> >>> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> >>>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> >>>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> >>>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> >>> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> >>>  BTF_KFUNCS_END(common_btf_ids)
> >>>
> >>>  static const struct btf_kfunc_id_set common_kfunc_set = {
> >>> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >>> index 7443244656150325..5484e1cd812f698e 100644
> >>> --- a/mm/slab_common.c
> >>> +++ b/mm/slab_common.c
> >>> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> >>>  }
> >>>  EXPORT_SYMBOL(ksize);
> >>>
> >>> +#ifdef CONFIG_BPF_SYSCALL
> >>> +#include <linux/btf.h>
> >>> +
> >>> +__bpf_kfunc_start_defs();
> >>> +
> >>> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> >>> +{
> >>> +       struct slab *slab;
> >>> +
> >>> +       if (!virt_addr_valid(addr))
> >>> +               return NULL;
> >>> +
> >>> +       slab = virt_to_slab((void *)(long)addr);
> >>> +       return slab ? slab->slab_cache : NULL;
> >>> +}
> >>
> >> Do we need to hold a refcount to the slab_cache? Given
> >> we make this kfunc available everywhere, including
> >> sleepable contexts, I think it is necessary.
> > 
> > It's a really good question.
> > 
> > If the callee somehow owns the slab object, as in the example
> > provided in the series (current task), it's not necessarily.
> > 
> > If a user can pass a random address, you're right, we need to
> > grab the slab_cache's refcnt. But then we also can't guarantee
> > that the object still belongs to the same slab_cache, the
> > function becomes racy by the definition.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-09  7:17         ` Namhyung Kim
@ 2024-10-10 16:46           ` Namhyung Kim
  2024-10-10 17:04             ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2024-10-10 16:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Roman Gushchin, Song Liu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Hyeonggon Yoo,
	linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >>>
> > >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> > >>> virtual address like virt_to_cache().  If the address is a pointer
> > >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > >>> NULL is returned.
> > >>>
> > >>> It doesn't grab a reference count of the kmem_cache so the caller is
> > >>> responsible to manage the access.  The intended use case for now is to
> > >>> symbolize locks in slab objects from the lock contention tracepoints.
> > >>>
> > >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > 
> > 
> > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > thought the perf use case was:
> > 
> > - at the beginning it iterates the kmem caches and stores anything of
> > possible interest in bpf maps or somewhere - hence we have the iterator
> > - during profiling, from object it gets to a cache, but doesn't need to
> > access the cache - just store the kmem_cache address in the perf record
> > - after profiling itself, use the information in the maps from the first
> > step together with cache pointers from the second step to calculate
> > whatever is necessary
> 
> Correct.
> 
> > 
> > So at no point it should be necessary to take refcount to a kmem_cache?
> > 
> > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > given the above use case and it should be implemented in a way that the
> > pointer it returns cannot be used to access anything (which could be
> > unsafe), but only as a bpf map key - so it should return e.g. an
> > unsigned long instead?
> 
> Yep, this should work for my use case.  Maybe we don't need the
> iterator when bpf_get_kmem_cache() kfunc returns the valid pointer as
> we can get the necessary info at the moment.  But I think it'd be less
> efficient as more work need to be done at the event (lock contention).
> It'd better setting up necessary info in a map before monitoring (using
> the iterator), and just looking up the map with the kfunc while
> monitoring the lock contention.

Maybe it's still better to return a non-refcounted pointer for future
use.  I'll leave it for v5.

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-10 16:46           ` Namhyung Kim
@ 2024-10-10 17:04             ` Alexei Starovoitov
  2024-10-10 22:56               ` Namhyung Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2024-10-10 17:04 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Vlastimil Babka, Roman Gushchin, Song Liu, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Thu, Oct 10, 2024 at 9:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> > On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >>>
> > > >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> > > >>> virtual address like virt_to_cache().  If the address is a pointer
> > > >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > >>> NULL is returned.
> > > >>>
> > > >>> It doesn't grab a reference count of the kmem_cache so the caller is
> > > >>> responsible to manage the access.  The intended use case for now is to
> > > >>> symbolize locks in slab objects from the lock contention tracepoints.
> > > >>>
> > > >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > >
> > >
> > > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > > thought the perf use case was:
> > >
> > > - at the beginning it iterates the kmem caches and stores anything of
> > > possible interest in bpf maps or somewhere - hence we have the iterator
> > > - during profiling, from object it gets to a cache, but doesn't need to
> > > access the cache - just store the kmem_cache address in the perf record
> > > - after profiling itself, use the information in the maps from the first
> > > step together with cache pointers from the second step to calculate
> > > whatever is necessary
> >
> > Correct.
> >
> > >
> > > So at no point it should be necessary to take refcount to a kmem_cache?
> > >
> > > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > > given the above use case and it should be implemented in a way that the
> > > pointer it returns cannot be used to access anything (which could be
> > > unsafe), but only as a bpf map key - so it should return e.g. an
> > > unsigned long instead?
> >
> > Yep, this should work for my use case.  Maybe we don't need the
> > iterator when bpf_get_kmem_cache() kfunc returns the valid pointer as
> > we can get the necessary info at the moment.  But I think it'd be less
> > efficient as more work need to be done at the event (lock contention).
> > It'd better setting up necessary info in a map before monitoring (using
> > the iterator), and just looking up the map with the kfunc while
> > monitoring the lock contention.
>
> Maybe it's still better to return a non-refcounted pointer for future
> use.  I'll leave it for v5.

Pls keep it as:
__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)

just make sure it's PTR_UNTRUSTED.
No need to make it return long or void *.
The users can do:
  bpf_core_cast(any_value, struct kmem_cache);
anyway, but it would be an unnecessary step.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
  2024-10-10 17:04             ` Alexei Starovoitov
@ 2024-10-10 22:56               ` Namhyung Kim
  0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-10 22:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlastimil Babka, Roman Gushchin, Song Liu, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Yonghong Song, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, LKML, bpf, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

On Thu, Oct 10, 2024 at 10:04:24AM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 10, 2024 at 9:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> > > On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > > > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >>>
> > > > >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > >>> virtual address like virt_to_cache().  If the address is a pointer
> > > > >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > >>> NULL is returned.
> > > > >>>
> > > > >>> It doesn't grab a reference count of the kmem_cache so the caller is
> > > > >>> responsible to manage the access.  The intended use case for now is to
> > > > >>> symbolize locks in slab objects from the lock contention tracepoints.
> > > > >>>
> > > > >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > >
> > > >
> > > > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > > > thought the perf use case was:
> > > >
> > > > - at the beginning it iterates the kmem caches and stores anything of
> > > > possible interest in bpf maps or somewhere - hence we have the iterator
> > > > - during profiling, from object it gets to a cache, but doesn't need to
> > > > access the cache - just store the kmem_cache address in the perf record
> > > > - after profiling itself, use the information in the maps from the first
> > > > step together with cache pointers from the second step to calculate
> > > > whatever is necessary
> > >
> > > Correct.
> > >
> > > >
> > > > So at no point it should be necessary to take refcount to a kmem_cache?
> > > >
> > > > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > > > given the above use case and it should be implemented in a way that the
> > > > pointer it returns cannot be used to access anything (which could be
> > > > unsafe), but only as a bpf map key - so it should return e.g. an
> > > > unsigned long instead?
> > >
> > > Yep, this should work for my use case.  Maybe we don't need the
> > > iterator when bpf_get_kmem_cache() kfunc returns the valid pointer as
> > > we can get the necessary info at the moment.  But I think it'd be less
> > > efficient as more work need to be done at the event (lock contention).
> > > It'd better setting up necessary info in a map before monitoring (using
> > > the iterator), and just looking up the map with the kfunc while
> > > monitoring the lock contention.
> >
> > Maybe it's still better to return a non-refcounted pointer for future
> > use.  I'll leave it for v5.
> 
> Pls keep it as:
> __bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> 
> just make sure it's PTR_UNTRUSTED.

Sure, will do.

> No need to make it return long or void *.
> The users can do:
>   bpf_core_cast(any_value, struct kmem_cache);
> anyway, but it would be an unnecessary step.

Yeah I thought there would be a way to do that.

Thanks,
Namhyung



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter
  2024-10-02 18:09 [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc Namhyung Kim
  2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
  2024-10-02 18:09 ` [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
@ 2024-10-02 18:09 ` Namhyung Kim
  2 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-10-02 18:09 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	LKML, bpf, Andrew Morton, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Arnaldo Carvalho de Melo, Kees Cook

The test traverses all slab caches using the kmem_cache_iter and check
if current task's pointer is from "task_struct" slab cache.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 .../bpf/prog_tests/kmem_cache_iter.c          | 64 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |  7 ++
 .../selftests/bpf/progs/kmem_cache_iter.c     | 66 +++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/kmem_cache_iter.c

diff --git a/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..3965e2924ac82d91
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kmem_cache_iter.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Google */
+
+#include <test_progs.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "kmem_cache_iter.skel.h"
+
+static void test_kmem_cache_iter_check_task(struct kmem_cache_iter *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.flags = 0,  /* run it with the current task */
+	);
+	int prog_fd = bpf_program__fd(skel->progs.check_task_struct);
+
+	/* get task_struct and check it if's from a slab cache */
+	bpf_prog_test_run_opts(prog_fd, &opts);
+
+	/* the BPF program should set 'found' variable */
+	ASSERT_EQ(skel->bss->found, 1, "found task_struct");
+}
+
+void test_kmem_cache_iter(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct kmem_cache_iter *skel = NULL;
+	union bpf_iter_link_info linfo = {};
+	struct bpf_link *link;
+	char buf[1024];
+	int iter_fd;
+
+	skel = kmem_cache_iter__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "kmem_cache_iter__open_and_load"))
+		return;
+
+	opts.link_info = &linfo;
+	opts.link_info_len = sizeof(linfo);
+
+	link = bpf_program__attach_iter(skel->progs.slab_info_collector, &opts);
+	if (!ASSERT_OK_PTR(link, "attach_iter"))
+		goto destroy;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (!ASSERT_GE(iter_fd, 0, "iter_create"))
+		goto free_link;
+
+	memset(buf, 0, sizeof(buf));
+	while (read(iter_fd, buf, sizeof(buf) > 0)) {
+		/* read out all contents */
+		printf("%s", buf);
+	}
+
+	/* next reads should return 0 */
+	ASSERT_EQ(read(iter_fd, buf, sizeof(buf)), 0, "read");
+
+	test_kmem_cache_iter_check_task(skel);
+
+	close(iter_fd);
+
+free_link:
+	bpf_link__destroy(link);
+destroy:
+	kmem_cache_iter__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
index c41ee80533ca219a..3305dc3a74b32481 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter.h
+++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
@@ -24,6 +24,7 @@
 #define BTF_F_PTR_RAW BTF_F_PTR_RAW___not_used
 #define BTF_F_ZERO BTF_F_ZERO___not_used
 #define bpf_iter__ksym bpf_iter__ksym___not_used
+#define bpf_iter__kmem_cache bpf_iter__kmem_cache___not_used
 #include "vmlinux.h"
 #undef bpf_iter_meta
 #undef bpf_iter__bpf_map
@@ -48,6 +49,7 @@
 #undef BTF_F_PTR_RAW
 #undef BTF_F_ZERO
 #undef bpf_iter__ksym
+#undef bpf_iter__kmem_cache
 
 struct bpf_iter_meta {
 	struct seq_file *seq;
@@ -165,3 +167,8 @@ struct bpf_iter__ksym {
 	struct bpf_iter_meta *meta;
 	struct kallsym_iter *ksym;
 };
+
+struct bpf_iter__kmem_cache {
+	struct bpf_iter_meta *meta;
+	struct kmem_cache *s;
+} __attribute__((preserve_access_index));
diff --git a/tools/testing/selftests/bpf/progs/kmem_cache_iter.c b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
new file mode 100644
index 0000000000000000..3f6ec15a1bf6344c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Google */
+
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define SLAB_NAME_MAX  256
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(void *));
+	__uint(value_size, SLAB_NAME_MAX);
+	__uint(max_entries, 1024);
+} slab_hash SEC(".maps");
+
+extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
+
+/* result, will be checked by userspace */
+int found;
+
+SEC("iter/kmem_cache")
+int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct kmem_cache *s = ctx->s;
+
+	if (s) {
+		char name[SLAB_NAME_MAX];
+
+		/*
+		 * To make sure if the slab_iter implements the seq interface
+		 * properly and it's also useful for debugging.
+		 */
+		BPF_SEQ_PRINTF(seq, "%s: %u\n", s->name, s->object_size);
+
+		bpf_probe_read_kernel_str(name, sizeof(name), s->name);
+		bpf_map_update_elem(&slab_hash, &s, name, BPF_NOEXIST);
+	}
+
+	return 0;
+}
+
+SEC("raw_tp/bpf_test_finish")
+int BPF_PROG(check_task_struct)
+{
+	__u64 curr = bpf_get_current_task();
+	struct kmem_cache *s;
+	char *name;
+
+	s = bpf_get_kmem_cache(curr);
+	if (s == NULL) {
+		found = -1;
+		return 0;
+	}
+
+	name = bpf_map_lookup_elem(&slab_hash, &s);
+	if (name && !bpf_strncmp(name, 11, "task_struct"))
+		found = 1;
+	else
+		found = -2;
+
+	return 0;
+}
-- 
2.46.1.824.gd892dcdcdd-goog



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2024-10-10 22:56 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-02 18:09 [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc Namhyung Kim
2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
2024-10-03  7:35   ` Vlastimil Babka
2024-10-04 20:33   ` Song Liu
2024-10-04 21:37     ` Namhyung Kim
2024-10-04 21:46       ` Song Liu
2024-10-04 23:29         ` Namhyung Kim
2024-10-04 20:45   ` Song Liu
2024-10-04 21:42     ` Namhyung Kim
2024-10-02 18:09 ` [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
2024-10-04  5:31   ` Namhyung Kim
2024-10-04 20:10   ` Song Liu
2024-10-04 21:25     ` Roman Gushchin
2024-10-04 21:36       ` Song Liu
2024-10-04 21:58         ` Namhyung Kim
2024-10-04 22:57           ` Song Liu
2024-10-04 23:28             ` Namhyung Kim
2024-10-04 23:44             ` Alexei Starovoitov
2024-10-04 23:56               ` Song Liu
2024-10-06 19:00                 ` Namhyung Kim
2024-10-07 12:57       ` Vlastimil Babka
2024-10-09  7:17         ` Namhyung Kim
2024-10-10 16:46           ` Namhyung Kim
2024-10-10 17:04             ` Alexei Starovoitov
2024-10-10 22:56               ` Namhyung Kim
2024-10-02 18:09 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox