linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup
@ 2024-11-22  3:59 Andrii Nakryiko
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks Andrii Nakryiko
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-11-22  3:59 UTC (permalink / raw)
  To: linux-trace-kernel, linux-mm, akpm, peterz, mingo, torvalds
  Cc: oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck,
	willy, surenb, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca, Andrii Nakryiko

Implement speculative (lockless) resolution of VMA to inode to uprobe,
bypassing the need to take mmap_lock for reads, if possible. This series is
based on Suren's patch set [2], which adds mm_struct helpers that help detect
whether mm_struct was changed, which is used by uprobe logic to validate that
speculative results can be trusted after all the lookup logic results in
a valid uprobe instance.

Patch #1 is a simplification to uprobe VMA flag checking, suggested by Oleg.

Patch #2 is the speculative VMA-to-uprobe resolution logic itself, and is the
focal point of this patch set. It makes entry uprobes in common case scale
very well with number of CPUs, as we avoid any locking or cache line bouncing
between CPUs. See corresponding patch for details and benchmarking results.

Note, this patch set assumes that FMODE_BACKING files were switched to have
SLAB_TYPE_SAFE_BY_RCU semantics, which was recently done by Christian Brauner
in [0]. This change can be pulled into perf/core through stable
tags/vfs-6.13.for-bpf.file tag from [1].

  [0] https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs-6.13.for-bpf.file&id=8b1bc2590af61129b82a189e9dc7c2804c34400e
  [1] git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
  [2] https://lore.kernel.org/linux-mm/20241121162826.987947-1-surenb@google.com/

v4->v5:
- rebase on top of Suren's latest version of mm patches addressing Peter's
  comment and API renaming request;
v3->v4:
- rebased and dropped data_race(), given mm_struct uses real seqcount (Peter);
v2->v3:
- dropped kfree_rcu() patch (Christian);
- added data_race() annotations for fields of vma and vma->vm_file which could
  be modified during speculative lookup (Oleg);
- fixed int->long problem in stubs for mmap_lock_speculation_{start,end}(),
  caught by Kernel test robot;
v1->v2:
- adjusted vma_end_write_all() comment to point out it should never be called
  manually now, but I wasn't sure how ACQUIRE/RELEASE comments should be
  reworded (previously requested by Jann), so I'd appreciate some help there
  (Jann);
- int -> long change for mm_lock_seq, as agreed at LPC2024 (Jann, Suren, Liam);
- kfree_rcu_mightsleep() for FMODE_BACKING (Suren, Christian);
- vm_flags simplification in find_active_uprobe_rcu() and
  find_active_uprobe_speculative() (Oleg);
- guard(rcu)() simplified find_active_uprobe_speculative() implementation.

Andrii Nakryiko (2):
  uprobes: simplify find_active_uprobe_rcu() VMA checks
  uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution

 kernel/events/uprobes.c | 47 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

-- 
2.43.5



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks
  2024-11-22  3:59 [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Andrii Nakryiko
@ 2024-11-22  3:59 ` Andrii Nakryiko
  2024-11-26 22:19   ` Jann Horn
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 2/2] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution Andrii Nakryiko
  2024-11-22 11:07 ` [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Peter Zijlstra
  2 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2024-11-22  3:59 UTC (permalink / raw)
  To: linux-trace-kernel, linux-mm, akpm, peterz, mingo, torvalds
  Cc: oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck,
	willy, surenb, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca, Andrii Nakryiko

At the point where find_active_uprobe_rcu() is used we know that VMA in
question has triggered software breakpoint, so we don't need to validate
vma->vm_flags. Keep only vma->vm_file NULL check.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a76ddc5fc982..c4da8f741f3a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2305,7 +2305,7 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
 	mmap_read_lock(mm);
 	vma = vma_lookup(mm, bp_vaddr);
 	if (vma) {
-		if (valid_vma(vma, false)) {
+		if (vma->vm_file) {
 			struct inode *inode = file_inode(vma->vm_file);
 			loff_t offset = vaddr_to_offset(vma, bp_vaddr);
 
-- 
2.43.5



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v5 tip/perf/core 2/2] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
  2024-11-22  3:59 [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Andrii Nakryiko
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks Andrii Nakryiko
@ 2024-11-22  3:59 ` Andrii Nakryiko
  2024-11-22 14:56   ` Liam R. Howlett
  2024-11-22 11:07 ` [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Peter Zijlstra
  2 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2024-11-22  3:59 UTC (permalink / raw)
  To: linux-trace-kernel, linux-mm, akpm, peterz, mingo, torvalds
  Cc: oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck,
	willy, surenb, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca, Andrii Nakryiko

Given filp_cachep is marked SLAB_TYPESAFE_BY_RCU (and FMODE_BACKING
files, a special case, now goes through RCU-delated freeing), we can
safely access vma->vm_file->f_inode field locklessly under just
rcu_read_lock() protection, which enables looking up uprobe from
uprobes_tree completely locklessly and speculatively without the need to
acquire mmap_lock for reads. In most cases, anyway, assuming that there
are no parallel mm and/or VMA modifications. The underlying struct
file's memory won't go away from under us (even if struct file can be
reused in the meantime).

We rely on newly added mmap_lock_speculate_{try_begin,retry}() helpers to
validate that mm_struct stays intact for entire duration of this
speculation. If not, we fall back to mmap_lock-protected lookup.
The speculative logic is written in such a way that it will safely
handle any garbage values that might be read from vma or file structs.

Benchmarking results speak for themselves.

BEFORE (latest tip/perf/core)
=============================
uprobe-nop            ( 1 cpus):    3.384 ± 0.004M/s  (  3.384M/s/cpu)
uprobe-nop            ( 2 cpus):    5.456 ± 0.005M/s  (  2.728M/s/cpu)
uprobe-nop            ( 3 cpus):    7.863 ± 0.015M/s  (  2.621M/s/cpu)
uprobe-nop            ( 4 cpus):    9.442 ± 0.008M/s  (  2.360M/s/cpu)
uprobe-nop            ( 5 cpus):   11.036 ± 0.013M/s  (  2.207M/s/cpu)
uprobe-nop            ( 6 cpus):   10.884 ± 0.019M/s  (  1.814M/s/cpu)
uprobe-nop            ( 7 cpus):    7.897 ± 0.145M/s  (  1.128M/s/cpu)
uprobe-nop            ( 8 cpus):   10.021 ± 0.128M/s  (  1.253M/s/cpu)
uprobe-nop            (10 cpus):    9.932 ± 0.170M/s  (  0.993M/s/cpu)
uprobe-nop            (12 cpus):    8.369 ± 0.056M/s  (  0.697M/s/cpu)
uprobe-nop            (14 cpus):    8.678 ± 0.017M/s  (  0.620M/s/cpu)
uprobe-nop            (16 cpus):    7.392 ± 0.003M/s  (  0.462M/s/cpu)
uprobe-nop            (24 cpus):    5.326 ± 0.178M/s  (  0.222M/s/cpu)
uprobe-nop            (32 cpus):    5.426 ± 0.059M/s  (  0.170M/s/cpu)
uprobe-nop            (40 cpus):    5.262 ± 0.070M/s  (  0.132M/s/cpu)
uprobe-nop            (48 cpus):    6.121 ± 0.010M/s  (  0.128M/s/cpu)
uprobe-nop            (56 cpus):    6.252 ± 0.035M/s  (  0.112M/s/cpu)
uprobe-nop            (64 cpus):    7.644 ± 0.023M/s  (  0.119M/s/cpu)
uprobe-nop            (72 cpus):    7.781 ± 0.001M/s  (  0.108M/s/cpu)
uprobe-nop            (80 cpus):    8.992 ± 0.048M/s  (  0.112M/s/cpu)

AFTER
=====
uprobe-nop            ( 1 cpus):    3.534 ± 0.033M/s  (  3.534M/s/cpu)
uprobe-nop            ( 2 cpus):    6.701 ± 0.007M/s  (  3.351M/s/cpu)
uprobe-nop            ( 3 cpus):   10.031 ± 0.007M/s  (  3.344M/s/cpu)
uprobe-nop            ( 4 cpus):   13.003 ± 0.012M/s  (  3.251M/s/cpu)
uprobe-nop            ( 5 cpus):   16.274 ± 0.006M/s  (  3.255M/s/cpu)
uprobe-nop            ( 6 cpus):   19.563 ± 0.024M/s  (  3.261M/s/cpu)
uprobe-nop            ( 7 cpus):   22.696 ± 0.054M/s  (  3.242M/s/cpu)
uprobe-nop            ( 8 cpus):   24.534 ± 0.010M/s  (  3.067M/s/cpu)
uprobe-nop            (10 cpus):   30.475 ± 0.117M/s  (  3.047M/s/cpu)
uprobe-nop            (12 cpus):   33.371 ± 0.017M/s  (  2.781M/s/cpu)
uprobe-nop            (14 cpus):   38.864 ± 0.004M/s  (  2.776M/s/cpu)
uprobe-nop            (16 cpus):   41.476 ± 0.020M/s  (  2.592M/s/cpu)
uprobe-nop            (24 cpus):   64.696 ± 0.021M/s  (  2.696M/s/cpu)
uprobe-nop            (32 cpus):   85.054 ± 0.027M/s  (  2.658M/s/cpu)
uprobe-nop            (40 cpus):  101.979 ± 0.032M/s  (  2.549M/s/cpu)
uprobe-nop            (48 cpus):  110.518 ± 0.056M/s  (  2.302M/s/cpu)
uprobe-nop            (56 cpus):  117.737 ± 0.020M/s  (  2.102M/s/cpu)
uprobe-nop            (64 cpus):  124.613 ± 0.079M/s  (  1.947M/s/cpu)
uprobe-nop            (72 cpus):  133.239 ± 0.032M/s  (  1.851M/s/cpu)
uprobe-nop            (80 cpus):  142.037 ± 0.138M/s  (  1.775M/s/cpu)

Previously total throughput was maxing out at 11mln/s, and gradually
declining past 8 cores. With this change, it now keeps growing with each
added CPU, reaching 142mln/s at 80 CPUs (this was measured on a 80-core
Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz).

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/events/uprobes.c | 45 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c4da8f741f3a..3f5577d4032a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2295,6 +2295,47 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	return is_trap_insn(&opcode);
 }
 
+static struct uprobe *find_active_uprobe_speculative(unsigned long bp_vaddr)
+{
+	struct mm_struct *mm = current->mm;
+	struct uprobe *uprobe = NULL;
+	struct vm_area_struct *vma;
+	struct file *vm_file;
+	loff_t offset;
+	unsigned int seq;
+
+	guard(rcu)();
+
+	if (!mmap_lock_speculate_try_begin(mm, &seq))
+		return NULL;
+
+	vma = vma_lookup(mm, bp_vaddr);
+	if (!vma)
+		return NULL;
+
+	/*
+	 * vm_file memory can be reused for another instance of struct file,
+	 * but can't be freed from under us, so it's safe to read fields from
+	 * it, even if the values are some garbage values; ultimately
+	 * find_uprobe_rcu() + mmap_lock_speculation_end() check will ensure
+	 * that whatever we speculatively found is correct
+	 */
+	vm_file = READ_ONCE(vma->vm_file);
+	if (!vm_file)
+		return NULL;
+
+	offset = (loff_t)(vma->vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vma->vm_start);
+	uprobe = find_uprobe_rcu(vm_file->f_inode, offset);
+	if (!uprobe)
+		return NULL;
+
+	/* now double check that nothing about MM changed */
+	if (mmap_lock_speculate_retry(mm, seq))
+		return NULL;
+
+	return uprobe;
+}
+
 /* assumes being inside RCU protected region */
 static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swbp)
 {
@@ -2302,6 +2343,10 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
 	struct uprobe *uprobe = NULL;
 	struct vm_area_struct *vma;
 
+	uprobe = find_active_uprobe_speculative(bp_vaddr);
+	if (uprobe)
+		return uprobe;
+
 	mmap_read_lock(mm);
 	vma = vma_lookup(mm, bp_vaddr);
 	if (vma) {
-- 
2.43.5



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup
  2024-11-22  3:59 [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Andrii Nakryiko
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks Andrii Nakryiko
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 2/2] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution Andrii Nakryiko
@ 2024-11-22 11:07 ` Peter Zijlstra
  2024-11-22 15:04   ` Suren Baghdasaryan
  2 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2024-11-22 11:07 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: linux-trace-kernel, linux-mm, akpm, mingo, torvalds, oleg,
	rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck, willy,
	surenb, mjguzik, brauner, jannh, mhocko, vbabka, shakeel.butt,
	hannes, lorenzo.stoakes, Liam.Howlett, david, arnd, viro, hca

On Thu, Nov 21, 2024 at 07:59:20PM -0800, Andrii Nakryiko wrote:

> Andrii Nakryiko (2):
>   uprobes: simplify find_active_uprobe_rcu() VMA checks
>   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution

Thanks, assuming Suren is okay with me carrying his patches through tip,
I'll make this land in tip/perf/core after -rc1.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 2/2] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 2/2] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution Andrii Nakryiko
@ 2024-11-22 14:56   ` Liam R. Howlett
  0 siblings, 0 replies; 12+ messages in thread
From: Liam R. Howlett @ 2024-11-22 14:56 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: linux-trace-kernel, linux-mm, akpm, peterz, mingo, torvalds,
	oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck,
	willy, surenb, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, david, arnd, viro, hca

* Andrii Nakryiko <andrii@kernel.org> [241121 22:59]:
> Given filp_cachep is marked SLAB_TYPESAFE_BY_RCU (and FMODE_BACKING
> files, a special case, now goes through RCU-delated freeing), we can
> safely access vma->vm_file->f_inode field locklessly under just
> rcu_read_lock() protection, which enables looking up uprobe from
> uprobes_tree completely locklessly and speculatively without the need to
> acquire mmap_lock for reads. In most cases, anyway, assuming that there
> are no parallel mm and/or VMA modifications. The underlying struct
> file's memory won't go away from under us (even if struct file can be
> reused in the meantime).
> 
> We rely on newly added mmap_lock_speculate_{try_begin,retry}() helpers to
> validate that mm_struct stays intact for entire duration of this
> speculation. If not, we fall back to mmap_lock-protected lookup.
> The speculative logic is written in such a way that it will safely
> handle any garbage values that might be read from vma or file structs.
> 
> Benchmarking results speak for themselves.
> 
> BEFORE (latest tip/perf/core)
> =============================
> uprobe-nop            ( 1 cpus):    3.384 ± 0.004M/s  (  3.384M/s/cpu)
> uprobe-nop            ( 2 cpus):    5.456 ± 0.005M/s  (  2.728M/s/cpu)
> uprobe-nop            ( 3 cpus):    7.863 ± 0.015M/s  (  2.621M/s/cpu)
> uprobe-nop            ( 4 cpus):    9.442 ± 0.008M/s  (  2.360M/s/cpu)
> uprobe-nop            ( 5 cpus):   11.036 ± 0.013M/s  (  2.207M/s/cpu)
> uprobe-nop            ( 6 cpus):   10.884 ± 0.019M/s  (  1.814M/s/cpu)
> uprobe-nop            ( 7 cpus):    7.897 ± 0.145M/s  (  1.128M/s/cpu)
> uprobe-nop            ( 8 cpus):   10.021 ± 0.128M/s  (  1.253M/s/cpu)
> uprobe-nop            (10 cpus):    9.932 ± 0.170M/s  (  0.993M/s/cpu)
> uprobe-nop            (12 cpus):    8.369 ± 0.056M/s  (  0.697M/s/cpu)
> uprobe-nop            (14 cpus):    8.678 ± 0.017M/s  (  0.620M/s/cpu)
> uprobe-nop            (16 cpus):    7.392 ± 0.003M/s  (  0.462M/s/cpu)
> uprobe-nop            (24 cpus):    5.326 ± 0.178M/s  (  0.222M/s/cpu)
> uprobe-nop            (32 cpus):    5.426 ± 0.059M/s  (  0.170M/s/cpu)
> uprobe-nop            (40 cpus):    5.262 ± 0.070M/s  (  0.132M/s/cpu)
> uprobe-nop            (48 cpus):    6.121 ± 0.010M/s  (  0.128M/s/cpu)
> uprobe-nop            (56 cpus):    6.252 ± 0.035M/s  (  0.112M/s/cpu)
> uprobe-nop            (64 cpus):    7.644 ± 0.023M/s  (  0.119M/s/cpu)
> uprobe-nop            (72 cpus):    7.781 ± 0.001M/s  (  0.108M/s/cpu)
> uprobe-nop            (80 cpus):    8.992 ± 0.048M/s  (  0.112M/s/cpu)
> 
> AFTER
> =====
> uprobe-nop            ( 1 cpus):    3.534 ± 0.033M/s  (  3.534M/s/cpu)
> uprobe-nop            ( 2 cpus):    6.701 ± 0.007M/s  (  3.351M/s/cpu)
> uprobe-nop            ( 3 cpus):   10.031 ± 0.007M/s  (  3.344M/s/cpu)
> uprobe-nop            ( 4 cpus):   13.003 ± 0.012M/s  (  3.251M/s/cpu)
> uprobe-nop            ( 5 cpus):   16.274 ± 0.006M/s  (  3.255M/s/cpu)
> uprobe-nop            ( 6 cpus):   19.563 ± 0.024M/s  (  3.261M/s/cpu)
> uprobe-nop            ( 7 cpus):   22.696 ± 0.054M/s  (  3.242M/s/cpu)
> uprobe-nop            ( 8 cpus):   24.534 ± 0.010M/s  (  3.067M/s/cpu)
> uprobe-nop            (10 cpus):   30.475 ± 0.117M/s  (  3.047M/s/cpu)
> uprobe-nop            (12 cpus):   33.371 ± 0.017M/s  (  2.781M/s/cpu)
> uprobe-nop            (14 cpus):   38.864 ± 0.004M/s  (  2.776M/s/cpu)
> uprobe-nop            (16 cpus):   41.476 ± 0.020M/s  (  2.592M/s/cpu)
> uprobe-nop            (24 cpus):   64.696 ± 0.021M/s  (  2.696M/s/cpu)
> uprobe-nop            (32 cpus):   85.054 ± 0.027M/s  (  2.658M/s/cpu)
> uprobe-nop            (40 cpus):  101.979 ± 0.032M/s  (  2.549M/s/cpu)
> uprobe-nop            (48 cpus):  110.518 ± 0.056M/s  (  2.302M/s/cpu)
> uprobe-nop            (56 cpus):  117.737 ± 0.020M/s  (  2.102M/s/cpu)
> uprobe-nop            (64 cpus):  124.613 ± 0.079M/s  (  1.947M/s/cpu)
> uprobe-nop            (72 cpus):  133.239 ± 0.032M/s  (  1.851M/s/cpu)
> uprobe-nop            (80 cpus):  142.037 ± 0.138M/s  (  1.775M/s/cpu)
> 
> Previously total throughput was maxing out at 11mln/s, and gradually
> declining past 8 cores. With this change, it now keeps growing with each
> added CPU, reaching 142mln/s at 80 CPUs (this was measured on a 80-core
> Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz).
> 
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>

> ---
>  kernel/events/uprobes.c | 45 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 45 insertions(+)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index c4da8f741f3a..3f5577d4032a 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -2295,6 +2295,47 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
>  	return is_trap_insn(&opcode);
>  }
>  
> +static struct uprobe *find_active_uprobe_speculative(unsigned long bp_vaddr)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct uprobe *uprobe = NULL;
> +	struct vm_area_struct *vma;
> +	struct file *vm_file;
> +	loff_t offset;
> +	unsigned int seq;
> +
> +	guard(rcu)();
> +
> +	if (!mmap_lock_speculate_try_begin(mm, &seq))
> +		return NULL;
> +
> +	vma = vma_lookup(mm, bp_vaddr);
> +	if (!vma)
> +		return NULL;
> +
> +	/*
> +	 * vm_file memory can be reused for another instance of struct file,
> +	 * but can't be freed from under us, so it's safe to read fields from
> +	 * it, even if the values are some garbage values; ultimately
> +	 * find_uprobe_rcu() + mmap_lock_speculation_end() check will ensure
> +	 * that whatever we speculatively found is correct
> +	 */
> +	vm_file = READ_ONCE(vma->vm_file);
> +	if (!vm_file)
> +		return NULL;
> +
> +	offset = (loff_t)(vma->vm_pgoff << PAGE_SHIFT) + (bp_vaddr - vma->vm_start);
> +	uprobe = find_uprobe_rcu(vm_file->f_inode, offset);
> +	if (!uprobe)
> +		return NULL;
> +
> +	/* now double check that nothing about MM changed */
> +	if (mmap_lock_speculate_retry(mm, seq))
> +		return NULL;
> +
> +	return uprobe;
> +}
> +
>  /* assumes being inside RCU protected region */
>  static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swbp)
>  {
> @@ -2302,6 +2343,10 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
>  	struct uprobe *uprobe = NULL;
>  	struct vm_area_struct *vma;
>  
> +	uprobe = find_active_uprobe_speculative(bp_vaddr);
> +	if (uprobe)
> +		return uprobe;
> +
>  	mmap_read_lock(mm);
>  	vma = vma_lookup(mm, bp_vaddr);
>  	if (vma) {
> -- 
> 2.43.5
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup
  2024-11-22 11:07 ` [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Peter Zijlstra
@ 2024-11-22 15:04   ` Suren Baghdasaryan
  2024-11-22 17:48     ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Suren Baghdasaryan @ 2024-11-22 15:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, linux-trace-kernel, linux-mm, akpm, mingo,
	torvalds, oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa,
	paulmck, willy, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca

On Fri, Nov 22, 2024 at 3:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Nov 21, 2024 at 07:59:20PM -0800, Andrii Nakryiko wrote:
>
> > Andrii Nakryiko (2):
> >   uprobes: simplify find_active_uprobe_rcu() VMA checks
> >   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
>
> Thanks, assuming Suren is okay with me carrying his patches through tip,
> I'll make this land in tip/perf/core after -rc1.

No objections from me. Thanks!


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup
  2024-11-22 15:04   ` Suren Baghdasaryan
@ 2024-11-22 17:48     ` Suren Baghdasaryan
  2024-11-23 20:35       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Suren Baghdasaryan @ 2024-11-22 17:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrii Nakryiko, linux-trace-kernel, linux-mm, akpm, mingo,
	torvalds, oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa,
	paulmck, willy, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca

On Fri, Nov 22, 2024 at 7:04 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Nov 22, 2024 at 3:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Nov 21, 2024 at 07:59:20PM -0800, Andrii Nakryiko wrote:
> >
> > > Andrii Nakryiko (2):
> > >   uprobes: simplify find_active_uprobe_rcu() VMA checks
> > >   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
> >
> > Thanks, assuming Suren is okay with me carrying his patches through tip,
> > I'll make this land in tip/perf/core after -rc1.
>
> No objections from me. Thanks!

I just fixed a build issue in one of my patches for an odd config, so
please use the latest version of the patchset from here:
https://lore.kernel.org/all/20241122174416.1367052-1-surenb@google.com/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup
  2024-11-22 17:48     ` Suren Baghdasaryan
@ 2024-11-23 20:35       ` Peter Zijlstra
  2024-11-24 19:56         ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2024-11-23 20:35 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrii Nakryiko, linux-trace-kernel, linux-mm, akpm, mingo,
	torvalds, oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa,
	paulmck, willy, mjguzik, brauner, jannh, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca

On Fri, Nov 22, 2024 at 09:48:11AM -0800, Suren Baghdasaryan wrote:
> On Fri, Nov 22, 2024 at 7:04 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Fri, Nov 22, 2024 at 3:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Nov 21, 2024 at 07:59:20PM -0800, Andrii Nakryiko wrote:
> > >
> > > > Andrii Nakryiko (2):
> > > >   uprobes: simplify find_active_uprobe_rcu() VMA checks
> > > >   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
> > >
> > > Thanks, assuming Suren is okay with me carrying his patches through tip,
> > > I'll make this land in tip/perf/core after -rc1.
> >
> > No objections from me. Thanks!
> 
> I just fixed a build issue in one of my patches for an odd config, so
> please use the latest version of the patchset from here:
> https://lore.kernel.org/all/20241122174416.1367052-1-surenb@google.com/

updated, thanks!


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup
  2024-11-23 20:35       ` Peter Zijlstra
@ 2024-11-24 19:56         ` Suren Baghdasaryan
  0 siblings, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2024-11-24 19:56 UTC (permalink / raw)
  To: akpm, Peter Zijlstra
  Cc: Andrii Nakryiko, linux-trace-kernel, linux-mm, mingo, torvalds,
	oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck,
	willy, mjguzik, brauner, jannh, mhocko, vbabka, shakeel.butt,
	hannes, lorenzo.stoakes, Liam.Howlett, david, arnd, viro, hca

On Sat, Nov 23, 2024 at 12:35 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 22, 2024 at 09:48:11AM -0800, Suren Baghdasaryan wrote:
> > On Fri, Nov 22, 2024 at 7:04 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 3:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Thu, Nov 21, 2024 at 07:59:20PM -0800, Andrii Nakryiko wrote:
> > > >
> > > > > Andrii Nakryiko (2):
> > > > >   uprobes: simplify find_active_uprobe_rcu() VMA checks
> > > > >   uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution
> > > >
> > > > Thanks, assuming Suren is okay with me carrying his patches through tip,
> > > > I'll make this land in tip/perf/core after -rc1.
> > >
> > > No objections from me. Thanks!
> >
> > I just fixed a build issue in one of my patches for an odd config, so
> > please use the latest version of the patchset from here:
> > https://lore.kernel.org/all/20241122174416.1367052-1-surenb@google.com/
>
> updated, thanks!

Hi Andrew,
I just noticed that patches from an old version of my patchset are
present in mm-unstable, more specifically these two:

fb23aacd2a14 "mm: convert mm_lock_seq to a proper seqcount"
549aeb99ccf1 "mm: introduce mmap_lock_speculation_{begin|end}"

Peter will be merging the latest version into tip/perf/core, so this
will cause a conflict at some point.
It should be easy to update them in mm-unstable to the latest version
[1]. I was able to revert the old ones and apply the new ones without
any merge conflict. If possible, please update them instead of
dropping them from mm-unstable. Some patches I'm preparing to post
have dependencies on this patchset.
Thanks,
Suren.

[1] https://lore.kernel.org/all/20241122174416.1367052-1-surenb@google.com/


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks
  2024-11-22  3:59 ` [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks Andrii Nakryiko
@ 2024-11-26 22:19   ` Jann Horn
  2024-11-27  4:49     ` Andrii Nakryiko
  2024-11-27  8:01     ` Oleg Nesterov
  0 siblings, 2 replies; 12+ messages in thread
From: Jann Horn @ 2024-11-26 22:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: linux-trace-kernel, linux-mm, akpm, peterz, mingo, torvalds,
	oleg, rostedt, mhiramat, bpf, linux-kernel, jolsa, paulmck,
	willy, surenb, mjguzik, brauner, mhocko, vbabka, shakeel.butt,
	hannes, lorenzo.stoakes, Liam.Howlett, david, arnd, viro, hca

On Fri, Nov 22, 2024 at 4:59 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> At the point where find_active_uprobe_rcu() is used we know that VMA in
> question has triggered software breakpoint, so we don't need to validate
> vma->vm_flags. Keep only vma->vm_file NULL check.

How do we know that the VMA we find triggered a software breakpoint?
Between the time a software breakpoint was hit and the time we took
the mmap_read_lock(), the VMA could have been replaced with an
entirely different one, right?

I don't know this code well, and your change looks like it's probably
fine (since the file is just used to look up its inode in some tree,
and therefore for incompatible files, the lookup is guaranteed to fail
and nothing will happen). But I think the commit message looks dodgy.

> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Acked-by: Oleg Nesterov <oleg@redhat.com>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/events/uprobes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index a76ddc5fc982..c4da8f741f3a 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -2305,7 +2305,7 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
>         mmap_read_lock(mm);
>         vma = vma_lookup(mm, bp_vaddr);
>         if (vma) {
> -               if (valid_vma(vma, false)) {
> +               if (vma->vm_file) {
>                         struct inode *inode = file_inode(vma->vm_file);
>                         loff_t offset = vaddr_to_offset(vma, bp_vaddr);
>
> --
> 2.43.5
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks
  2024-11-26 22:19   ` Jann Horn
@ 2024-11-27  4:49     ` Andrii Nakryiko
  2024-11-27  8:01     ` Oleg Nesterov
  1 sibling, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-11-27  4:49 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrii Nakryiko, linux-trace-kernel, linux-mm, akpm, peterz,
	mingo, torvalds, oleg, rostedt, mhiramat, bpf, linux-kernel,
	jolsa, paulmck, willy, surenb, mjguzik, brauner, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca

On Tue, Nov 26, 2024 at 2:20 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Nov 22, 2024 at 4:59 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > At the point where find_active_uprobe_rcu() is used we know that VMA in
> > question has triggered software breakpoint, so we don't need to validate
> > vma->vm_flags. Keep only vma->vm_file NULL check.
>
> How do we know that the VMA we find triggered a software breakpoint?
> Between the time a software breakpoint was hit and the time we took
> the mmap_read_lock(), the VMA could have been replaced with an
> entirely different one, right?

We need that VMA only to get inode + file offset, and whether it is
the original VMA with uprobe installed, or someone raced and replaced
it with some other VMA shouldn't matter. We either find uprobe at that
offset within that inode, or not. So this seems fine.

>
> I don't know this code well, and your change looks like it's probably
> fine (since the file is just used to look up its inode in some tree,
> and therefore for incompatible files, the lookup is guaranteed to fail
> and nothing will happen). But I think the commit message looks dodgy.
>
> > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > Acked-by: Oleg Nesterov <oleg@redhat.com>
> > Suggested-by: Oleg Nesterov <oleg@redhat.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/events/uprobes.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index a76ddc5fc982..c4da8f741f3a 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -2305,7 +2305,7 @@ static struct uprobe *find_active_uprobe_rcu(unsigned long bp_vaddr, int *is_swb
> >         mmap_read_lock(mm);
> >         vma = vma_lookup(mm, bp_vaddr);
> >         if (vma) {
> > -               if (valid_vma(vma, false)) {
> > +               if (vma->vm_file) {
> >                         struct inode *inode = file_inode(vma->vm_file);
> >                         loff_t offset = vaddr_to_offset(vma, bp_vaddr);
> >
> > --
> > 2.43.5
> >


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks
  2024-11-26 22:19   ` Jann Horn
  2024-11-27  4:49     ` Andrii Nakryiko
@ 2024-11-27  8:01     ` Oleg Nesterov
  1 sibling, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2024-11-27  8:01 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrii Nakryiko, linux-trace-kernel, linux-mm, akpm, peterz,
	mingo, torvalds, rostedt, mhiramat, bpf, linux-kernel, jolsa,
	paulmck, willy, surenb, mjguzik, brauner, mhocko, vbabka,
	shakeel.butt, hannes, lorenzo.stoakes, Liam.Howlett, david, arnd,
	viro, hca

On 11/26, Jann Horn wrote:
>
> On Fri, Nov 22, 2024 at 4:59 AM Andrii Nakryiko <andrii@kernel.org> wrote:
> > At the point where find_active_uprobe_rcu() is used we know that VMA in
> > question has triggered software breakpoint, so we don't need to validate
> > vma->vm_flags. Keep only vma->vm_file NULL check.
>
> How do we know that the VMA we find triggered a software breakpoint?
> Between the time a software breakpoint was hit and the time we took
> the mmap_read_lock(), the VMA could have been replaced with an
> entirely different one, right?

Right, but this doesn't really differ from the case when another thread
replaces (or even unmaps) this VMA after find_active_uprobe_rcu() drops
mm->mmap_lock and returns a found uprobe.

So I think this is fine.

Oleg.



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-12-05 15:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-22  3:59 [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Andrii Nakryiko
2024-11-22  3:59 ` [PATCH v5 tip/perf/core 1/2] uprobes: simplify find_active_uprobe_rcu() VMA checks Andrii Nakryiko
2024-11-26 22:19   ` Jann Horn
2024-11-27  4:49     ` Andrii Nakryiko
2024-11-27  8:01     ` Oleg Nesterov
2024-11-22  3:59 ` [PATCH v5 tip/perf/core 2/2] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution Andrii Nakryiko
2024-11-22 14:56   ` Liam R. Howlett
2024-11-22 11:07 ` [PATCH v5 tip/perf/core 0/2] uprobes: speculative lockless VMA-to-uprobe lookup Peter Zijlstra
2024-11-22 15:04   ` Suren Baghdasaryan
2024-11-22 17:48     ` Suren Baghdasaryan
2024-11-23 20:35       ` Peter Zijlstra
2024-11-24 19:56         ` Suren Baghdasaryan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox