linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Yonghong Song <yhs@fb.com>, bpf@vger.kernel.org, linux-mm@kvack.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	kernel-team@fb.com, Luigi Rizzo <lrizzo@google.com>,
	liam.howlett@oracle.com, akpm@linux-foundation.org
Subject: Re: [PATCH mm/bpf v2] mm: bpf: add find_vma_no_check() without lockdep_assert on mm->mmap_lock
Date: Wed, 8 Sep 2021 14:20:17 +0200	[thread overview]
Message-ID: <ebcddf07-f329-05fa-8fdc-b2b9d6b0127b@iogearbox.net> (raw)
In-Reply-To: <20210908044427.3632119-1-yhs@fb.com>

On 9/8/21 6:44 AM, Yonghong Song wrote:
> Current bpf-next bpf selftest "get_stack_raw_tp" triggered the warning:
> 
>    [ 1411.304463] WARNING: CPU: 3 PID: 140 at include/linux/mmap_lock.h:164 find_vma+0x47/0xa0
>    [ 1411.304469] Modules linked in: bpf_testmod(O) [last unloaded: bpf_testmod]
>    [ 1411.304476] CPU: 3 PID: 140 Comm: systemd-journal Tainted: G        W  O      5.14.0+ #53
>    [ 1411.304479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>    [ 1411.304481] RIP: 0010:find_vma+0x47/0xa0
>    [ 1411.304484] Code: de 48 89 ef e8 ba f5 fe ff 48 85 c0 74 2e 48 83 c4 08 5b 5d c3 48 8d bf 28 01 00 00 be ff ff ff ff e8 2d 9f d8 00 85 c0 75 d4 <0f> 0b 48 89 de 48 8
>    [ 1411.304487] RSP: 0018:ffffabd440403db8 EFLAGS: 00010246
>    [ 1411.304490] RAX: 0000000000000000 RBX: 00007f00ad80a0e0 RCX: 0000000000000000
>    [ 1411.304492] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>    [ 1411.304494] RBP: ffff9cf5c2f50000 R08: ffff9cf5c3eb25d8 R09: 00000000fffffffe
>    [ 1411.304496] R10: 0000000000000001 R11: 00000000ef974e19 R12: ffff9cf5c39ae0e0
>    [ 1411.304498] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9cf5c39ae0e0
>    [ 1411.304501] FS:  00007f00ae754780(0000) GS:ffff9cf5fba00000(0000) knlGS:0000000000000000
>    [ 1411.304504] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    [ 1411.304506] CR2: 000000003e34343c CR3: 0000000103a98005 CR4: 0000000000370ee0
>    [ 1411.304508] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>    [ 1411.304510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>    [ 1411.304512] Call Trace:
>    [ 1411.304517]  stack_map_get_build_id_offset+0x17c/0x260
>    [ 1411.304528]  __bpf_get_stack+0x18f/0x230
>    [ 1411.304541]  bpf_get_stack_raw_tp+0x5a/0x70
>    [ 1411.305752] RAX: 0000000000000000 RBX: 5541f689495641d7 RCX: 0000000000000000
>    [ 1411.305756] RDX: 0000000000000001 RSI: ffffffff9776b144 RDI: ffffffff977e1b0e
>    [ 1411.305758] RBP: ffff9cf5c02b2f40 R08: ffff9cf5ca7606c0 R09: ffffcbd43ee02c04
>    [ 1411.306978]  bpf_prog_32007c34f7726d29_bpf_prog1+0xaf/0xd9c
>    [ 1411.307861] R10: 0000000000000001 R11: 0000000000000044 R12: ffff9cf5c2ef60e0
>    [ 1411.307865] R13: 0000000000000005 R14: 0000000000000000 R15: ffff9cf5c2ef6108
>    [ 1411.309074]  bpf_trace_run2+0x8f/0x1a0
>    [ 1411.309891] FS:  00007ff485141700(0000) GS:ffff9cf5fae00000(0000) knlGS:0000000000000000
>    [ 1411.309896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    [ 1411.311221]  syscall_trace_enter.isra.20+0x161/0x1f0
>    [ 1411.311600] CR2: 00007ff48514d90e CR3: 0000000107114001 CR4: 0000000000370ef0
>    [ 1411.312291]  do_syscall_64+0x15/0x80
>    [ 1411.312941] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>    [ 1411.313803]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>    [ 1411.314223] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>    [ 1411.315082] RIP: 0033:0x7f00ad80a0e0
>    [ 1411.315626] Call Trace:
>    [ 1411.315632]  stack_map_get_build_id_offset+0x17c/0x260
> 
> To reproduce, first build `test_progs` binary:
>    make -C tools/testing/selftests/bpf -j60
> and then run the binary at tools/testing/selftests/bpf directory:
>    ./test_progs -t get_stack_raw_tp
> 
> The warning is due to commit 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
> which added mmap_assert_locked() in find_vma() function. The mmap_assert_locked() function
> asserts that mm->mmap_lock needs to be held. But this is not the case for
> bpf_get_stack() or bpf_get_stackid() helper (kernel/bpf/stackmap.c), which
> uses mmap_read_trylock_non_owner() instead. Since mm->mmap_lock is not held
> in bpf_get_stack[id]() use case, the above warning is emitted during test run.
> 
> This patch added function find_vma_no_check() which does not have mmap_assert_locked() call and
> bpf_get_stack[id]() helpers call find_vma_no_check() instead. This resolved the above warning.
> 
> I didn't use __find_vma() name because it has been used in drivers/gpu/drm/i915/i915_gpu_error.c:
>      static struct i915_vma_coredump *
>      __find_vma(struct i915_vma_coredump *vma, const char *name) { ... }
> 
> Cc: Luigi Rizzo <lrizzo@google.com>
> Fixes: 5b78ed24e8ec("mm/pagemap: add mmap_assert_locked() annotations to find_vma*()")
> Signed-off-by: Yonghong Song <yhs@fb.com>

Luigi / Liam / Andrew, if the below looks reasonable to you, any objections to route the
fix with your ACKs via bpf tree to Linus (or strong preference via -mm fixes)?

Thanks,
Daniel

> ---
>   include/linux/mm.h    | 2 ++
>   kernel/bpf/stackmap.c | 2 +-
>   mm/mmap.c             | 9 +++++++--
>   3 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 50e2c2914ac2..ba7a11d900f5 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2669,6 +2669,8 @@ extern int expand_upwards(struct vm_area_struct *vma, unsigned long address);
>   #endif
>   
>   /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
> +extern struct vm_area_struct * find_vma_no_check(struct mm_struct * mm,
> +						 unsigned long addr);
>   extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long addr);
>   extern struct vm_area_struct * find_vma_prev(struct mm_struct * mm, unsigned long addr,
>   					     struct vm_area_struct **pprev);
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index e8eefdf8cf3e..838a2c141497 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -190,7 +190,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
>   	}
>   
>   	for (i = 0; i < trace_nr; i++) {
> -		vma = find_vma(current->mm, ips[i]);
> +		vma = find_vma_no_check(current->mm, ips[i]);
>   		if (!vma || build_id_parse(vma, id_offs[i].build_id, NULL)) {
>   			/* per entry fall back to ips */
>   			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 88dcc5c25225..8d5e340114f8 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2270,12 +2270,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>   EXPORT_SYMBOL(get_unmapped_area);
>   
>   /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
> -struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> +struct vm_area_struct *find_vma_no_check(struct mm_struct *mm, unsigned long addr)
>   {
>   	struct rb_node *rb_node;
>   	struct vm_area_struct *vma;
>   
> -	mmap_assert_locked(mm);
>   	/* Check the cache first. */
>   	vma = vmacache_find(mm, addr);
>   	if (likely(vma))
> @@ -2302,6 +2301,12 @@ struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
>   	return vma;
>   }
>   
> +struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
> +{
> +	mmap_assert_locked(mm);
> +	return find_vma_no_check(mm, addr);
> +}
> +
>   EXPORT_SYMBOL(find_vma);
>   
>   /*
> 



  reply	other threads:[~2021-09-08 12:20 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-08  4:44 Yonghong Song
2021-09-08 12:20 ` Daniel Borkmann [this message]
2021-09-08 13:53   ` Jason Gunthorpe
2021-09-08 14:15     ` Peter Zijlstra
2021-09-08 14:43       ` Luigi Rizzo
2021-09-08 15:12         ` Liam Howlett
2021-09-08 16:09           ` Yonghong Song
2021-09-08 17:09             ` Luigi Rizzo
2021-09-08 17:21               ` Alexei Starovoitov
2021-09-08 17:52                 ` Andrew Morton
2021-09-08 18:02                   ` Alexei Starovoitov
2021-09-08 18:15                     ` Andrew Morton
2021-09-08 18:20                       ` Alexei Starovoitov
2021-09-08 18:30                         ` Liam Howlett
2021-09-08 18:45                           ` Yonghong Song
2021-09-08 18:49                           ` Jason Gunthorpe
2021-09-08 19:11                             ` Yonghong Song
2021-09-08 23:33                               ` Jason Gunthorpe
2021-09-09  5:50                                 ` Yonghong Song
2021-09-09  8:05                           ` Peter Zijlstra
2021-09-08 18:43                     ` Liam Howlett
2021-09-08 19:42                       ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ebcddf07-f329-05fa-8fdc-b2b9d6b0127b@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=lrizzo@google.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox