linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
@ 2023-09-04 18:08 Joel Fernandes (Google)
  2023-09-04 18:08 ` [PATCH v3 2/2] rcu: Dump vmalloc memory info safely Joel Fernandes (Google)
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Joel Fernandes (Google) @ 2023-09-04 18:08 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Lorenzo Stoakes, Paul E. McKenney, Vlastimil Babka
  Cc: Joel Fernandes (Google), Zhen Lei, rcu, Zqiang, stable, linux-mm

It is unsafe to dump vmalloc area information when trying to do so from
some contexts. Add a safer trylock version of the same function to do a
best-effort VMA finding and use it from vmalloc_dump_obj().

[applied test robot feedback on unused function fix.]
[applied Uladzislau feedback on locking.]

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..2c6a0e2ff404 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;
 
-	vm = find_vm_area(objp);
-	if (!vm)
+	if (!spin_trylock(&vmap_area_lock))
+		return false;
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif
-- 
2.42.0.283.g2d96d420d3-goog



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 2/2] rcu: Dump vmalloc memory info safely
  2023-09-04 18:08 [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Joel Fernandes (Google)
@ 2023-09-04 18:08 ` Joel Fernandes (Google)
  2023-09-05  7:00   ` Lorenzo Stoakes
  2023-09-05  7:09 ` [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Lorenzo Stoakes
  2023-09-07  6:53 ` Vlastimil Babka
  2 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes (Google) @ 2023-09-04 18:08 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Paul E. McKenney, Vlastimil Babka
  Cc: Zqiang, Zhen Lei, rcu, Matthew Wilcox, stable, Joel Fernandes, linux-mm

From: Zqiang <qiang.zhang1211@gmail.com>

Currently, for double invoke call_rcu(), will dump rcu_head objects
memory info, if the objects is not allocated from the slab allocator,
the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
need to be held, since the call_rcu() can be invoked in interrupt context,
therefore, there is a possibility of spinlock deadlock scenarios.

And in Preempt-RT kernel, the rcutorture test also trigger the following
lockdep warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
3 locks held by swapper/0/1:
 #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
 #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
 #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
irq event stamp: 565512
hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
Preemption disabled at:
[<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x68/0xb0
 dump_stack+0x14/0x20
 __might_resched+0x1aa/0x280
 ? __pfx_rcu_torture_err_cb+0x10/0x10
 rt_spin_lock+0x53/0x130
 ? find_vmap_area+0x1f/0x70
 find_vmap_area+0x1f/0x70
 vmalloc_dump_obj+0x20/0x60
 mem_dump_obj+0x22/0x90
 __call_rcu_common+0x5bf/0x940
 ? debug_smp_processor_id+0x1b/0x30
 call_rcu_hurry+0x14/0x20
 rcu_torture_init+0x1f82/0x2370
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_leak_cb+0x10/0x10
 ? __pfx_rcu_torture_init+0x10/0x10
 do_one_initcall+0x6c/0x300
 ? debug_smp_processor_id+0x1b/0x30
 kernel_init_freeable+0x2b9/0x540
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x1f/0x150
 ret_from_fork+0x40/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

The previous patch fixes this by using the deadlock-safe best-effort
version of find_vm_area. However, in case of failure print the fact that
the pointer was a vmalloc pointer so that we print at least something.

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/util.c b/mm/util.c
index dd12b9531ac4..406634f26918 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
 	if (vmalloc_dump_obj(object))
 		return;
 
-	if (virt_addr_valid(object))
+	if (is_vmalloc_addr(object))
+		type = "vmalloc memory";
+	else if (virt_addr_valid(object))
 		type = "non-slab/vmalloc memory";
 	else if (object == NULL)
 		type = "NULL pointer";
-- 
2.42.0.283.g2d96d420d3-goog



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/2] rcu: Dump vmalloc memory info safely
  2023-09-04 18:08 ` [PATCH v3 2/2] rcu: Dump vmalloc memory info safely Joel Fernandes (Google)
@ 2023-09-05  7:00   ` Lorenzo Stoakes
  2023-09-05 11:48     ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-09-05  7:00 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Vlastimil Babka,
	Zqiang, Zhen Lei, rcu, Matthew Wilcox, stable, linux-mm

On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> From: Zqiang <qiang.zhang1211@gmail.com>
>
> Currently, for double invoke call_rcu(), will dump rcu_head objects
> memory info, if the objects is not allocated from the slab allocator,
> the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> need to be held, since the call_rcu() can be invoked in interrupt context,
> therefore, there is a possibility of spinlock deadlock scenarios.
>
> And in Preempt-RT kernel, the rcutorture test also trigger the following
> lockdep warning:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> RCU nest depth: 1, expected: 1
> 3 locks held by swapper/0/1:
>  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> irq event stamp: 565512
> hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> Preemption disabled at:
> [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x68/0xb0
>  dump_stack+0x14/0x20
>  __might_resched+0x1aa/0x280
>  ? __pfx_rcu_torture_err_cb+0x10/0x10
>  rt_spin_lock+0x53/0x130
>  ? find_vmap_area+0x1f/0x70
>  find_vmap_area+0x1f/0x70
>  vmalloc_dump_obj+0x20/0x60
>  mem_dump_obj+0x22/0x90
>  __call_rcu_common+0x5bf/0x940
>  ? debug_smp_processor_id+0x1b/0x30
>  call_rcu_hurry+0x14/0x20
>  rcu_torture_init+0x1f82/0x2370
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>  ? __pfx_rcu_torture_init+0x10/0x10
>  do_one_initcall+0x6c/0x300
>  ? debug_smp_processor_id+0x1b/0x30
>  kernel_init_freeable+0x2b9/0x540
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x1f/0x150
>  ret_from_fork+0x40/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
>
> The previous patch fixes this by using the deadlock-safe best-effort
> version of find_vm_area. However, in case of failure print the fact that
> the pointer was a vmalloc pointer so that we print at least something.
>
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> Cc: stable@vger.kernel.org
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/util.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/util.c b/mm/util.c
> index dd12b9531ac4..406634f26918 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
>  	if (vmalloc_dump_obj(object))
>  		return;
>
> -	if (virt_addr_valid(object))
> +	if (is_vmalloc_addr(object))
> +		type = "vmalloc memory";
> +	else if (virt_addr_valid(object))
>  		type = "non-slab/vmalloc memory";

I think you should update this to say non-slab/non-vmalloc memory (as much
as that description sucks!) as this phrasing in the past meant to say
'non-slab or vmalloc memory' (already confusing phrasing) so better to be
clear.

>  	else if (object == NULL)
>  		type = "NULL pointer";
> --
> 2.42.0.283.g2d96d420d3-goog
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-04 18:08 [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Joel Fernandes (Google)
  2023-09-04 18:08 ` [PATCH v3 2/2] rcu: Dump vmalloc memory info safely Joel Fernandes (Google)
@ 2023-09-05  7:09 ` Lorenzo Stoakes
  2023-09-05 11:47   ` Joel Fernandes
  2023-09-07  6:53 ` Vlastimil Babka
  2 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-09-05  7:09 UTC (permalink / raw)
  To: Joel Fernandes (Google)
  Cc: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> It is unsafe to dump vmalloc area information when trying to do so from
> some contexts. Add a safer trylock version of the same function to do a
> best-effort VMA finding and use it from vmalloc_dump_obj().

It'd be nice to have more details as to precisely which contexts and what this
resolves.

>
> [applied test robot feedback on unused function fix.]
> [applied Uladzislau feedback on locking.]
>
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> Cc: stable@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/vmalloc.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..2c6a0e2ff404 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  #ifdef CONFIG_PRINTK
>  bool vmalloc_dump_obj(void *object)
>  {
> -	struct vm_struct *vm;
>  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> +	const void *caller;
> +	struct vm_struct *vm;
> +	struct vmap_area *va;
> +	unsigned long addr;
> +	unsigned int nr_pages;
>
> -	vm = find_vm_area(objp);
> -	if (!vm)
> +	if (!spin_trylock(&vmap_area_lock))
> +		return false;

It'd be good to have a comment here explaining why we must trylock here. I am
also concerned that in the past this function would return false only if the
address was not a vmalloc one, but now it might just return false due to lock
contention and the user has no idea which it is?

I'd want to at least output "vmalloc region cannot lookup lock contention"
vs. the below cannot find case.

Under heavy lock contention aren't you potentially breaking the ability to
introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
contexts under which acquiring this spinlock is not appropriate?

> +	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> +	if (!va) {
> +		spin_unlock(&vmap_area_lock);
>  		return false;
> +	}
> +
> +	vm = va->vm;
> +	if (!vm) {
> +		spin_unlock(&vmap_area_lock);
> +		return false;
> +	}
> +	addr = (unsigned long)vm->addr;
> +	caller = vm->caller;
> +	nr_pages = vm->nr_pages;
> +	spin_unlock(&vmap_area_lock);
>  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> +		nr_pages, addr, caller);
>  	return true;
>  }
>  #endif
> --
> 2.42.0.283.g2d96d420d3-goog
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-05  7:09 ` [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Lorenzo Stoakes
@ 2023-09-05 11:47   ` Joel Fernandes
  2023-09-06 19:23     ` Lorenzo Stoakes
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2023-09-05 11:47 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > It is unsafe to dump vmalloc area information when trying to do so from
> > some contexts. Add a safer trylock version of the same function to do a
> > best-effort VMA finding and use it from vmalloc_dump_obj().
> 
> It'd be nice to have more details as to precisely which contexts and what this
> resolves.

True. I was hoping the 'trylock' mention would be sufficient (example hardirq
context interrupting a lock-held region) but you're right.

> > [applied test robot feedback on unused function fix.]
> > [applied Uladzislau feedback on locking.]
> >
> > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Cc: rcu@vger.kernel.org
> > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 93cf99aba335..2c6a0e2ff404 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> >  #ifdef CONFIG_PRINTK
> >  bool vmalloc_dump_obj(void *object)
> >  {
> > -	struct vm_struct *vm;
> >  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > +	const void *caller;
> > +	struct vm_struct *vm;
> > +	struct vmap_area *va;
> > +	unsigned long addr;
> > +	unsigned int nr_pages;
> >
> > -	vm = find_vm_area(objp);
> > -	if (!vm)
> > +	if (!spin_trylock(&vmap_area_lock))
> > +		return false;
> 
> It'd be good to have a comment here explaining why we must trylock here. I am
> also concerned that in the past this function would return false only if the
> address was not a vmalloc one, but now it might just return false due to lock
> contention and the user has no idea which it is?
> 
> I'd want to at least output "vmalloc region cannot lookup lock contention"
> vs. the below cannot find case.

In the patch 2/2 we do print if the address looks like a vmalloc address even
if the vmalloc look up fails.

Also the reporter's usecase is not a common one. We only attempt to dump
information if there was a debug objects failure (example if somebody did a
double call_rcu). In such a situation, the patch will prevent a deadlock and
still print something about the address.

> Under heavy lock contention aren't you potentially breaking the ability to
> introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> contexts under which acquiring this spinlock is not appropriate?

Yes this is a good point, but there's another case as well: PREEMPT_RT can
sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
call_rcu() as it may be called in contexts that cannot sleep. So we handle
that also using trylock.

Thanks for the review!

 - Joel


> 
> > +	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> > +	if (!va) {
> > +		spin_unlock(&vmap_area_lock);
> >  		return false;
> > +	}
> > +
> > +	vm = va->vm;
> > +	if (!vm) {
> > +		spin_unlock(&vmap_area_lock);
> > +		return false;
> > +	}
> > +	addr = (unsigned long)vm->addr;
> > +	caller = vm->caller;
> > +	nr_pages = vm->nr_pages;
> > +	spin_unlock(&vmap_area_lock);
> >  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> > +		nr_pages, addr, caller);
> >  	return true;
> >  }
> >  #endif
> > --
> > 2.42.0.283.g2d96d420d3-goog
> >


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/2] rcu: Dump vmalloc memory info safely
  2023-09-05  7:00   ` Lorenzo Stoakes
@ 2023-09-05 11:48     ` Joel Fernandes
  2023-09-06 19:18       ` Lorenzo Stoakes
  0 siblings, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2023-09-05 11:48 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Vlastimil Babka,
	Zqiang, Zhen Lei, rcu, Matthew Wilcox, stable, linux-mm

On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
> On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> > From: Zqiang <qiang.zhang1211@gmail.com>
> >
> > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > memory info, if the objects is not allocated from the slab allocator,
> > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > need to be held, since the call_rcu() can be invoked in interrupt context,
> > therefore, there is a possibility of spinlock deadlock scenarios.
> >
> > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > lockdep warning:
> >
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 1, expected: 0
> > RCU nest depth: 1, expected: 1
> > 3 locks held by swapper/0/1:
> >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> > irq event stamp: 565512
> > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> > Preemption disabled at:
> > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x68/0xb0
> >  dump_stack+0x14/0x20
> >  __might_resched+0x1aa/0x280
> >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> >  rt_spin_lock+0x53/0x130
> >  ? find_vmap_area+0x1f/0x70
> >  find_vmap_area+0x1f/0x70
> >  vmalloc_dump_obj+0x20/0x60
> >  mem_dump_obj+0x22/0x90
> >  __call_rcu_common+0x5bf/0x940
> >  ? debug_smp_processor_id+0x1b/0x30
> >  call_rcu_hurry+0x14/0x20
> >  rcu_torture_init+0x1f82/0x2370
> >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >  ? __pfx_rcu_torture_init+0x10/0x10
> >  do_one_initcall+0x6c/0x300
> >  ? debug_smp_processor_id+0x1b/0x30
> >  kernel_init_freeable+0x2b9/0x540
> >  ? __pfx_kernel_init+0x10/0x10
> >  kernel_init+0x1f/0x150
> >  ret_from_fork+0x40/0x50
> >  ? __pfx_kernel_init+0x10/0x10
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> >
> > The previous patch fixes this by using the deadlock-safe best-effort
> > version of find_vm_area. However, in case of failure print the fact that
> > the pointer was a vmalloc pointer so that we print at least something.
> >
> > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > Cc: Paul E. McKenney <paulmck@kernel.org>
> > Cc: rcu@vger.kernel.org
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  mm/util.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/util.c b/mm/util.c
> > index dd12b9531ac4..406634f26918 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
> >  	if (vmalloc_dump_obj(object))
> >  		return;
> >
> > -	if (virt_addr_valid(object))
> > +	if (is_vmalloc_addr(object))
> > +		type = "vmalloc memory";
> > +	else if (virt_addr_valid(object))
> >  		type = "non-slab/vmalloc memory";
> 
> I think you should update this to say non-slab/non-vmalloc memory (as much
> as that description sucks!) as this phrasing in the past meant to say
> 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
> clear.

True, though the issue you mentioned it is in existing code, a respin of this
patch could update it to say non-vmalloc. Good point, thanks for reviewing!

 - Joel



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/2] rcu: Dump vmalloc memory info safely
  2023-09-05 11:48     ` Joel Fernandes
@ 2023-09-06 19:18       ` Lorenzo Stoakes
  2023-09-07  7:10         ` Vlastimil Babka
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-09-06 19:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Vlastimil Babka,
	Zqiang, Zhen Lei, rcu, Matthew Wilcox, stable, linux-mm

On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> > > From: Zqiang <qiang.zhang1211@gmail.com>
> > >
> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> > > memory info, if the objects is not allocated from the slab allocator,
> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> > > need to be held, since the call_rcu() can be invoked in interrupt context,
> > > therefore, there is a possibility of spinlock deadlock scenarios.
> > >
> > > And in Preempt-RT kernel, the rcutorture test also trigger the following
> > > lockdep warning:
> > >
> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > > preempt_count: 1, expected: 0
> > > RCU nest depth: 1, expected: 1
> > > 3 locks held by swapper/0/1:
> > >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> > >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> > >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> > > irq event stamp: 565512
> > > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> > > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> > > Preemption disabled at:
> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0x68/0xb0
> > >  dump_stack+0x14/0x20
> > >  __might_resched+0x1aa/0x280
> > >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> > >  rt_spin_lock+0x53/0x130
> > >  ? find_vmap_area+0x1f/0x70
> > >  find_vmap_area+0x1f/0x70
> > >  vmalloc_dump_obj+0x20/0x60
> > >  mem_dump_obj+0x22/0x90
> > >  __call_rcu_common+0x5bf/0x940
> > >  ? debug_smp_processor_id+0x1b/0x30
> > >  call_rcu_hurry+0x14/0x20
> > >  rcu_torture_init+0x1f82/0x2370
> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> > >  ? __pfx_rcu_torture_init+0x10/0x10
> > >  do_one_initcall+0x6c/0x300
> > >  ? debug_smp_processor_id+0x1b/0x30
> > >  kernel_init_freeable+0x2b9/0x540
> > >  ? __pfx_kernel_init+0x10/0x10
> > >  kernel_init+0x1f/0x150
> > >  ret_from_fork+0x40/0x50
> > >  ? __pfx_kernel_init+0x10/0x10
> > >  ret_from_fork_asm+0x1b/0x30
> > >  </TASK>
> > >
> > > The previous patch fixes this by using the deadlock-safe best-effort
> > > version of find_vm_area. However, in case of failure print the fact that
> > > the pointer was a vmalloc pointer so that we print at least something.
> > >
> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: rcu@vger.kernel.org
> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  mm/util.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/util.c b/mm/util.c
> > > index dd12b9531ac4..406634f26918 100644
> > > --- a/mm/util.c
> > > +++ b/mm/util.c
> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
> > >     if (vmalloc_dump_obj(object))
> > >             return;
> > >
> > > -   if (virt_addr_valid(object))
> > > +   if (is_vmalloc_addr(object))
> > > +           type = "vmalloc memory";
> > > +   else if (virt_addr_valid(object))
> > >             type = "non-slab/vmalloc memory";
> >
> > I think you should update this to say non-slab/non-vmalloc memory (as much
> > as that description sucks!) as this phrasing in the past meant to say
> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
> > clear.
>
> True, though the issue you mentioned it is in existing code, a respin of this
> patch could update it to say non-vmalloc. Good point, thanks for reviewing!

No it's not, you're changing the meaning, because you changed the code
that determines the output...

This has been merged now despite my outstanding comments (!) so I
guess I'll have to send a follow up patch to address this.

>
>  - Joel
>



--
Lorenzo Stoakes
https://ljs.io


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-05 11:47   ` Joel Fernandes
@ 2023-09-06 19:23     ` Lorenzo Stoakes
  2023-09-06 19:46       ` Lorenzo Stoakes
  2023-09-06 22:46       ` Joel Fernandes
  0 siblings, 2 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-09-06 19:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > It is unsafe to dump vmalloc area information when trying to do so from
> > > some contexts. Add a safer trylock version of the same function to do a
> > > best-effort VMA finding and use it from vmalloc_dump_obj().
> >
> > It'd be nice to have more details as to precisely which contexts and what this
> > resolves.
>
> True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> context interrupting a lock-held region) but you're right.
>
> > > [applied test robot feedback on unused function fix.]
> > > [applied Uladzislau feedback on locking.]
> > >
> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > Cc: rcu@vger.kernel.org
> > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > ---
> > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 93cf99aba335..2c6a0e2ff404 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > >  #ifdef CONFIG_PRINTK
> > >  bool vmalloc_dump_obj(void *object)
> > >  {
> > > -   struct vm_struct *vm;
> > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > +   const void *caller;
> > > +   struct vm_struct *vm;
> > > +   struct vmap_area *va;
> > > +   unsigned long addr;
> > > +   unsigned int nr_pages;
> > >
> > > -   vm = find_vm_area(objp);
> > > -   if (!vm)
> > > +   if (!spin_trylock(&vmap_area_lock))
> > > +           return false;
> >
> > It'd be good to have a comment here explaining why we must trylock here. I am
> > also concerned that in the past this function would return false only if the
> > address was not a vmalloc one, but now it might just return false due to lock
> > contention and the user has no idea which it is?
> >
> > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > vs. the below cannot find case.
>
> In the patch 2/2 we do print if the address looks like a vmalloc address even
> if the vmalloc look up fails.

No, you output exactly what was output before, only changing what it
means and in no way differentiating between couldn't find vmalloc
area/couldn't get lock.

>
> Also the reporter's usecase is not a common one. We only attempt to dump
> information if there was a debug objects failure (example if somebody did a
> double call_rcu). In such a situation, the patch will prevent a deadlock and
> still print something about the address.

Right, but the function still purports to do X but does Y.

>
> > Under heavy lock contention aren't you potentially breaking the ability to
> > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > contexts under which acquiring this spinlock is not appropriate?
>
> Yes this is a good point, but there's another case as well: PREEMPT_RT can
> sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> call_rcu() as it may be called in contexts that cannot sleep. So we handle
> that also using trylock.

Right so somebody now has to find this email to realise that. I hate
implicit knowledge like this, it needs a comment. It also furthers the
point that it'd be useful to differentiate between the two.

>
> Thanks for the review!

This got merged despite my outstanding comments so I guess I'll have
to follow up with a patch.

>
>  - Joel
>
>
> >
> > > +   va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> > > +   if (!va) {
> > > +           spin_unlock(&vmap_area_lock);
> > >             return false;
> > > +   }
> > > +
> > > +   vm = va->vm;
> > > +   if (!vm) {
> > > +           spin_unlock(&vmap_area_lock);
> > > +           return false;
> > > +   }
> > > +   addr = (unsigned long)vm->addr;
> > > +   caller = vm->caller;
> > > +   nr_pages = vm->nr_pages;
> > > +   spin_unlock(&vmap_area_lock);
> > >     pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > > -           vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> > > +           nr_pages, addr, caller);
> > >     return true;
> > >  }
> > >  #endif
> > > --
> > > 2.42.0.283.g2d96d420d3-goog
> > >

This reads like another 'nice review and I agree but I won't change
anything!'...


-- 
Lorenzo Stoakes
https://ljs.io


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-06 19:23     ` Lorenzo Stoakes
@ 2023-09-06 19:46       ` Lorenzo Stoakes
  2023-09-06 22:46       ` Joel Fernandes
  1 sibling, 0 replies; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-09-06 19:46 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > some contexts. Add a safer trylock version of the same function to do a
> > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > >
> > > It'd be nice to have more details as to precisely which contexts and what this
> > > resolves.
> >
> > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > context interrupting a lock-held region) but you're right.
> >
> > > > [applied test robot feedback on unused function fix.]
> > > > [applied Uladzislau feedback on locking.]
> > > >
> > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: rcu@vger.kernel.org
> > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > >  #ifdef CONFIG_PRINTK
> > > >  bool vmalloc_dump_obj(void *object)
> > > >  {
> > > > -   struct vm_struct *vm;
> > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > +   const void *caller;
> > > > +   struct vm_struct *vm;
> > > > +   struct vmap_area *va;
> > > > +   unsigned long addr;
> > > > +   unsigned int nr_pages;
> > > >
> > > > -   vm = find_vm_area(objp);
> > > > -   if (!vm)
> > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > +           return false;
> > >
> > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > also concerned that in the past this function would return false only if the
> > > address was not a vmalloc one, but now it might just return false due to lock
> > > contention and the user has no idea which it is?
> > >
> > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > vs. the below cannot find case.
> >
> > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > if the vmalloc look up fails.
>
> No, you output exactly what was output before, only changing what it
> means and in no way differentiating between couldn't find vmalloc
> area/couldn't get lock.
>
> >
> > Also the reporter's usecase is not a common one. We only attempt to dump
> > information if there was a debug objects failure (example if somebody did a
> > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > still print something about the address.
>
> Right, but the function still purports to do X but does Y.
>
> >
> > > Under heavy lock contention aren't you potentially breaking the ability to
> > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > contexts under which acquiring this spinlock is not appropriate?
> >
> > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > that also using trylock.
>
> Right so somebody now has to find this email to realise that. I hate
> implicit knowledge like this, it needs a comment. It also furthers the
> point that it'd be useful to differentiate between the two.
>
> >
> > Thanks for the review!
>
> This got merged despite my outstanding comments so I guess I'll have
> to follow up with a patch.
>
> >
> >  - Joel
> >
> >
> > >
> > > > +   va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> > > > +   if (!va) {
> > > > +           spin_unlock(&vmap_area_lock);
> > > >             return false;
> > > > +   }
> > > > +
> > > > +   vm = va->vm;
> > > > +   if (!vm) {
> > > > +           spin_unlock(&vmap_area_lock);
> > > > +           return false;
> > > > +   }
> > > > +   addr = (unsigned long)vm->addr;
> > > > +   caller = vm->caller;
> > > > +   nr_pages = vm->nr_pages;
> > > > +   spin_unlock(&vmap_area_lock);
> > > >     pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> > > > -           vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> > > > +           nr_pages, addr, caller);
> > > >     return true;
> > > >  }
> > > >  #endif
> > > > --
> > > > 2.42.0.283.g2d96d420d3-goog
> > > >
>
> This reads like another 'nice review and I agree but I won't change
> anything!'...
>

Sorry I actually wrote this unkind comment in a moment of annoyance then
meant to delete it but of course forgot to :>) Disregard this bit.

Happy for pushback/disagreement, just feel like a few little touchups would
have helped improve documentation/clarity of what this series does.

Obviously stability matters more so perhaps touch-ups best as a follow up
series... though would be nice to have a comment to that effect.

Thanks!


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-06 19:23     ` Lorenzo Stoakes
  2023-09-06 19:46       ` Lorenzo Stoakes
@ 2023-09-06 22:46       ` Joel Fernandes
  2023-09-07  7:11         ` Lorenzo Stoakes
  1 sibling, 1 reply; 17+ messages in thread
From: Joel Fernandes @ 2023-09-06 22:46 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > some contexts. Add a safer trylock version of the same function to do a
> > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > >
> > > It'd be nice to have more details as to precisely which contexts and what this
> > > resolves.
> >
> > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > context interrupting a lock-held region) but you're right.
> >
> > > > [applied test robot feedback on unused function fix.]
> > > > [applied Uladzislau feedback on locking.]
> > > >
> > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > Cc: rcu@vger.kernel.org
> > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > ---
> > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > >  #ifdef CONFIG_PRINTK
> > > >  bool vmalloc_dump_obj(void *object)
> > > >  {
> > > > -   struct vm_struct *vm;
> > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > +   const void *caller;
> > > > +   struct vm_struct *vm;
> > > > +   struct vmap_area *va;
> > > > +   unsigned long addr;
> > > > +   unsigned int nr_pages;
> > > >
> > > > -   vm = find_vm_area(objp);
> > > > -   if (!vm)
> > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > +           return false;
> > >
> > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > also concerned that in the past this function would return false only if the
> > > address was not a vmalloc one, but now it might just return false due to lock
> > > contention and the user has no idea which it is?
> > >
> > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > vs. the below cannot find case.
> >
> > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > if the vmalloc look up fails.
> 
> No, you output exactly what was output before, only changing what it
> means and in no way differentiating between couldn't find vmalloc
> area/couldn't get lock.

2/2 does this:
                         -     if (virt_addr_valid(object))
                         +     if (is_vmalloc_addr(object))
                         +             type = "vmalloc memory";
                         +     else if (virt_addr_valid(object))
                                       type = "non-slab/vmalloc memory";

This code is executed only if vmalloc_dump_obj() returns false. The
is_vmalloc_addr() was added by 2/2 which is newly added right?

You are right we are not differentiating between trylock failure and failure to
find the vmalloc area. I was just saying, even though we don't differentiate,
we do print "vmalloc memory" right? That wasn't being printed before.

> > Also the reporter's usecase is not a common one. We only attempt to dump
> > information if there was a debug objects failure (example if somebody did a
> > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > still print something about the address.
> 
> Right, but the function still purports to do X but does Y.
> 
> >
> > > Under heavy lock contention aren't you potentially breaking the ability to
> > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > contexts under which acquiring this spinlock is not appropriate?
> >
> > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > that also using trylock.
> 
> Right so somebody now has to find this email to realise that. I hate
> implicit knowledge like this, it needs a comment. It also furthers the
> point that it'd be useful to differentiate between the two.

This is a valid point, and I acknowledged it in last email. A code comment could
indeed be useful.

So I guess from an agreement standpoint, I agree:

1/2 could use an additional comment explaining why we need trylock (sighting
the RT sleeping lock issue).

2/2 could update the existing code to convert "non-slab/vmalloc" to
"non-slab/non-vmalloc". Note: that's an *existing* issue.

The issue in 2/2 is not a new one so that can certainly be a separate patch.
And while at it, we could update the comment in that patch as well.

But the whole differentiating between trylock vs vmalloc area lookup failure
is not that useful -- just my opinion fwiw! I honestly feel differentiating
between trylock vs vmalloc area lookup failure complicates the code because
it will require passing this information down from vmalloc_dump_obj() to the
caller AFAICS and I am not sure if the person reading the debug will really
care much. But I am OK with whatever the -mm community wants and I am happy
to send out a new patch on top with the above that I agree on since Andrew
took these 2 (but for the stuff I don't agree, I would appreciate if you
could send a patch for review and I am happy to review it!).

As you mentioned, this series is a stability fix and we can put touch-ups on
top of it if needed, and there is also plenty of time till the next merge
window. Allow me a few days and I'll do the new patch on top (I'd say dont
bother to spend your time on it, I'll do it).

thanks,

 - Joel


> 
> 
> -- 
> Lorenzo Stoakes
> https://ljs.io


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-04 18:08 [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Joel Fernandes (Google)
  2023-09-04 18:08 ` [PATCH v3 2/2] rcu: Dump vmalloc memory info safely Joel Fernandes (Google)
  2023-09-05  7:09 ` [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Lorenzo Stoakes
@ 2023-09-07  6:53 ` Vlastimil Babka
  2023-09-08  0:47   ` Joel Fernandes
  2 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2023-09-07  6:53 UTC (permalink / raw)
  To: Joel Fernandes (Google),
	linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Lorenzo Stoakes, Paul E. McKenney
  Cc: Zhen Lei, rcu, Zqiang, stable, linux-mm

Hi,

On 9/4/23 20:08, Joel Fernandes (Google) wrote:
> It is unsafe to dump vmalloc area information when trying to do so from
> some contexts. Add a safer trylock version of the same function to do a
> best-effort VMA finding and use it from vmalloc_dump_obj().

I was a bit confused by the subject which suggests a new function is added,
but it seems open-coded in its only caller. I assume it's due to evolution
of the series. Something like:

mm/vmalloc: use trylock for vmap_area_lock in vmalloc_dump_obj()

?

I also notice it's trying hard to copy everything from "vm" to temporary
variables before unlocking, presumably to prevent use-after-free, so should
that be also mentioned in the changelog?

> [applied test robot feedback on unused function fix.]
> [applied Uladzislau feedback on locking.]
> 
> Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: rcu@vger.kernel.org
> Cc: Zqiang <qiang.zhang1211@gmail.com>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> Cc: stable@vger.kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
>  mm/vmalloc.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 93cf99aba335..2c6a0e2ff404 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
>  #ifdef CONFIG_PRINTK
>  bool vmalloc_dump_obj(void *object)
>  {
> -	struct vm_struct *vm;
>  	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> +	const void *caller;
> +	struct vm_struct *vm;
> +	struct vmap_area *va;
> +	unsigned long addr;
> +	unsigned int nr_pages;
>  
> -	vm = find_vm_area(objp);
> -	if (!vm)
> +	if (!spin_trylock(&vmap_area_lock))
> +		return false;
> +	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
> +	if (!va) {
> +		spin_unlock(&vmap_area_lock);
>  		return false;
> +	}
> +
> +	vm = va->vm;
> +	if (!vm) {
> +		spin_unlock(&vmap_area_lock);
> +		return false;
> +	}
> +	addr = (unsigned long)vm->addr;
> +	caller = vm->caller;
> +	nr_pages = vm->nr_pages;
> +	spin_unlock(&vmap_area_lock);
>  	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
> -		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
> +		nr_pages, addr, caller);
>  	return true;
>  }
>  #endif



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/2] rcu: Dump vmalloc memory info safely
  2023-09-06 19:18       ` Lorenzo Stoakes
@ 2023-09-07  7:10         ` Vlastimil Babka
  2023-09-08  0:26           ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Vlastimil Babka @ 2023-09-07  7:10 UTC (permalink / raw)
  To: Lorenzo Stoakes, Joel Fernandes
  Cc: linux-kernel, Andrew Morton, Paul E. McKenney, Zqiang, Zhen Lei,
	rcu, Matthew Wilcox, stable, linux-mm

On 9/6/23 21:18, Lorenzo Stoakes wrote:
> On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@joelfernandes.org> wrote:
>>
>> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
>> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
>> > > From: Zqiang <qiang.zhang1211@gmail.com>
>> > >
>> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
>> > > memory info, if the objects is not allocated from the slab allocator,
>> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
>> > > need to be held, since the call_rcu() can be invoked in interrupt context,
>> > > therefore, there is a possibility of spinlock deadlock scenarios.
>> > >
>> > > And in Preempt-RT kernel, the rcutorture test also trigger the following
>> > > lockdep warning:
>> > >
>> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
>> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
>> > > preempt_count: 1, expected: 0
>> > > RCU nest depth: 1, expected: 1
>> > > 3 locks held by swapper/0/1:
>> > >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
>> > >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
>> > >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
>> > > irq event stamp: 565512
>> > > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
>> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
>> > > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
>> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
>> > > Preemption disabled at:
>> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
>> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
>> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
>> > > Call Trace:
>> > >  <TASK>
>> > >  dump_stack_lvl+0x68/0xb0
>> > >  dump_stack+0x14/0x20
>> > >  __might_resched+0x1aa/0x280
>> > >  ? __pfx_rcu_torture_err_cb+0x10/0x10
>> > >  rt_spin_lock+0x53/0x130
>> > >  ? find_vmap_area+0x1f/0x70
>> > >  find_vmap_area+0x1f/0x70
>> > >  vmalloc_dump_obj+0x20/0x60
>> > >  mem_dump_obj+0x22/0x90
>> > >  __call_rcu_common+0x5bf/0x940
>> > >  ? debug_smp_processor_id+0x1b/0x30
>> > >  call_rcu_hurry+0x14/0x20
>> > >  rcu_torture_init+0x1f82/0x2370
>> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
>> > >  ? __pfx_rcu_torture_init+0x10/0x10
>> > >  do_one_initcall+0x6c/0x300
>> > >  ? debug_smp_processor_id+0x1b/0x30
>> > >  kernel_init_freeable+0x2b9/0x540
>> > >  ? __pfx_kernel_init+0x10/0x10
>> > >  kernel_init+0x1f/0x150
>> > >  ret_from_fork+0x40/0x50
>> > >  ? __pfx_kernel_init+0x10/0x10
>> > >  ret_from_fork_asm+0x1b/0x30
>> > >  </TASK>
>> > >
>> > > The previous patch fixes this by using the deadlock-safe best-effort
>> > > version of find_vm_area. However, in case of failure print the fact that
>> > > the pointer was a vmalloc pointer so that we print at least something.
>> > >
>> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
>> > > Cc: Paul E. McKenney <paulmck@kernel.org>
>> > > Cc: rcu@vger.kernel.org
>> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
>> > > Cc: stable@vger.kernel.org
>> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> > > ---
>> > >  mm/util.c | 4 +++-
>> > >  1 file changed, 3 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/mm/util.c b/mm/util.c
>> > > index dd12b9531ac4..406634f26918 100644
>> > > --- a/mm/util.c
>> > > +++ b/mm/util.c
>> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
>> > >     if (vmalloc_dump_obj(object))
>> > >             return;
>> > >
>> > > -   if (virt_addr_valid(object))
>> > > +   if (is_vmalloc_addr(object))
>> > > +           type = "vmalloc memory";
>> > > +   else if (virt_addr_valid(object))
>> > >             type = "non-slab/vmalloc memory";
>> >
>> > I think you should update this to say non-slab/non-vmalloc memory (as much
>> > as that description sucks!) as this phrasing in the past meant to say
>> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
>> > clear.
>>
>> True, though the issue you mentioned it is in existing code, a respin of this
>> patch could update it to say non-vmalloc. Good point, thanks for reviewing!
> 
> No it's not, you're changing the meaning, because you changed the code
> that determines the output...

I think it has always meant (but clearly it's not unambiguously worded) "not
slab && not vmalloc", that is before and after this patch. Only in case
patch 1 is applied and patch 2 not, can the output be wrong in that a
vmalloc pointer will (in case of trylock fail) be classified as "not slab &&
not vmalloc", but seems fine to me after patch 2.

I guess if we wanted, we could also rewrite it to be more like the kmem
check in the beginning of mem_dump_obj(), so there would be:

if (is_vmalloc_addr(...)) {
    vmalloc_dump_obj(...);
    return;
}

where vmalloc_dump_obj() itself would print at least "vmalloc memory" with
no further details in case of trylock failure.

that assumes is_vmalloc_addr() is guaranteed to be true for all addresses
that __find_vmap_area resolves, otherwise it could miss something compared
to current code. Is it guaranteed?

> This has been merged now despite my outstanding comments (!) so I
> guess I'll have to send a follow up patch to address this.
> 
>>
>>  - Joel
>>
> 
> 
> 
> --
> Lorenzo Stoakes
> https://ljs.io



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-06 22:46       ` Joel Fernandes
@ 2023-09-07  7:11         ` Lorenzo Stoakes
  2023-09-07  9:23           ` Uladzislau Rezki
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Stoakes @ 2023-09-07  7:11 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Wed, Sep 06, 2023 at 10:46:08PM +0000, Joel Fernandes wrote:
> On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> > On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> > >
> > > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > > some contexts. Add a safer trylock version of the same function to do a
> > > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > > >
> > > > It'd be nice to have more details as to precisely which contexts and what this
> > > > resolves.
> > >
> > > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > > context interrupting a lock-held region) but you're right.
> > >
> > > > > [applied test robot feedback on unused function fix.]
> > > > > [applied Uladzislau feedback on locking.]
> > > > >
> > > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > > Cc: rcu@vger.kernel.org
> > > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > ---
> > > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > > >  #ifdef CONFIG_PRINTK
> > > > >  bool vmalloc_dump_obj(void *object)
> > > > >  {
> > > > > -   struct vm_struct *vm;
> > > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > > +   const void *caller;
> > > > > +   struct vm_struct *vm;
> > > > > +   struct vmap_area *va;
> > > > > +   unsigned long addr;
> > > > > +   unsigned int nr_pages;
> > > > >
> > > > > -   vm = find_vm_area(objp);
> > > > > -   if (!vm)
> > > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > > +           return false;
> > > >
> > > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > > also concerned that in the past this function would return false only if the
> > > > address was not a vmalloc one, but now it might just return false due to lock
> > > > contention and the user has no idea which it is?
> > > >
> > > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > > vs. the below cannot find case.
> > >
> > > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > > if the vmalloc look up fails.
> >
> > No, you output exactly what was output before, only changing what it
> > means and in no way differentiating between couldn't find vmalloc
> > area/couldn't get lock.
>
> 2/2 does this:
>                          -     if (virt_addr_valid(object))
>                          +     if (is_vmalloc_addr(object))
>                          +             type = "vmalloc memory";
>                          +     else if (virt_addr_valid(object))
>                                        type = "non-slab/vmalloc memory";
>
> This code is executed only if vmalloc_dump_obj() returns false. The
> is_vmalloc_addr() was added by 2/2 which is newly added right?
>
> You are right we are not differentiating between trylock failure and failure to
> find the vmalloc area. I was just saying, even though we don't differentiate,
> we do print "vmalloc memory" right? That wasn't being printed before.
>
> > > Also the reporter's usecase is not a common one. We only attempt to dump
> > > information if there was a debug objects failure (example if somebody did a
> > > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > > still print something about the address.
> >
> > Right, but the function still purports to do X but does Y.
> >
> > >
> > > > Under heavy lock contention aren't you potentially breaking the ability to
> > > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > > contexts under which acquiring this spinlock is not appropriate?
> > >
> > > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > > that also using trylock.
> >
> > Right so somebody now has to find this email to realise that. I hate
> > implicit knowledge like this, it needs a comment. It also furthers the
> > point that it'd be useful to differentiate between the two.
>
> This is a valid point, and I acknowledged it in last email. A code comment could
> indeed be useful.

Thanks, yeah this may seem trivial, but I am quite sensitive about things
being added to the code base that are neither described in commit msg nor
in a comment or elsewhere and become 'implicit' in a sense.

So just a simple comment here would be helpful, and I'm glad we're in
agreement on that, will leave to you to do a follow up patch.

>
> So I guess from an agreement standpoint, I agree:
>
> 1/2 could use an additional comment explaining why we need trylock (sighting
> the RT sleeping lock issue).
>
> 2/2 could update the existing code to convert "non-slab/vmalloc" to
> "non-slab/non-vmalloc". Note: that's an *existing* issue.

Yeah sorry this whole thing was rather confusing, it did indeed (unclearly)
specify non-/non- in the past (on assumption dumping function would work),
addition of vmalloc check now makes that correct again, the phrasing is the
issue.

You can leave this as-is as yeah, you're right, this was a pre-existing issue.

virt_addr_valid() returns true for a slab addr, but kmem_valid_obj() is
checked above so already been ruled out, now you ruled out vmalloc.

Just a bit tricksy.

>
> The issue in 2/2 is not a new one so that can certainly be a separate patch.
> And while at it, we could update the comment in that patch as well.
>
> But the whole differentiating between trylock vs vmalloc area lookup failure
> is not that useful -- just my opinion fwiw! I honestly feel differentiating
> between trylock vs vmalloc area lookup failure complicates the code because
> it will require passing this information down from vmalloc_dump_obj() to the
> caller AFAICS and I am not sure if the person reading the debug will really
> care much. But I am OK with whatever the -mm community wants and I am happy
> to send out a new patch on top with the above that I agree on since Andrew
> took these 2 (but for the stuff I don't agree, I would appreciate if you
> could send a patch for review and I am happy to review it!).

Ah right, I think maybe I wasn't clear, all I meant to suggest is to output
log output rather than feed anything back to caller, something like:-

if (!spin_trylock(&vmap_area_lock)) {
        pr_cont(" [couldn't acquire vmap lock]\n");
	...
}

My concern is that in the past this function would only return false if it
couldn't find the address in a VA, now it returns false also if you happen
to call it when the spinlock is locked, which might be confusing for
somebody debugging this.

HOWEVER, since you now indicate that the address is vmalloc anyway, and you
_absolutely cannot_ give any further details safely, perhaps this
additional information is indeed not that usful.

My concern was just feeling antsy that we suddenly don't do something
because a lock happens to be applied but as you say that cannot be helped
in certain contexts.

So actually, leave this.

>
> As you mentioned, this series is a stability fix and we can put touch-ups on
> top of it if needed, and there is also plenty of time till the next merge
> window. Allow me a few days and I'll do the new patch on top (I'd say dont
> bother to spend your time on it, I'll do it).

Ack, I was just a little frustrated we didn't reach a resolution on review
(either deciding things could be deferred or having changes) before
merge. Obviously fine to prioritise, but would be good to have that
explicitly stated.

>
> thanks,
>
>  - Joel
>
>
> >
> >

Anyway, so TL;DR:-

1. As we both agree, add a comment to explain why you need the spin trylock.
(there are no further steps :P)

And I don't believe this actually needs any further changes after this
discussion*, so if you fancy doing a follow up to that effect that will
suffice for me thanks!

* Though I strongly feel vmalloc as a whole needs top-to-bottom
  refactoring, but that's another story...


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-07  7:11         ` Lorenzo Stoakes
@ 2023-09-07  9:23           ` Uladzislau Rezki
  2023-09-08  0:18             ` Joel Fernandes
  0 siblings, 1 reply; 17+ messages in thread
From: Uladzislau Rezki @ 2023-09-07  9:23 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Joel Fernandes, linux-kernel, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, Paul E. McKenney, Vlastimil Babka, Zhen Lei,
	rcu, Zqiang, stable, linux-mm

On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
> On Wed, Sep 06, 2023 at 10:46:08PM +0000, Joel Fernandes wrote:
> > On Wed, Sep 06, 2023 at 08:23:18PM +0100, Lorenzo Stoakes wrote:
> > > On Tue, 5 Sept 2023 at 12:47, Joel Fernandes <joel@joelfernandes.org> wrote:
> > > >
> > > > On Tue, Sep 05, 2023 at 08:09:16AM +0100, Lorenzo Stoakes wrote:
> > > > > On Mon, Sep 04, 2023 at 06:08:04PM +0000, Joel Fernandes (Google) wrote:
> > > > > > It is unsafe to dump vmalloc area information when trying to do so from
> > > > > > some contexts. Add a safer trylock version of the same function to do a
> > > > > > best-effort VMA finding and use it from vmalloc_dump_obj().
> > > > >
> > > > > It'd be nice to have more details as to precisely which contexts and what this
> > > > > resolves.
> > > >
> > > > True. I was hoping the 'trylock' mention would be sufficient (example hardirq
> > > > context interrupting a lock-held region) but you're right.
> > > >
> > > > > > [applied test robot feedback on unused function fix.]
> > > > > > [applied Uladzislau feedback on locking.]
> > > > > >
> > > > > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> > > > > > Cc: Paul E. McKenney <paulmck@kernel.org>
> > > > > > Cc: rcu@vger.kernel.org
> > > > > > Cc: Zqiang <qiang.zhang1211@gmail.com>
> > > > > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > > > > ---
> > > > > >  mm/vmalloc.c | 26 ++++++++++++++++++++++----
> > > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 93cf99aba335..2c6a0e2ff404 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -4274,14 +4274,32 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
> > > > > >  #ifdef CONFIG_PRINTK
> > > > > >  bool vmalloc_dump_obj(void *object)
> > > > > >  {
> > > > > > -   struct vm_struct *vm;
> > > > > >     void *objp = (void *)PAGE_ALIGN((unsigned long)object);
> > > > > > +   const void *caller;
> > > > > > +   struct vm_struct *vm;
> > > > > > +   struct vmap_area *va;
> > > > > > +   unsigned long addr;
> > > > > > +   unsigned int nr_pages;
> > > > > >
> > > > > > -   vm = find_vm_area(objp);
> > > > > > -   if (!vm)
> > > > > > +   if (!spin_trylock(&vmap_area_lock))
> > > > > > +           return false;
> > > > >
> > > > > It'd be good to have a comment here explaining why we must trylock here. I am
> > > > > also concerned that in the past this function would return false only if the
> > > > > address was not a vmalloc one, but now it might just return false due to lock
> > > > > contention and the user has no idea which it is?
> > > > >
> > > > > I'd want to at least output "vmalloc region cannot lookup lock contention"
> > > > > vs. the below cannot find case.
> > > >
> > > > In the patch 2/2 we do print if the address looks like a vmalloc address even
> > > > if the vmalloc look up fails.
> > >
> > > No, you output exactly what was output before, only changing what it
> > > means and in no way differentiating between couldn't find vmalloc
> > > area/couldn't get lock.
> >
> > 2/2 does this:
> >                          -     if (virt_addr_valid(object))
> >                          +     if (is_vmalloc_addr(object))
> >                          +             type = "vmalloc memory";
> >                          +     else if (virt_addr_valid(object))
> >                                        type = "non-slab/vmalloc memory";
> >
> > This code is executed only if vmalloc_dump_obj() returns false. The
> > is_vmalloc_addr() was added by 2/2 which is newly added right?
> >
> > You are right we are not differentiating between trylock failure and failure to
> > find the vmalloc area. I was just saying, even though we don't differentiate,
> > we do print "vmalloc memory" right? That wasn't being printed before.
> >
> > > > Also the reporter's usecase is not a common one. We only attempt to dump
> > > > information if there was a debug objects failure (example if somebody did a
> > > > double call_rcu). In such a situation, the patch will prevent a deadlock and
> > > > still print something about the address.
> > >
> > > Right, but the function still purports to do X but does Y.
> > >
> > > >
> > > > > Under heavy lock contention aren't you potentially breaking the ability to
> > > > > introspect vmalloc addresses? Wouldn't it be better to explicitly detect the
> > > > > contexts under which acquiring this spinlock is not appropriate?
> > > >
> > > > Yes this is a good point, but there's another case as well: PREEMPT_RT can
> > > > sleep on lock contention (as spinlocks are sleeping) and we can't sleep from
> > > > call_rcu() as it may be called in contexts that cannot sleep. So we handle
> > > > that also using trylock.
> > >
> > > Right so somebody now has to find this email to realise that. I hate
> > > implicit knowledge like this, it needs a comment. It also furthers the
> > > point that it'd be useful to differentiate between the two.
> >
> > This is a valid point, and I acknowledged it in last email. A code comment could
> > indeed be useful.
> 
> Thanks, yeah this may seem trivial, but I am quite sensitive about things
> being added to the code base that are neither described in commit msg nor
> in a comment or elsewhere and become 'implicit' in a sense.
> 
> So just a simple comment here would be helpful, and I'm glad we're in
> agreement on that, will leave to you to do a follow up patch.
> 
> >
> > So I guess from an agreement standpoint, I agree:
> >
> > 1/2 could use an additional comment explaining why we need trylock (sighting
> > the RT sleeping lock issue).
> >
> > 2/2 could update the existing code to convert "non-slab/vmalloc" to
> > "non-slab/non-vmalloc". Note: that's an *existing* issue.
> 
> Yeah sorry this whole thing was rather confusing, it did indeed (unclearly)
> specify non-/non- in the past (on assumption dumping function would work),
> addition of vmalloc check now makes that correct again, the phrasing is the
> issue.
> 
> You can leave this as-is as yeah, you're right, this was a pre-existing issue.
> 
> virt_addr_valid() returns true for a slab addr, but kmem_valid_obj() is
> checked above so already been ruled out, now you ruled out vmalloc.
> 
> Just a bit tricksy.
> 
> >
> > The issue in 2/2 is not a new one so that can certainly be a separate patch.
> > And while at it, we could update the comment in that patch as well.
> >
> > But the whole differentiating between trylock vs vmalloc area lookup failure
> > is not that useful -- just my opinion fwiw! I honestly feel differentiating
> > between trylock vs vmalloc area lookup failure complicates the code because
> > it will require passing this information down from vmalloc_dump_obj() to the
> > caller AFAICS and I am not sure if the person reading the debug will really
> > care much. But I am OK with whatever the -mm community wants and I am happy
> > to send out a new patch on top with the above that I agree on since Andrew
> > took these 2 (but for the stuff I don't agree, I would appreciate if you
> > could send a patch for review and I am happy to review it!).
> 
> Ah right, I think maybe I wasn't clear, all I meant to suggest is to output
> log output rather than feed anything back to caller, something like:-
> 
> if (!spin_trylock(&vmap_area_lock)) {
>         pr_cont(" [couldn't acquire vmap lock]\n");
> 	...
> }
> 
> My concern is that in the past this function would only return false if it
> couldn't find the address in a VA, now it returns false also if you happen
> to call it when the spinlock is locked, which might be confusing for
> somebody debugging this.
> 
> HOWEVER, since you now indicate that the address is vmalloc anyway, and you
> _absolutely cannot_ give any further details safely, perhaps this
> additional information is indeed not that usful.
> 
> My concern was just feeling antsy that we suddenly don't do something
> because a lock happens to be applied but as you say that cannot be helped
> in certain contexts.
> 
> So actually, leave this.
> 
> >
> > As you mentioned, this series is a stability fix and we can put touch-ups on
> > top of it if needed, and there is also plenty of time till the next merge
> > window. Allow me a few days and I'll do the new patch on top (I'd say dont
> > bother to spend your time on it, I'll do it).
> 
> Ack, I was just a little frustrated we didn't reach a resolution on review
> (either deciding things could be deferred or having changes) before
> merge. Obviously fine to prioritise, but would be good to have that
> explicitly stated.
> 
> >
> > thanks,
> >
> >  - Joel
> >
> >
> > >
> > >
> 
> Anyway, so TL;DR:-
> 
> 1. As we both agree, add a comment to explain why you need the spin trylock.
> (there are no further steps :P)
> 
> And I don't believe this actually needs any further changes after this
> discussion*, so if you fancy doing a follow up to that effect that will
> suffice for me thanks!
> 
For PREEMPT_RT kernels we are not allowed to use "vmap parts" in non
slepable context, this is just reality, because we use a sleep type of
spinlock.

I am not sure how urgent we need this fix. But to me it looks like
debuging and corner case. Probably i am wrong and miss something.
But if it is correct, i would just bailout for RT kernel and rework
later in a more proper way. For example implement a safe way of RCU
scan but this is also another story.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-07  9:23           ` Uladzislau Rezki
@ 2023-09-08  0:18             ` Joel Fernandes
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Fernandes @ 2023-09-08  0:18 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Lorenzo Stoakes, linux-kernel, Andrew Morton, Christoph Hellwig,
	Paul E. McKenney, Vlastimil Babka, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Thu, Sep 07, 2023 at 11:23:40AM +0200, Uladzislau Rezki wrote:
> On Thu, Sep 07, 2023 at 08:11:48AM +0100, Lorenzo Stoakes wrote:
[..]
> > Anyway, so TL;DR:-
> > 
> > 1. As we both agree, add a comment to explain why you need the spin trylock.
> > (there are no further steps :P)
> > 
> > And I don't believe this actually needs any further changes after this
> > discussion*, so if you fancy doing a follow up to that effect that will
> > suffice for me thanks!

Thanks.

> For PREEMPT_RT kernels we are not allowed to use "vmap parts" in non
> slepable context, this is just reality, because we use a sleep type of
> spinlock.
> 
> I am not sure how urgent we need this fix. But to me it looks like
> debuging and corner case. Probably i am wrong and miss something.
> But if it is correct, i would just bailout for RT kernel and rework
> later in a more proper way. For example implement a safe way of RCU
> scan but this is also another story.

Bailing out for RT kernel is insufficient, as we need the trylock() to avoid
self-deadlock as well for !PREEMPT_RT. Plus IIRC in the past there was a
opposition to special-casing PREEMPT_RT in code as well. Admittedly those
PREEMPT_RT cases were related to detecting preempt-disabled than a lock-held
section though.

We could optionally do a trylock() loop + bail out after certain number of
tries as well but that would compilicate the code a bit more and I am not
sure if it is worth it. Still if you guys feel strongly about doing something
like that, let me know and I can give it a try :).

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/2] rcu: Dump vmalloc memory info safely
  2023-09-07  7:10         ` Vlastimil Babka
@ 2023-09-08  0:26           ` Joel Fernandes
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Fernandes @ 2023-09-08  0:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Lorenzo Stoakes, linux-kernel, Andrew Morton, Paul E. McKenney,
	Zqiang, Zhen Lei, rcu, Matthew Wilcox, stable, linux-mm

On Thu, Sep 07, 2023 at 09:10:38AM +0200, Vlastimil Babka wrote:
> On 9/6/23 21:18, Lorenzo Stoakes wrote:
> > On Tue, 5 Sept 2023 at 12:48, Joel Fernandes <joel@joelfernandes.org> wrote:
> >>
> >> On Tue, Sep 05, 2023 at 08:00:44AM +0100, Lorenzo Stoakes wrote:
> >> > On Mon, Sep 04, 2023 at 06:08:05PM +0000, Joel Fernandes (Google) wrote:
> >> > > From: Zqiang <qiang.zhang1211@gmail.com>
> >> > >
> >> > > Currently, for double invoke call_rcu(), will dump rcu_head objects
> >> > > memory info, if the objects is not allocated from the slab allocator,
> >> > > the vmalloc_dump_obj() will be invoke and the vmap_area_lock spinlock
> >> > > need to be held, since the call_rcu() can be invoked in interrupt context,
> >> > > therefore, there is a possibility of spinlock deadlock scenarios.
> >> > >
> >> > > And in Preempt-RT kernel, the rcutorture test also trigger the following
> >> > > lockdep warning:
> >> > >
> >> > > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
> >> > > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> >> > > preempt_count: 1, expected: 0
> >> > > RCU nest depth: 1, expected: 1
> >> > > 3 locks held by swapper/0/1:
> >> > >  #0: ffffffffb534ee80 (fullstop_mutex){+.+.}-{4:4}, at: torture_init_begin+0x24/0xa0
> >> > >  #1: ffffffffb5307940 (rcu_read_lock){....}-{1:3}, at: rcu_torture_init+0x1ec7/0x2370
> >> > >  #2: ffffffffb536af40 (vmap_area_lock){+.+.}-{3:3}, at: find_vmap_area+0x1f/0x70
> >> > > irq event stamp: 565512
> >> > > hardirqs last  enabled at (565511): [<ffffffffb379b138>] __call_rcu_common+0x218/0x940
> >> > > hardirqs last disabled at (565512): [<ffffffffb5804262>] rcu_torture_init+0x20b2/0x2370
> >> > > softirqs last  enabled at (399112): [<ffffffffb36b2586>] __local_bh_enable_ip+0x126/0x170
> >> > > softirqs last disabled at (399106): [<ffffffffb43fef59>] inet_register_protosw+0x9/0x1d0
> >> > > Preemption disabled at:
> >> > > [<ffffffffb58040c3>] rcu_torture_init+0x1f13/0x2370
> >> > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.5.0-rc4-rt2-yocto-preempt-rt+ #15
> >> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
> >> > > Call Trace:
> >> > >  <TASK>
> >> > >  dump_stack_lvl+0x68/0xb0
> >> > >  dump_stack+0x14/0x20
> >> > >  __might_resched+0x1aa/0x280
> >> > >  ? __pfx_rcu_torture_err_cb+0x10/0x10
> >> > >  rt_spin_lock+0x53/0x130
> >> > >  ? find_vmap_area+0x1f/0x70
> >> > >  find_vmap_area+0x1f/0x70
> >> > >  vmalloc_dump_obj+0x20/0x60
> >> > >  mem_dump_obj+0x22/0x90
> >> > >  __call_rcu_common+0x5bf/0x940
> >> > >  ? debug_smp_processor_id+0x1b/0x30
> >> > >  call_rcu_hurry+0x14/0x20
> >> > >  rcu_torture_init+0x1f82/0x2370
> >> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >> > >  ? __pfx_rcu_torture_leak_cb+0x10/0x10
> >> > >  ? __pfx_rcu_torture_init+0x10/0x10
> >> > >  do_one_initcall+0x6c/0x300
> >> > >  ? debug_smp_processor_id+0x1b/0x30
> >> > >  kernel_init_freeable+0x2b9/0x540
> >> > >  ? __pfx_kernel_init+0x10/0x10
> >> > >  kernel_init+0x1f/0x150
> >> > >  ret_from_fork+0x40/0x50
> >> > >  ? __pfx_kernel_init+0x10/0x10
> >> > >  ret_from_fork_asm+0x1b/0x30
> >> > >  </TASK>
> >> > >
> >> > > The previous patch fixes this by using the deadlock-safe best-effort
> >> > > version of find_vm_area. However, in case of failure print the fact that
> >> > > the pointer was a vmalloc pointer so that we print at least something.
> >> > >
> >> > > Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
> >> > > Cc: Paul E. McKenney <paulmck@kernel.org>
> >> > > Cc: rcu@vger.kernel.org
> >> > > Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> > > Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
> >> > > Cc: stable@vger.kernel.org
> >> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> >> > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> >> > > ---
> >> > >  mm/util.c | 4 +++-
> >> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> >> > >
> >> > > diff --git a/mm/util.c b/mm/util.c
> >> > > index dd12b9531ac4..406634f26918 100644
> >> > > --- a/mm/util.c
> >> > > +++ b/mm/util.c
> >> > > @@ -1071,7 +1071,9 @@ void mem_dump_obj(void *object)
> >> > >     if (vmalloc_dump_obj(object))
> >> > >             return;
> >> > >
> >> > > -   if (virt_addr_valid(object))
> >> > > +   if (is_vmalloc_addr(object))
> >> > > +           type = "vmalloc memory";
> >> > > +   else if (virt_addr_valid(object))
> >> > >             type = "non-slab/vmalloc memory";
> >> >
> >> > I think you should update this to say non-slab/non-vmalloc memory (as much
> >> > as that description sucks!) as this phrasing in the past meant to say
> >> > 'non-slab or vmalloc memory' (already confusing phrasing) so better to be
> >> > clear.
> >>
> >> True, though the issue you mentioned it is in existing code, a respin of this
> >> patch could update it to say non-vmalloc. Good point, thanks for reviewing!
> > 
> > No it's not, you're changing the meaning, because you changed the code
> > that determines the output...
> 
> I think it has always meant (but clearly it's not unambiguously worded) "not
> slab && not vmalloc", that is before and after this patch. Only in case
> patch 1 is applied and patch 2 not, can the output be wrong in that a
> vmalloc pointer will (in case of trylock fail) be classified as "not slab &&
> not vmalloc", but seems fine to me after patch 2.
> 
> I guess if we wanted, we could also rewrite it to be more like the kmem
> check in the beginning of mem_dump_obj(), so there would be:
> 
> if (is_vmalloc_addr(...)) {
>     vmalloc_dump_obj(...);
>     return;
> }
> 
> where vmalloc_dump_obj() itself would print at least "vmalloc memory" with
> no further details in case of trylock failure.
> 
> that assumes is_vmalloc_addr() is guaranteed to be true for all addresses
> that __find_vmap_area resolves, otherwise it could miss something compared
> to current code. Is it guaranteed?

It is guaranteed based on my reading of the code. But maybe it may aid
additional vmalloc-internals debugging if for some reason the address of the
object stored in the vmalloc data structures is out of bound for some reason
and the lookup actually succeded. That's just a hypothetical situation though
and I don't think that that can actually happen.

thanks,

 - Joel



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug
  2023-09-07  6:53 ` Vlastimil Babka
@ 2023-09-08  0:47   ` Joel Fernandes
  0 siblings, 0 replies; 17+ messages in thread
From: Joel Fernandes @ 2023-09-08  0:47 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton
  Cc: linux-kernel, Uladzislau Rezki, Christoph Hellwig,
	Lorenzo Stoakes, Paul E. McKenney, Zhen Lei, rcu, Zqiang, stable,
	linux-mm

On Thu, Sep 07, 2023 at 08:53:14AM +0200, Vlastimil Babka wrote:
> Hi,
> 
> On 9/4/23 20:08, Joel Fernandes (Google) wrote:
> > It is unsafe to dump vmalloc area information when trying to do so from
> > some contexts. Add a safer trylock version of the same function to do a
> > best-effort VMA finding and use it from vmalloc_dump_obj().
> 
> I was a bit confused by the subject which suggests a new function is added,
> but it seems open-coded in its only caller. I assume it's due to evolution
> of the series. Something like:
> 
> mm/vmalloc: use trylock for vmap_area_lock in vmalloc_dump_obj()
> 
> ?
> 
> I also notice it's trying hard to copy everything from "vm" to temporary
> variables before unlocking, presumably to prevent use-after-free, so should
> that be also mentioned in the changelog?

Apologies for the less-than-ideal changelog. Andrew would you mind replacing
the merged patch with the below one instead? It just contains non-functional
changes to change log and an additional code comment/print. Thanks!

---8<-----------------------

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Subject: [PATCH v3.1] mm/vmalloc: Add a safer inlined version of find_vm_area() for
 debug

It is unsafe to dump vmalloc area information when trying to do so from
some contexts such as PREEMPT_RT or from an IRQ handler that interrupted
a vmap_area_lock-held region. Add a safer and inlined trylock version of
find_vm_area() to do a best-effort VMA finding and use it from
vmalloc_dump_obj().

While the vmap_area_lock is held, copy interesting attributes from the
vm_struct before unlocking.

[applied test robot feedback on unused function fix.]
[applied Uladzislau feedback on locking.]
[applied Vlastimil and Lorenzo feedback on changelog, comment and print
improvements]

Reported-by: Zhen Lei <thunder.leizhen@huaweicloud.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: rcu@vger.kernel.org
Cc: Zqiang <qiang.zhang1211@gmail.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Fixes: 98f180837a89 ("mm: Make mem_dump_obj() handle vmalloc() memory")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 mm/vmalloc.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 93cf99aba335..990a0d5efba8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4274,14 +4274,40 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 #ifdef CONFIG_PRINTK
 bool vmalloc_dump_obj(void *object)
 {
-	struct vm_struct *vm;
 	void *objp = (void *)PAGE_ALIGN((unsigned long)object);
+	const void *caller;
+	struct vm_struct *vm;
+	struct vmap_area *va;
+	unsigned long addr;
+	unsigned int nr_pages;
+
+	/*
+	 * Use trylock as we don't want to contend since this is debug code and
+	 * we might run this code in contexts like PREEMPT_RT where spinlock
+	 * contention may result in sleeping, or from an IRQ handler which
+	 * might interrupt a vmap_area_lock-held critical section.
+	 */
+	if (!spin_trylock(&vmap_area_lock)) {
+		pr_cont(" [couldn't acquire vmap_area_lock]\n");
+		return false;
+	}
+	va = __find_vmap_area((unsigned long)objp, &vmap_area_root);
+	if (!va) {
+		spin_unlock(&vmap_area_lock);
+		return false;
+	}
 
-	vm = find_vm_area(objp);
-	if (!vm)
+	vm = va->vm;
+	if (!vm) {
+		spin_unlock(&vmap_area_lock);
 		return false;
+	}
+	addr = (unsigned long)vm->addr;
+	caller = vm->caller;
+	nr_pages = vm->nr_pages;
+	spin_unlock(&vmap_area_lock);
 	pr_cont(" %u-page vmalloc region starting at %#lx allocated at %pS\n",
-		vm->nr_pages, (unsigned long)vm->addr, vm->caller);
+		nr_pages, addr, caller);
 	return true;
 }
 #endif
-- 
2.42.0.283.g2d96d420d3-goog



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-09-08  0:47 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-04 18:08 [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Joel Fernandes (Google)
2023-09-04 18:08 ` [PATCH v3 2/2] rcu: Dump vmalloc memory info safely Joel Fernandes (Google)
2023-09-05  7:00   ` Lorenzo Stoakes
2023-09-05 11:48     ` Joel Fernandes
2023-09-06 19:18       ` Lorenzo Stoakes
2023-09-07  7:10         ` Vlastimil Babka
2023-09-08  0:26           ` Joel Fernandes
2023-09-05  7:09 ` [PATCH v3 1/2] mm/vmalloc: Add a safer version of find_vm_area() for debug Lorenzo Stoakes
2023-09-05 11:47   ` Joel Fernandes
2023-09-06 19:23     ` Lorenzo Stoakes
2023-09-06 19:46       ` Lorenzo Stoakes
2023-09-06 22:46       ` Joel Fernandes
2023-09-07  7:11         ` Lorenzo Stoakes
2023-09-07  9:23           ` Uladzislau Rezki
2023-09-08  0:18             ` Joel Fernandes
2023-09-07  6:53 ` Vlastimil Babka
2023-09-08  0:47   ` Joel Fernandes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox