* [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
@ 2024-06-07 2:31 zhaoyang.huang
2024-06-07 8:30 ` Zhaoyang Huang
2024-06-13 17:33 ` Uladzislau Rezki
0 siblings, 2 replies; 15+ messages in thread
From: zhaoyang.huang @ 2024-06-07 2:31 UTC (permalink / raw)
To: Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Lorenzo Stoakes, Baoquan He, Thomas Gleixner, hailong liu,
linux-mm, linux-kernel, stable, Zhaoyang Huang, steve.kang
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
The vmalloc area runs out on our ARM64 system during an erofs test, with
vm_map_ram() failing[1]. By following the debug log, we find that
vm_map_ram()->vb_alloc() keeps allocating a new vb->va, each corresponding
to 4MB of vmalloc area, because list_for_each_entry_rcu returns immediately
once vbq->free->next points back to vbq->free. In other words, after the
list is broken, 65536 such page faults are enough to exhaust the whole
vmalloc area. The root cause is exactly that self-pointing vbq->free->next,
which makes list_for_each_entry_rcu unable to iterate the list and find a
usable block.
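A trimmed-down sketch of the allocation path in question may help to see
this (not the exact kernel code; try_alloc_from_block() is a hypothetical
stand-in for the real per-block bitmap accounting):

<snip vb_alloc, simplified sketch>
static void *vb_alloc_sketch(unsigned long size, gfp_t gfp_mask)
{
	struct vmap_block_queue *vbq;
	struct vmap_block *vb;
	void *vaddr = NULL;

	rcu_read_lock();
	vbq = raw_cpu_ptr(&vmap_block_queue);
	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
		/* never entered once vbq->free.next points back to vbq->free */
		vaddr = try_alloc_from_block(vb, size);	/* hypothetical helper */
		if (vaddr)
			break;
	}
	rcu_read_unlock();

	/* so every caller falls through to here, eating another 4MB block */
	if (!vaddr)
		vaddr = new_vmap_block(get_order(size), gfp_mask);

	return vaddr;
}
<snip>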
[1]
PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
#0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
#1 [ffffffc08006b040] __schedule at ffffffc08111dde0
#2 [ffffffc08006b0a0] schedule at ffffffc08111e294
#3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
#4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
#5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
#6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
#7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
#8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
#9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
For a detailed explanation of how the list gets broken, please refer to:
https://lore.kernel.org/all/20240531024820.5507-1-hailong.liu@oppo.com/
Suggested-by: Hailong.Liu <hailong.liu@oppo.com>
Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
v2: introduce a cpu field in vmap_block to record the owning CPU number
v3: use get_cpu()/put_cpu() to prevent being scheduled onto another core
v4: replace get_cpu()/put_cpu() with raw_smp_processor_id() to avoid disabling preemption
---
---
mm/vmalloc.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 22aa63f4ef63..89eb034f4ac6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2458,6 +2458,7 @@ struct vmap_block {
struct list_head free_list;
struct rcu_head rcu_head;
struct list_head purge;
+ unsigned int cpu;
};
/* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
@@ -2585,8 +2586,15 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
free_vmap_area(va);
return ERR_PTR(err);
}
-
- vbq = raw_cpu_ptr(&vmap_block_queue);
+ /*
+ * list_add_tail_rcu could happened in another core
+ * rather than vb->cpu due to task migration, which
+ * is safe as list_add_tail_rcu will ensure the list's
+ * integrity together with list_for_each_rcu from read
+ * side.
+ */
+ vb->cpu = raw_smp_processor_id();
+ vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
spin_unlock(&vbq->lock);
@@ -2614,9 +2622,10 @@ static void free_vmap_block(struct vmap_block *vb)
}
static bool purge_fragmented_block(struct vmap_block *vb,
- struct vmap_block_queue *vbq, struct list_head *purge_list,
- bool force_purge)
+ struct list_head *purge_list, bool force_purge)
{
+ struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
+
if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
vb->dirty == VMAP_BBMAP_BITS)
return false;
@@ -2664,7 +2673,7 @@ static void purge_fragmented_blocks(int cpu)
continue;
spin_lock(&vb->lock);
- purge_fragmented_block(vb, vbq, &purge, true);
+ purge_fragmented_block(vb, &purge, true);
spin_unlock(&vb->lock);
}
rcu_read_unlock();
@@ -2801,7 +2810,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
* not purgeable, check whether there is dirty
* space to be flushed.
*/
- if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
+ if (!purge_fragmented_block(vb, &purge_list, false) &&
vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
unsigned long va_start = vb->va->va_start;
unsigned long s, e;
--
2.25.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-07 2:31 [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block zhaoyang.huang
@ 2024-06-07 8:30 ` Zhaoyang Huang
2024-06-11 1:08 ` Zhaoyang Huang
2024-06-13 17:33 ` Uladzislau Rezki
1 sibling, 1 reply; 15+ messages in thread
From: Zhaoyang Huang @ 2024-06-07 8:30 UTC (permalink / raw)
To: zhaoyang.huang
Cc: Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Lorenzo Stoakes, Baoquan He, Thomas Gleixner, hailong liu,
linux-mm, linux-kernel, stable, steve.kang
Patchv4 was updated based on Hailong and Uladzislau's comments, where
vbq is obtained from vb->cpu, thus avoiding disabling preemption.
Furthermore, Baoquan's suggestion was not adopted because it made vbq
accesses completely interleaved across all CPUs, which defeats the
goal of per_cpu.
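For context, a rough before/after sketch of the two approaches (simplified,
not the full functions; only the queue selection is shown):

<snip new_vmap_block, v3 vs v4 queue selection sketch>
	/* v3: pin the task so the queue cannot change under us */
	int cpu = get_cpu();			/* disables preemption */
	vbq = per_cpu_ptr(&vmap_block_queue, cpu);
	/* ... list_add_tail_rcu(&vb->free_list, &vbq->free) ... */
	put_cpu();

	/* v4: just record which per-CPU queue the block was put on */
	vb->cpu = raw_smp_processor_id();	/* migration races are harmless */
	vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
	/* purge_fragmented_block() later recomputes the same vbq from vb->cpu */
<snip>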
On Fri, Jun 7, 2024 at 10:31 AM zhaoyang.huang
<zhaoyang.huang@unisoc.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> vmalloc area runs out in our ARM64 system during an erofs test as
> vm_map_ram failed[1]. By following the debug log, we find that
> vm_map_ram()->vb_alloc() will allocate new vb->va which corresponding
> to 4MB vmalloc area as list_for_each_entry_rcu returns immediately
> when vbq->free->next points to vbq->free. That is to say, 65536 times
> of page fault after the list's broken will run out of the whole
> vmalloc area. This should be introduced by one vbq->free->next point to
> vbq->free which makes list_for_each_entry_rcu can not iterate the list
> and find the BUG.
>
> [1]
> PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
> #0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
> #1 [ffffffc08006b040] __schedule at ffffffc08111dde0
> #2 [ffffffc08006b0a0] schedule at ffffffc08111e294
> #3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
> #4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
> #5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
> #6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
> #7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
> #8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
> #9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
>
> Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
>
> For detailed reason of broken list, please refer to below URL
> https://lore.kernel.org/all/20240531024820.5507-1-hailong.liu@oppo.com/
>
> Suggested-by: Hailong.Liu <hailong.liu@oppo.com>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> v2: introduce cpu in vmap_block to record the right CPU number
> v3: use get_cpu/put_cpu to prevent schedule between core
> v4: replace get_cpu/put_cpu by another API to avoid disabling preemption
> ---
> ---
> mm/vmalloc.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..89eb034f4ac6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2458,6 +2458,7 @@ struct vmap_block {
> struct list_head free_list;
> struct rcu_head rcu_head;
> struct list_head purge;
> + unsigned int cpu;
> };
>
> /* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
> @@ -2585,8 +2586,15 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> free_vmap_area(va);
> return ERR_PTR(err);
> }
> -
> - vbq = raw_cpu_ptr(&vmap_block_queue);
> + /*
> + * list_add_tail_rcu could happened in another core
> + * rather than vb->cpu due to task migration, which
> + * is safe as list_add_tail_rcu will ensure the list's
> + * integrity together with list_for_each_rcu from read
> + * side.
> + */
> + vb->cpu = raw_smp_processor_id();
> + vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
> spin_lock(&vbq->lock);
> list_add_tail_rcu(&vb->free_list, &vbq->free);
> spin_unlock(&vbq->lock);
> @@ -2614,9 +2622,10 @@ static void free_vmap_block(struct vmap_block *vb)
> }
>
> static bool purge_fragmented_block(struct vmap_block *vb,
> - struct vmap_block_queue *vbq, struct list_head *purge_list,
> - bool force_purge)
> + struct list_head *purge_list, bool force_purge)
> {
> + struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
> +
> if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> vb->dirty == VMAP_BBMAP_BITS)
> return false;
> @@ -2664,7 +2673,7 @@ static void purge_fragmented_blocks(int cpu)
> continue;
>
> spin_lock(&vb->lock);
> - purge_fragmented_block(vb, vbq, &purge, true);
> + purge_fragmented_block(vb, &purge, true);
> spin_unlock(&vb->lock);
> }
> rcu_read_unlock();
> @@ -2801,7 +2810,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
> * not purgeable, check whether there is dirty
> * space to be flushed.
> */
> - if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
> + if (!purge_fragmented_block(vb, &purge_list, false) &&
> vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
> unsigned long va_start = vb->va->va_start;
> unsigned long s, e;
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-07 8:30 ` Zhaoyang Huang
@ 2024-06-11 1:08 ` Zhaoyang Huang
2024-06-11 18:16 ` Uladzislau Rezki
0 siblings, 1 reply; 15+ messages in thread
From: Zhaoyang Huang @ 2024-06-11 1:08 UTC (permalink / raw)
To: zhaoyang.huang
Cc: Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Lorenzo Stoakes, Baoquan He, Thomas Gleixner, hailong liu,
linux-mm, linux-kernel, stable, steve.kang
Sorry to bother you again. Are there any further comments or a new patch
on this? It is blocking some Android test cases, since the Android tree
only accepts patches that have been ACKed upstream.
On Fri, Jun 7, 2024 at 4:30 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> Patchv4 was updated based on Hailong and Uladzislau's comments, where
> vbq is obtained from vb->cpu, thus avoiding disabling preemption.
> Furthermore, Baoquan's suggestion was not adopted because it made vbq
> accesses completely interleaved across all CPUs, which defeats the
> goal of per_cpu.
>
> On Fri, Jun 7, 2024 at 10:31 AM zhaoyang.huang
> <zhaoyang.huang@unisoc.com> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > vmalloc area runs out in our ARM64 system during an erofs test as
> > vm_map_ram failed[1]. By following the debug log, we find that
> > vm_map_ram()->vb_alloc() will allocate new vb->va which corresponding
> > to 4MB vmalloc area as list_for_each_entry_rcu returns immediately
> > when vbq->free->next points to vbq->free. That is to say, 65536 times
> > of page fault after the list's broken will run out of the whole
> > vmalloc area. This should be introduced by one vbq->free->next point to
> > vbq->free which makes list_for_each_entry_rcu can not iterate the list
> > and find the BUG.
> >
> > [1]
> > PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
> > #0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
> > #1 [ffffffc08006b040] __schedule at ffffffc08111dde0
> > #2 [ffffffc08006b0a0] schedule at ffffffc08111e294
> > #3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
> > #4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
> > #5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
> > #6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
> > #7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
> > #8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
> > #9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
> >
> > Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
> >
> > For detailed reason of broken list, please refer to below URL
> > https://lore.kernel.org/all/20240531024820.5507-1-hailong.liu@oppo.com/
> >
> > Suggested-by: Hailong.Liu <hailong.liu@oppo.com>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> > v2: introduce cpu in vmap_block to record the right CPU number
> > v3: use get_cpu/put_cpu to prevent schedule between core
> > v4: replace get_cpu/put_cpu by another API to avoid disabling preemption
> > ---
> > ---
> > mm/vmalloc.c | 21 +++++++++++++++------
> > 1 file changed, 15 insertions(+), 6 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 22aa63f4ef63..89eb034f4ac6 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2458,6 +2458,7 @@ struct vmap_block {
> > struct list_head free_list;
> > struct rcu_head rcu_head;
> > struct list_head purge;
> > + unsigned int cpu;
> > };
> >
> > /* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
> > @@ -2585,8 +2586,15 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> > free_vmap_area(va);
> > return ERR_PTR(err);
> > }
> > -
> > - vbq = raw_cpu_ptr(&vmap_block_queue);
> > + /*
> > + * list_add_tail_rcu could happened in another core
> > + * rather than vb->cpu due to task migration, which
> > + * is safe as list_add_tail_rcu will ensure the list's
> > + * integrity together with list_for_each_rcu from read
> > + * side.
> > + */
> > + vb->cpu = raw_smp_processor_id();
> > + vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
> > spin_lock(&vbq->lock);
> > list_add_tail_rcu(&vb->free_list, &vbq->free);
> > spin_unlock(&vbq->lock);
> > @@ -2614,9 +2622,10 @@ static void free_vmap_block(struct vmap_block *vb)
> > }
> >
> > static bool purge_fragmented_block(struct vmap_block *vb,
> > - struct vmap_block_queue *vbq, struct list_head *purge_list,
> > - bool force_purge)
> > + struct list_head *purge_list, bool force_purge)
> > {
> > + struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
> > +
> > if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> > vb->dirty == VMAP_BBMAP_BITS)
> > return false;
> > @@ -2664,7 +2673,7 @@ static void purge_fragmented_blocks(int cpu)
> > continue;
> >
> > spin_lock(&vb->lock);
> > - purge_fragmented_block(vb, vbq, &purge, true);
> > + purge_fragmented_block(vb, &purge, true);
> > spin_unlock(&vb->lock);
> > }
> > rcu_read_unlock();
> > @@ -2801,7 +2810,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
> > * not purgeable, check whether there is dirty
> > * space to be flushed.
> > */
> > - if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
> > + if (!purge_fragmented_block(vb, &purge_list, false) &&
> > vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
> > unsigned long va_start = vb->va->va_start;
> > unsigned long s, e;
> > --
> > 2.25.1
> >
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-11 1:08 ` Zhaoyang Huang
@ 2024-06-11 18:16 ` Uladzislau Rezki
2024-06-12 2:00 ` Zhaoyang Huang
0 siblings, 1 reply; 15+ messages in thread
From: Uladzislau Rezki @ 2024-06-11 18:16 UTC (permalink / raw)
To: Zhaoyang Huang
Cc: zhaoyang.huang, Andrew Morton, Uladzislau Rezki,
Christoph Hellwig, Lorenzo Stoakes, Baoquan He, Thomas Gleixner,
hailong liu, linux-mm, linux-kernel, stable, steve.kang
>
> Sorry to bother you again. Are there any other comments or new patch
> on this which block some test cases of ANDROID that only accept ACKed
> one on its tree.
>
I have just returned from vacation. Give me some time to review your
patch. Meanwhile, do you have a reproducer? I would like to see how
I can trigger the issue in question.
Thanks!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-11 18:16 ` Uladzislau Rezki
@ 2024-06-12 2:00 ` Zhaoyang Huang
2024-06-12 11:27 ` Uladzislau Rezki
0 siblings, 1 reply; 15+ messages in thread
From: Zhaoyang Huang @ 2024-06-12 2:00 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: zhaoyang.huang, Andrew Morton, Christoph Hellwig,
Lorenzo Stoakes, Baoquan He, Thomas Gleixner, hailong liu,
linux-mm, linux-kernel, stable, steve.kang
On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> >
> > Sorry to bother you again. Are there any other comments or new patch
> > on this which block some test cases of ANDROID that only accept ACKed
> > one on its tree.
> >
> I have just returned from vacation. Give me some time to review your
> patch. Meanwhile, do you have a reproducer? So i would like to see how
> i can trigger an issue that is in question.
This bug arises from a system-wide Android test and has been
reported by many vendors. Repeatedly mounting and unmounting an erofs
partition is supposed to be a simple reproducer. IMO, the logic defect
is obvious enough to be found by code review.
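For reference, a rough userspace sketch of such a reproducer (the loop
device, mount point and iteration count are made up; the idea is just to
keep the erofs read path mapping and unmapping buffers via vm_map_ram()):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	for (int i = 0; i < 100000; i++) {
		if (mount("/dev/block/loop0", "/mnt/erofs", "erofs",
			  MS_RDONLY, NULL)) {
			perror("mount");
			return 1;
		}
		/* read some compressed files here to force vm_map_ram() mappings */
		if (umount("/mnt/erofs")) {
			perror("umount");
			return 1;
		}
	}
	return 0;
}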
>
> Thanks!
>
> --
> Uladzislau Rezki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-12 2:00 ` Zhaoyang Huang
@ 2024-06-12 11:27 ` Uladzislau Rezki
2024-06-13 8:41 ` Baoquan He
2024-06-13 11:42 ` hailong liu
0 siblings, 2 replies; 15+ messages in thread
From: Uladzislau Rezki @ 2024-06-12 11:27 UTC (permalink / raw)
To: Zhaoyang Huang, Baoquan He
Cc: Uladzislau Rezki, zhaoyang.huang, Andrew Morton,
Christoph Hellwig, Lorenzo Stoakes, Baoquan He, Thomas Gleixner,
hailong liu, linux-mm, linux-kernel, stable, steve.kang
On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > >
> > > Sorry to bother you again. Are there any other comments or new patch
> > > on this which block some test cases of ANDROID that only accept ACKed
> > > one on its tree.
> > >
> > I have just returned from vacation. Give me some time to review your
> > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > i can trigger an issue that is in question.
> This bug arises from an system wide android test which has been
> reported by many vendors. Keep mount/unmount an erofs partition is
> supposed to be a simple reproducer. IMO, the logic defect is obvious
> enough to be found by code review.
>
Baoquan, any objection to this v4?
Your proposal of inserting a new vmap block into the queue its address
belongs to, i.e. not into this CPU's queue, would also fix the issue. The
problem is that such an approach does __not__ pre-load the current CPU,
which is not good.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-12 11:27 ` Uladzislau Rezki
@ 2024-06-13 8:41 ` Baoquan He
2024-06-13 9:11 ` hailong liu
2024-06-13 11:28 ` Uladzislau Rezki
2024-06-13 11:42 ` hailong liu
1 sibling, 2 replies; 15+ messages in thread
From: Baoquan He @ 2024-06-13 8:41 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Zhaoyang Huang, zhaoyang.huang, Andrew Morton, Christoph Hellwig,
Lorenzo Stoakes, Thomas Gleixner, hailong liu, linux-mm,
linux-kernel, stable, steve.kang
On 06/12/24 at 01:27pm, Uladzislau Rezki wrote:
> On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > >
> > > >
> > > > Sorry to bother you again. Are there any other comments or new patch
> > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > one on its tree.
> > > >
> > > I have just returned from vacation. Give me some time to review your
> > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > i can trigger an issue that is in question.
> > This bug arises from an system wide android test which has been
> > reported by many vendors. Keep mount/unmount an erofs partition is
> > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > enough to be found by code review.
> >
> Baoquan, any objection about this v4?
>
> Your proposal about inserting a new vmap-block based on it belongs
> to, i.e. not per-this-cpu, should fix an issue. The problem is that
> such way does __not__ pre-load a current CPU what is not good.
As I understand it, by the time we start inserting the vb into vbq->xa and
vbq->free, the vmap_area allocation has already been done, so which CPU's
vbq->free we add it to doesn't impact the CPU preloading, does it?
Not sure if I am missing anything about the CPU preloading.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-13 8:41 ` Baoquan He
@ 2024-06-13 9:11 ` hailong liu
2024-06-13 9:23 ` Zhaoyang Huang
2024-06-13 11:28 ` Uladzislau Rezki
1 sibling, 1 reply; 15+ messages in thread
From: hailong liu @ 2024-06-13 9:11 UTC (permalink / raw)
To: Baoquan He
Cc: Uladzislau Rezki, Zhaoyang Huang, zhaoyang.huang, Andrew Morton,
Christoph Hellwig, Lorenzo Stoakes, Thomas Gleixner, linux-mm,
linux-kernel, stable, steve.kang
On Thu, 13. Jun 16:41, Baoquan He wrote:
> On 06/12/24 at 01:27pm, Uladzislau Rezki wrote:
> > On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > >
> > > > >
> > > > > Sorry to bother you again. Are there any other comments or new patch
> > > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > > one on its tree.
> > > > >
> > > > I have just returned from vacation. Give me some time to review your
> > > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > > i can trigger an issue that is in question.
> > > This bug arises from an system wide android test which has been
> > > reported by many vendors. Keep mount/unmount an erofs partition is
> > > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > > enough to be found by code review.
> > >
> > Baoquan, any objection about this v4?
> >
> > Your proposal about inserting a new vmap-block based on it belongs
> > to, i.e. not per-this-cpu, should fix an issue. The problem is that
> > such way does __not__ pre-load a current CPU what is not good.
>
> With my understand, when we start handling to insert vb to vbq->xa and
> vbq->free, the vmap_area allocation has been done, it doesn't impact the
> CPU preloading when adding it into which CPU's vbq->free, does it?
>
> Not sure if I miss anything about the CPU preloading.
>
>
IIUC, if the vb is placed by the hashing function, the following scenario
may occur: a kthread pinned to CPU_x continuously calls vm_map_ram().
The 1st call to vm_map_ram(): no vb in CPU_x->free, so a new block goes to
CPU_0->vb
CPU_1
...
CPU_x
The 2nd call to vm_map_ram(): still no vb in CPU_x->free, so
CPU_0->vb
CPU_1->vb
...
CPU_x
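A simplified sketch of that scenario, under the hash-based placement being
discussed (vbq_for_addr() below is a hypothetical stand-in for the hashing
lookup, not current code):

<snip pinned-kthread scenario, simplified sketch>
	/* kthread pinned to CPU_x, calling vm_map_ram() repeatedly */
	vbq = raw_cpu_ptr(&vmap_block_queue);	/* always CPU_x's queue */
	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
		/* never satisfied: no block ever lands on CPU_x's free list */
	}

	/* fall back to new_vmap_block(), which with hash-based placement
	 * queues the block on whatever CPU the block's address hashes to */
	vbq = vbq_for_addr(va->va_start);	/* hypothetical, almost never CPU_x */
	spin_lock(&vbq->lock);
	list_add_tail_rcu(&vb->free_list, &vbq->free);
	spin_unlock(&vbq->lock);

	/* the next vm_map_ram() on CPU_x misses again and repeats all of this */
<snip>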
--
help you, help me,
Hailong.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-13 9:11 ` hailong liu
@ 2024-06-13 9:23 ` Zhaoyang Huang
2024-06-14 1:44 ` Baoquan He
0 siblings, 1 reply; 15+ messages in thread
From: Zhaoyang Huang @ 2024-06-13 9:23 UTC (permalink / raw)
To: hailong liu
Cc: Baoquan He, Uladzislau Rezki, zhaoyang.huang, Andrew Morton,
Christoph Hellwig, Lorenzo Stoakes, Thomas Gleixner, linux-mm,
linux-kernel, stable, steve.kang
On Thu, Jun 13, 2024 at 5:11 PM hailong liu <hailong.liu@oppo.com> wrote:
>
> On Thu, 13. Jun 16:41, Baoquan He wrote:
> > On 06/12/24 at 01:27pm, Uladzislau Rezki wrote:
> > > On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > > > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > > Sorry to bother you again. Are there any other comments or new patch
> > > > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > > > one on its tree.
> > > > > >
> > > > > I have just returned from vacation. Give me some time to review your
> > > > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > > > i can trigger an issue that is in question.
> > > > This bug arises from an system wide android test which has been
> > > > reported by many vendors. Keep mount/unmount an erofs partition is
> > > > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > > > enough to be found by code review.
> > > >
> > > Baoquan, any objection about this v4?
> > >
> > > Your proposal about inserting a new vmap-block based on it belongs
> > > to, i.e. not per-this-cpu, should fix an issue. The problem is that
> > > such way does __not__ pre-load a current CPU what is not good.
> >
> > With my understand, when we start handling to insert vb to vbq->xa and
> > vbq->free, the vmap_area allocation has been done, it doesn't impact the
> > CPU preloading when adding it into which CPU's vbq->free, does it?
> >
> > Not sure if I miss anything about the CPU preloading.
> >
> >
>
> IIUC, if vb put by hashing funcation. and the following scenario may occur:
>
> A kthread limit on CPU_x and continuously calls vm_map_ram()
> The 1 call vm_map_ram(): no vb in cpu_x->free, so
> CPU_0->vb
> CPU_1
> ...
> CPU_x
>
> The 2 call vm_map_ram(): no vb in cpu_x->free, so
> CPU_0->vb
> CPU_1->vb
> ...
> CPU_x
Yes, this could make the per-CPU vbq meaningless and cause the vmalloc
area to be consumed abnormally (e.g. only 8KB actually used out of a 4MB
block for each allocation).
>
> --
> help you, help me,
> Hailong.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-13 9:23 ` Zhaoyang Huang
@ 2024-06-14 1:44 ` Baoquan He
0 siblings, 0 replies; 15+ messages in thread
From: Baoquan He @ 2024-06-14 1:44 UTC (permalink / raw)
To: hailong liu, Zhaoyang Huang
Cc: Uladzislau Rezki, zhaoyang.huang, Andrew Morton,
Christoph Hellwig, Lorenzo Stoakes, Thomas Gleixner, linux-mm,
linux-kernel, stable, steve.kang
On 06/13/24 at 05:23pm, Zhaoyang Huang wrote:
> On Thu, Jun 13, 2024 at 5:11 PM hailong liu <hailong.liu@oppo.com> wrote:
> >
> > On Thu, 13. Jun 16:41, Baoquan He wrote:
> > > On 06/12/24 at 01:27pm, Uladzislau Rezki wrote:
> > > > On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > > > > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > > >
> > > > > > >
> > > > > > > Sorry to bother you again. Are there any other comments or new patch
> > > > > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > > > > one on its tree.
> > > > > > >
> > > > > > I have just returned from vacation. Give me some time to review your
> > > > > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > > > > i can trigger an issue that is in question.
> > > > > This bug arises from an system wide android test which has been
> > > > > reported by many vendors. Keep mount/unmount an erofs partition is
> > > > > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > > > > enough to be found by code review.
> > > > >
> > > > Baoquan, any objection about this v4?
> > > >
> > > > Your proposal about inserting a new vmap-block based on it belongs
> > > > to, i.e. not per-this-cpu, should fix an issue. The problem is that
> > > > such way does __not__ pre-load a current CPU what is not good.
> > >
> > > With my understand, when we start handling to insert vb to vbq->xa and
> > > vbq->free, the vmap_area allocation has been done, it doesn't impact the
> > > CPU preloading when adding it into which CPU's vbq->free, does it?
> > >
> > > Not sure if I miss anything about the CPU preloading.
> > >
> > >
> >
> > IIUC, if vb put by hashing funcation. and the following scenario may occur:
Thanks for the details, it's truly a problem as you said.
> >
> > A kthread limit on CPU_x and continuously calls vm_map_ram()
> > The 1 call vm_map_ram(): no vb in cpu_x->free, so
> > CPU_0->vb
> > CPU_1
> > ...
> > CPU_x
> >
> > The 2 call vm_map_ram(): no vb in cpu_x->free, so
> > CPU_0->vb
> > CPU_1->vb
> > ...
> > CPU_x
> Yes, this could make the per_cpu vbq meaningless and the VMALLOC area
> be abnormally consumed(like 8KB in 4MB for each allocation)
> >
> > --
> > help you, help me,
> > Hailong.
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-13 8:41 ` Baoquan He
2024-06-13 9:11 ` hailong liu
@ 2024-06-13 11:28 ` Uladzislau Rezki
2024-06-14 1:32 ` Baoquan He
1 sibling, 1 reply; 15+ messages in thread
From: Uladzislau Rezki @ 2024-06-13 11:28 UTC (permalink / raw)
To: Baoquan He
Cc: Uladzislau Rezki, Zhaoyang Huang, zhaoyang.huang, Andrew Morton,
Christoph Hellwig, Lorenzo Stoakes, Thomas Gleixner, hailong liu,
linux-mm, linux-kernel, stable, steve.kang
On Thu, Jun 13, 2024 at 04:41:34PM +0800, Baoquan He wrote:
> On 06/12/24 at 01:27pm, Uladzislau Rezki wrote:
> > On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > >
> > > > >
> > > > > Sorry to bother you again. Are there any other comments or new patch
> > > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > > one on its tree.
> > > > >
> > > > I have just returned from vacation. Give me some time to review your
> > > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > > i can trigger an issue that is in question.
> > > This bug arises from an system wide android test which has been
> > > reported by many vendors. Keep mount/unmount an erofs partition is
> > > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > > enough to be found by code review.
> > >
> > Baoquan, any objection about this v4?
> >
> > Your proposal about inserting a new vmap-block based on it belongs
> > to, i.e. not per-this-cpu, should fix an issue. The problem is that
> > such way does __not__ pre-load a current CPU what is not good.
>
> With my understand, when we start handling to insert vb to vbq->xa and
> vbq->free, the vmap_area allocation has been done, it doesn't impact the
> CPU preloading when adding it into which CPU's vbq->free, does it?
>
> Not sure if I miss anything about the CPU preloading.
>
As explained below in this email thread:
with the hash-based placement, vb_alloc() inserts the new block _not_ on
this CPU. This CPU then tries to allocate one more time and its free_list
is still empty (because in the previous step the block was inserted into
another CPU's block queue), so it allocates a new block once more, which
again most likely lands on yet another zone/CPU. And so on.
See:
<snip vb_alloc>
...
rcu_read_lock();
vbq = raw_cpu_ptr(&vmap_block_queue); <- Here it is correctly accessing this CPU
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
unsigned long pages_off;
...
<snip vb_alloc>
<snip new_vmap_block>
...
vbq = addr_to_vbq(va->va_start); <- Here we insert based on hashing, i.e. not to this CPU-block-queue
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
spin_unlock(&vbq->lock);
...
<snip new_vmap_block>
Thanks!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-13 11:28 ` Uladzislau Rezki
@ 2024-06-14 1:32 ` Baoquan He
0 siblings, 0 replies; 15+ messages in thread
From: Baoquan He @ 2024-06-14 1:32 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Zhaoyang Huang, zhaoyang.huang, Andrew Morton, Christoph Hellwig,
Lorenzo Stoakes, Thomas Gleixner, hailong liu, linux-mm,
linux-kernel, stable, steve.kang
On 06/13/24 at 01:28pm, Uladzislau Rezki wrote:
> On Thu, Jun 13, 2024 at 04:41:34PM +0800, Baoquan He wrote:
> > On 06/12/24 at 01:27pm, Uladzislau Rezki wrote:
> > > On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > > > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > > Sorry to bother you again. Are there any other comments or new patch
> > > > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > > > one on its tree.
> > > > > >
> > > > > I have just returned from vacation. Give me some time to review your
> > > > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > > > i can trigger an issue that is in question.
> > > > This bug arises from an system wide android test which has been
> > > > reported by many vendors. Keep mount/unmount an erofs partition is
> > > > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > > > enough to be found by code review.
> > > >
> > > Baoquan, any objection about this v4?
> > >
> > > Your proposal about inserting a new vmap-block based on it belongs
> > > to, i.e. not per-this-cpu, should fix an issue. The problem is that
> > > such way does __not__ pre-load a current CPU what is not good.
> >
> > With my understand, when we start handling to insert vb to vbq->xa and
> > vbq->free, the vmap_area allocation has been done, it doesn't impact the
> > CPU preloading when adding it into which CPU's vbq->free, does it?
> >
> > Not sure if I miss anything about the CPU preloading.
> >
> Like explained below in this email-thread:
>
> vb_alloc() inserts a new block _not_ on this CPU. This CPU tries to
> allocate one more time and its free_list is empty(because on a prev.
> step a block has been inserted into another CPU-block-queue), thus
> it allocates a new block one more time and which is inserted most
> likely on a next zone/CPU. And so on.
Thanks for the detailed explanation, got it now.
It's a pity we can't unify the xa and the list into one vbq structure
based on a single principle.
>
> See:
>
> <snip vb_alloc>
> ...
> rcu_read_lock();
> vbq = raw_cpu_ptr(&vmap_block_queue); <- Here it is correctly accessing this CPU
> list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> unsigned long pages_off;
> ...
> <snip vb_alloc>
>
> <snip new_vmap_block>
> ...
> vbq = addr_to_vbq(va->va_start); <- Here we insert based on hashing, i.e. not to this CPU-block-queue
> spin_lock(&vbq->lock);
> list_add_tail_rcu(&vb->free_list, &vbq->free);
> spin_unlock(&vbq->lock);
> ...
> <snip new_vmap_block>
>
> Thanks!
>
> --
> Uladzislau Rezki
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-12 11:27 ` Uladzislau Rezki
2024-06-13 8:41 ` Baoquan He
@ 2024-06-13 11:42 ` hailong liu
1 sibling, 0 replies; 15+ messages in thread
From: hailong liu @ 2024-06-13 11:42 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Zhaoyang Huang, Baoquan He, zhaoyang.huang, Andrew Morton,
Christoph Hellwig, Lorenzo Stoakes, Thomas Gleixner, linux-mm,
linux-kernel, stable, steve.kang
On Wed, 12. Jun 13:27, Uladzislau Rezki wrote:
> On Wed, Jun 12, 2024 at 10:00:14AM +0800, Zhaoyang Huang wrote:
> > On Wed, Jun 12, 2024 at 2:16 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> > >
> > > >
> > > > Sorry to bother you again. Are there any other comments or new patch
> > > > on this which block some test cases of ANDROID that only accept ACKed
> > > > one on its tree.
> > > >
> > > I have just returned from vacation. Give me some time to review your
> > > patch. Meanwhile, do you have a reproducer? So i would like to see how
> > > i can trigger an issue that is in question.
> > This bug arises from an system wide android test which has been
> > reported by many vendors. Keep mount/unmount an erofs partition is
> > supposed to be a simple reproducer. IMO, the logic defect is obvious
> > enough to be found by code review.
> >
> Baoquan, any objection about this v4?
>
> Your proposal about inserting a new vmap-block based on it belongs
> to, i.e. not per-this-cpu, should fix an issue. The problem is that
> such way does __not__ pre-load a current CPU what is not good.
>
> --
> Uladzislau Rezki
mediatek & oppo also have this issue based on kernel-6.6,
The following call stack more clearly illustrates the issue:
[] notify_die+0x4c/0x8c
[] die+0x90/0x308
[] bug_handler+0x44/0xec
[] brk_handler+0x90/0x110
[] do_debug_exception+0xa0/0x140
[] el1_dbg+0x54/0x70
[] el1h_64_sync_handler+0x38/0x90
[] el1h_64_sync+0x64/0x6c
[] __list_del_entry_valid_or_report+0xec/0xf0
[] purge_fragmented_block+0xf0/0x104
[] _vm_unmap_aliases+0x12c/0x29c
[] vm_unmap_aliases+0x18/0x28
[] change_memory_common+0x184/0x218
[] set_memory_ro+0x14/0x24
[] bpf_jit_binary_lock_ro+0x38/0x60
[] bpf_int_jit_compile+0x4b4/0x5a0
[] bpf_prog_select_runtime+0x1a8/0x24c
[] bpf_prepare_filter+0x4f0/0x518
[] bpf_prog_create_from_user+0xfc/0x148
[] do_seccomp+0x2a0/0x498
[] prctl_set_seccomp+0x34/0x4c
[] __arm64_sys_prctl+0x4b4/0xea0
[] invoke_syscall+0x54/0x114
[] el0_svc_common+0xa8/0xe0
[] do_el0_svc+0x18/0x28
[] el0_svc+0x34/0x68
[] el0t_64_sync_handler+0x64/0xbc
[] el0t_64_sync+0x1a4/0x1ac
I placed the assignment of vb->cpu before xa_insert(). In fact, putting
it afterwards wouldn't be a problem either; I just thought that this
way the contents of the vb remain stable. With this patch, the current
stability tests no longer report related issues. Hopefully
this helps you.
--
help you, help me,
Hailong.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-07 2:31 [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block zhaoyang.huang
2024-06-07 8:30 ` Zhaoyang Huang
@ 2024-06-13 17:33 ` Uladzislau Rezki
2024-06-13 20:18 ` Andrew Morton
1 sibling, 1 reply; 15+ messages in thread
From: Uladzislau Rezki @ 2024-06-13 17:33 UTC (permalink / raw)
To: zhaoyang.huang
Cc: Andrew Morton, Uladzislau Rezki, Christoph Hellwig,
Lorenzo Stoakes, Baoquan He, Thomas Gleixner, hailong liu,
linux-mm, linux-kernel, stable, Zhaoyang Huang, steve.kang
On Fri, Jun 07, 2024 at 10:31:16AM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> vmalloc area runs out in our ARM64 system during an erofs test as
> vm_map_ram failed[1]. By following the debug log, we find that
> vm_map_ram()->vb_alloc() will allocate new vb->va which corresponding
> to 4MB vmalloc area as list_for_each_entry_rcu returns immediately
> when vbq->free->next points to vbq->free. That is to say, 65536 times
> of page fault after the list's broken will run out of the whole
> vmalloc area. This should be introduced by one vbq->free->next point to
> vbq->free which makes list_for_each_entry_rcu can not iterate the list
> and find the BUG.
>
> [1]
> PID: 1 TASK: ffffff80802b4e00 CPU: 6 COMMAND: "init"
> #0 [ffffffc08006afe0] __switch_to at ffffffc08111d5cc
> #1 [ffffffc08006b040] __schedule at ffffffc08111dde0
> #2 [ffffffc08006b0a0] schedule at ffffffc08111e294
> #3 [ffffffc08006b0d0] schedule_preempt_disabled at ffffffc08111e3f0
> #4 [ffffffc08006b140] __mutex_lock at ffffffc08112068c
> #5 [ffffffc08006b180] __mutex_lock_slowpath at ffffffc08111f8f8
> #6 [ffffffc08006b1a0] mutex_lock at ffffffc08111f834
> #7 [ffffffc08006b1d0] reclaim_and_purge_vmap_areas at ffffffc0803ebc3c
> #8 [ffffffc08006b290] alloc_vmap_area at ffffffc0803e83fc
> #9 [ffffffc08006b300] vm_map_ram at ffffffc0803e78c0
>
> Fixes: fc1e0d980037 ("mm/vmalloc: prevent stale TLBs in fully utilized blocks")
>
> For detailed reason of broken list, please refer to below URL
> https://lore.kernel.org/all/20240531024820.5507-1-hailong.liu@oppo.com/
>
> Suggested-by: Hailong.Liu <hailong.liu@oppo.com>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
> v2: introduce cpu in vmap_block to record the right CPU number
> v3: use get_cpu/put_cpu to prevent schedule between core
> v4: replace get_cpu/put_cpu by another API to avoid disabling preemption
> ---
> ---
> mm/vmalloc.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 22aa63f4ef63..89eb034f4ac6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2458,6 +2458,7 @@ struct vmap_block {
> struct list_head free_list;
> struct rcu_head rcu_head;
> struct list_head purge;
> + unsigned int cpu;
> };
>
> /* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
> @@ -2585,8 +2586,15 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> free_vmap_area(va);
> return ERR_PTR(err);
> }
> -
> - vbq = raw_cpu_ptr(&vmap_block_queue);
> + /*
> + * list_add_tail_rcu could happened in another core
> + * rather than vb->cpu due to task migration, which
> + * is safe as list_add_tail_rcu will ensure the list's
> + * integrity together with list_for_each_rcu from read
> + * side.
> + */
> + vb->cpu = raw_smp_processor_id();
> + vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
> spin_lock(&vbq->lock);
> list_add_tail_rcu(&vb->free_list, &vbq->free);
> spin_unlock(&vbq->lock);
> @@ -2614,9 +2622,10 @@ static void free_vmap_block(struct vmap_block *vb)
> }
>
> static bool purge_fragmented_block(struct vmap_block *vb,
> - struct vmap_block_queue *vbq, struct list_head *purge_list,
> - bool force_purge)
> + struct list_head *purge_list, bool force_purge)
> {
> + struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, vb->cpu);
> +
> if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
> vb->dirty == VMAP_BBMAP_BITS)
> return false;
> @@ -2664,7 +2673,7 @@ static void purge_fragmented_blocks(int cpu)
> continue;
>
> spin_lock(&vb->lock);
> - purge_fragmented_block(vb, vbq, &purge, true);
> + purge_fragmented_block(vb, &purge, true);
> spin_unlock(&vb->lock);
> }
> rcu_read_unlock();
> @@ -2801,7 +2810,7 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
> * not purgeable, check whether there is dirty
> * space to be flushed.
> */
> - if (!purge_fragmented_block(vb, vbq, &purge_list, false) &&
> + if (!purge_fragmented_block(vb, &purge_list, false) &&
> vb->dirty_max && vb->dirty != VMAP_BBMAP_BITS) {
> unsigned long va_start = vb->va->va_start;
> unsigned long s, e;
> --
> 2.25.1
>
Yes, there is a mess with holding the wrong vbq->lock and the vmap_blocks
xarray.
The patch looks good to me. One small nit from my side is the commit
message. To me it is vague and should be improved.
Could you please use Hailong Liu's <hailong.liu@oppo.com> explanation
of the issue and resend the patch?
See it here: https://lkml.org/lkml/2024/6/2/2
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block
2024-06-13 17:33 ` Uladzislau Rezki
@ 2024-06-13 20:18 ` Andrew Morton
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2024-06-13 20:18 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: zhaoyang.huang, Christoph Hellwig, Lorenzo Stoakes, Baoquan He,
Thomas Gleixner, hailong liu, linux-mm, linux-kernel, stable,
Zhaoyang Huang, steve.kang
On Thu, 13 Jun 2024 19:33:16 +0200 Uladzislau Rezki <urezki@gmail.com> wrote:
> > 2.25.1
> >
> Yes there is a mess with holding wrong vbq->lock and vmap_blocks xarray.
>
> The patch looks good to me. One small nit from my side is a commit
> message. To me it is vague and it should be improved.
>
> Could you please use Hailong.Liu <hailong.liu@oppo.com> explanation
> of the issue and resend the patch?
>
> See it here: https://lkml.org/lkml/2024/6/2/2
>
> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Thanks, I'll queue this (with a cc:stable) for testing and shall await
the updated changelog.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-06-14 1:46 UTC | newest]
Thread overview: 15+ messages
-- links below jump to the message on this page --
2024-06-07 2:31 [Resend PATCHv4 1/1] mm: fix incorrect vbq reference in purge_fragmented_block zhaoyang.huang
2024-06-07 8:30 ` Zhaoyang Huang
2024-06-11 1:08 ` Zhaoyang Huang
2024-06-11 18:16 ` Uladzislau Rezki
2024-06-12 2:00 ` Zhaoyang Huang
2024-06-12 11:27 ` Uladzislau Rezki
2024-06-13 8:41 ` Baoquan He
2024-06-13 9:11 ` hailong liu
2024-06-13 9:23 ` Zhaoyang Huang
2024-06-14 1:44 ` Baoquan He
2024-06-13 11:28 ` Uladzislau Rezki
2024-06-14 1:32 ` Baoquan He
2024-06-13 11:42 ` hailong liu
2024-06-13 17:33 ` Uladzislau Rezki
2024-06-13 20:18 ` Andrew Morton