* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 3:30 ` Baoquan He
@ 2024-06-25 10:32 ` Uladzislau Rezki
2024-06-25 11:40 ` Baoquan He
2024-06-25 11:19 ` Hailong Liu
2024-06-25 12:41 ` Uladzislau Rezki
2 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-25 10:32 UTC (permalink / raw)
To: Baoquan He
Cc: Uladzislau Rezki, Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > ......
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > >
> > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > }
> > > > >
> > > > The problem i see is about not-initializing of the:
> > > > <snip>
> > > > for_each_possible_cpu(i) {
> > > > struct vmap_block_queue *vbq;
> > > > struct vfree_deferred *p;
> > > >
> > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > spin_lock_init(&vbq->lock);
> > > > INIT_LIST_HEAD(&vbq->free);
> > > > p = &per_cpu(vfree_deferred, i);
> > > > init_llist_head(&p->list);
> > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > xa_init(&vbq->vmap_blocks);
> > > > }
> > > > <snip>
> > > >
> > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > can be non sequential :-/
> > > >
> > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > >
> > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > are different with my understanding.
> > >
> > > kernel/smp.c
> > > void __init setup_nr_cpu_ids(void)
> > > {
> > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > }
> > >
> > I see that it is not a weak function, so it is generic, thus the
> > behavior can not be overwritten, which is great. This does what we
> > need.
> >
> > Thank you for checking this you are right!
>
> Thanks for confirming this.
>
> >
> > Then it is just a matter of proper initialization of the hash:
> >
> > <snip>
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 5d3aa2dc88a8..1733946f7a12 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > */
> > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> >
> > - for_each_possible_cpu(i) {
> > + /*
> > + * We use "nr_cpu_ids" here because some architectures
> > + * may have "gaps" in cpu-possible-mask. It is OK for
> > + * per-cpu approaches but is not OK for cases where it
> > + * can be used as hashes also.
> > + */
> > + for (i = 0; i < nr_cpu_ids; i++) {
>
> I was wrong about earlier comments. Percpu variables are only available
> on possible CPUs. For those nonexistent possible CPUs of static percpu
> variable vmap_block_queue, there isn't memory allocated and mapped for
> them. So accessing into them will cause problem.
>
> In Nick's case, there are only CPU0, CPU2. If you access
> &per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
> change to take other way for vbq. E.g:
> 1) Storing the vb in the nearest neighbouring vbq on possible CPU as
> below draft patch;
> 2) create an normal array to store vbq of size nr_cpu_ids, then we can
> store/fetch each vbq on non-possible CPU?
>
A correct way, I think, is to create a normal array. A quick fix can be
to stick to the next possible CPU.
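To make the "normal array" idea a bit more concrete, here is a rough, purely hypothetical
sketch (the names and the init helper are made up for illustration, this is not an actual
posted patch). Every slot is pre-resolved to a possible CPU, so per-cpu areas of
non-possible CPUs are never touched:

<snip>
/*
 * Hypothetical sketch only: a plain array with nr_cpu_ids slots, each
 * pointing at the vmap_blocks xarray of a possible CPU, so a gap in
 * cpu_possible_mask never leads to a per-cpu area that was not allocated.
 */
static struct xarray *vb_xa_hash[NR_CPUS];

static void __init init_vb_xa_hash(void)
{
	int i, cpu = cpumask_first(cpu_possible_mask);

	for (i = 0; i < nr_cpu_ids; i++) {
		/* Reuse the nearest possible CPU for the missing slots. */
		if (cpu_possible(i))
			cpu = i;

		vb_xa_hash[i] = &per_cpu(vmap_block_queue, cpu).vmap_blocks;
	}
}

static struct xarray *
addr_to_vb_xa(unsigned long addr)
{
	return vb_xa_hash[(addr / VMAP_BLOCK_SIZE) % nr_cpu_ids];
}
<snip>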
> The way 1) is simpler, the existing code can be adapted a little just as
> below.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 633363997dec..59a8951cc6c0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> +
> + if (!cpu_possible(index))
> + index = cpumask_next(index, cpu_possible_mask);
>
cpumask_next() can return nr_cpu_ids if no next bits set.
Thanks!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 10:32 ` Uladzislau Rezki
@ 2024-06-25 11:40 ` Baoquan He
2024-06-25 12:40 ` Uladzislau Rezki
0 siblings, 1 reply; 39+ messages in thread
From: Baoquan He @ 2024-06-25 11:40 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On 06/25/24 at 12:32pm, Uladzislau Rezki wrote:
> On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> > On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > > ......
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > static struct xarray *
> > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > {
> > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > >
> > > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > }
> > > > > >
> > > > > The problem i see is about not-initializing of the:
> > > > > <snip>
> > > > > for_each_possible_cpu(i) {
> > > > > struct vmap_block_queue *vbq;
> > > > > struct vfree_deferred *p;
> > > > >
> > > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > > spin_lock_init(&vbq->lock);
> > > > > INIT_LIST_HEAD(&vbq->free);
> > > > > p = &per_cpu(vfree_deferred, i);
> > > > > init_llist_head(&p->list);
> > > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > > xa_init(&vbq->vmap_blocks);
> > > > > }
> > > > > <snip>
> > > > >
> > > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > > can be non sequential :-/
> > > > >
> > > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > > >
> > > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > > are different with my understanding.
> > > >
> > > > kernel/smp.c
> > > > void __init setup_nr_cpu_ids(void)
> > > > {
> > > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > > }
> > > >
> > > I see that it is not a weak function, so it is generic, thus the
> > > behavior can not be overwritten, which is great. This does what we
> > > need.
> > >
> > > Thank you for checking this you are right!
> >
> > Thanks for confirming this.
> >
> > >
> > > Then it is just a matter of proper initialization of the hash:
> > >
> > > <snip>
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 5d3aa2dc88a8..1733946f7a12 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > > */
> > > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> > >
> > > - for_each_possible_cpu(i) {
> > > + /*
> > > + * We use "nr_cpu_ids" here because some architectures
> > > + * may have "gaps" in cpu-possible-mask. It is OK for
> > > + * per-cpu approaches but is not OK for cases where it
> > > + * can be used as hashes also.
> > > + */
> > > + for (i = 0; i < nr_cpu_ids; i++) {
> >
> > I was wrong about earlier comments. Percpu variables are only available
> > on possible CPUs. For those nonexistent possible CPUs of static percpu
> > variable vmap_block_queue, there isn't memory allocated and mapped for
> > them. So accessing into them will cause problem.
> >
> > In Nick's case, there are only CPU0, CPU2. If you access
> > &per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
> > change to take other way for vbq. E.g:
> > 1) Storing the vb in the nearest neighbouring vbq on possible CPU as
> > below draft patch;
> > 2) create an normal array to store vbq of size nr_cpu_ids, then we can
> > store/fetch each vbq on non-possible CPU?
> >
> A correct way, i think, is to create a normal array. A quick fix can be
> to stick to a next possible CPU.
>
> > The way 1) is simpler, the existing code can be adapted a little just as
> > below.
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 633363997dec..59a8951cc6c0 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > static struct xarray *
> > addr_to_vb_xa(unsigned long addr)
> > {
> > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > +
> > + if (!cpu_possible(index))
> > + index = cpumask_next(index, cpu_possible_mask);
> >
> cpumask_next() can return nr_cpu_ids if no next bits set.
It won't. nr_cpu_ids is the largest possible CPU index + 1, so the hashed
index is in the range 0 <= index <= (nr_cpu_ids - 1). E.g. if cpu_possible_mask is
b10001111, nr_cpu_ids is 8 and the largest set bit is cpu7.
cpu_possible(index) checks that first, so the largest bit cpumask_next()
can return here is (nr_cpu_ids - 1).
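If it helps, here is a tiny userspace model of that argument; the mask value and the
two helpers are stand-ins for the kernel API, chosen only to mirror the b10001111
example above:

<snip>
#include <stdio.h>

/* Stand-ins that mirror the example above, not the kernel API. */
static const unsigned long possible_mask = 0x8f;	/* b10001111 */
static const int nr_cpu_ids = 8;			/* find_last_bit() + 1 */

static int cpu_possible(int cpu)
{
	return (possible_mask >> cpu) & 1;
}

int main(void)
{
	for (unsigned long block = 0; block < 16; block++) {
		int index = block % nr_cpu_ids;	/* always in [0, nr_cpu_ids - 1] */

		/*
		 * Bit (nr_cpu_ids - 1) is set by construction, so a
		 * non-possible index always has a higher possible bit
		 * for cpumask_next() to land on.
		 */
		printf("block %2lu -> index %d (%s)\n", block, index,
		       cpu_possible(index) ? "possible" : "handled by cpumask_next()");
	}

	return 0;
}
<snip>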
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 11:40 ` Baoquan He
@ 2024-06-25 12:40 ` Uladzislau Rezki
2024-06-25 13:02 ` Baoquan He
0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-25 12:40 UTC (permalink / raw)
To: Baoquan He
Cc: Uladzislau Rezki, Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Tue, Jun 25, 2024 at 07:40:21PM +0800, Baoquan He wrote:
> On 06/25/24 at 12:32pm, Uladzislau Rezki wrote:
> > On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> > > On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > > > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > > > ......
> > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > > > --- a/mm/vmalloc.c
> > > > > > > +++ b/mm/vmalloc.c
> > > > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > > static struct xarray *
> > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > {
> > > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > > >
> > > > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > }
> > > > > > >
> > > > > > The problem i see is about not-initializing of the:
> > > > > > <snip>
> > > > > > for_each_possible_cpu(i) {
> > > > > > struct vmap_block_queue *vbq;
> > > > > > struct vfree_deferred *p;
> > > > > >
> > > > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > > > spin_lock_init(&vbq->lock);
> > > > > > INIT_LIST_HEAD(&vbq->free);
> > > > > > p = &per_cpu(vfree_deferred, i);
> > > > > > init_llist_head(&p->list);
> > > > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > > > xa_init(&vbq->vmap_blocks);
> > > > > > }
> > > > > > <snip>
> > > > > >
> > > > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > > > can be non sequential :-/
> > > > > >
> > > > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > > > >
> > > > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > > > are different with my understanding.
> > > > >
> > > > > kernel/smp.c
> > > > > void __init setup_nr_cpu_ids(void)
> > > > > {
> > > > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > > > }
> > > > >
> > > > I see that it is not a weak function, so it is generic, thus the
> > > > behavior can not be overwritten, which is great. This does what we
> > > > need.
> > > >
> > > > Thank you for checking this you are right!
> > >
> > > Thanks for confirming this.
> > >
> > > >
> > > > Then it is just a matter of proper initialization of the hash:
> > > >
> > > > <snip>
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 5d3aa2dc88a8..1733946f7a12 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > > > */
> > > > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> > > >
> > > > - for_each_possible_cpu(i) {
> > > > + /*
> > > > + * We use "nr_cpu_ids" here because some architectures
> > > > + * may have "gaps" in cpu-possible-mask. It is OK for
> > > > + * per-cpu approaches but is not OK for cases where it
> > > > + * can be used as hashes also.
> > > > + */
> > > > + for (i = 0; i < nr_cpu_ids; i++) {
> > >
> > > I was wrong about earlier comments. Percpu variables are only available
> > > on possible CPUs. For those nonexistent possible CPUs of static percpu
> > > variable vmap_block_queue, there isn't memory allocated and mapped for
> > > them. So accessing into them will cause problem.
> > >
> > > In Nick's case, there are only CPU0, CPU2. If you access
> > > &per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
> > > change to take other way for vbq. E.g:
> > > 1) Storing the vb in the nearest neighbouring vbq on possible CPU as
> > > below draft patch;
> > > 2) create an normal array to store vbq of size nr_cpu_ids, then we can
> > > store/fetch each vbq on non-possible CPU?
> > >
> > A correct way, i think, is to create a normal array. A quick fix can be
> > to stick to a next possible CPU.
> >
> > > The way 1) is simpler, the existing code can be adapted a little just as
> > > below.
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 633363997dec..59a8951cc6c0 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > static struct xarray *
> > > addr_to_vb_xa(unsigned long addr)
> > > {
> > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > +
> > > + if (!cpu_possible(index))
> > > + index = cpumask_next(index, cpu_possible_mask);
> > >
> > cpumask_next() can return nr_cpu_ids if no next bits set.
>
> It won't. nr_cpu_ids is the largest index + 1, the hashed index will
> be: 0 =< index <= (nr_cpu_ids - 1) e.g cpu_possible_mask is
> b10001111, the nr_cpu_ids is 8, the largest bit is cpu7.
> cpu_possible(index) will check that. So the largest bit of cpumask_next()
> returns is (nr_cpu_ids - 1).
>
/**
 * cpumask_next - get the next cpu in a cpumask
 * @n: the cpu prior to the place to search (i.e. return will be > @n)
 * @srcp: the cpumask pointer
 *
 * Return: >= nr_cpu_ids if no further cpus set.
 */
static inline
unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
	/* -1 is a legal arg here. */
	if (n != -1)
		cpumask_check(n);
	return find_next_bit(cpumask_bits(srcp), small_cpumask_bits, n + 1);
}
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 12:40 ` Uladzislau Rezki
@ 2024-06-25 13:02 ` Baoquan He
2024-06-25 15:33 ` Uladzislau Rezki
0 siblings, 1 reply; 39+ messages in thread
From: Baoquan He @ 2024-06-25 13:02 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On 06/25/24 at 02:40pm, Uladzislau Rezki wrote:
> On Tue, Jun 25, 2024 at 07:40:21PM +0800, Baoquan He wrote:
> > On 06/25/24 at 12:32pm, Uladzislau Rezki wrote:
> > > On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> > > > On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > > > > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > > > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > > > > ......
> > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > > > > --- a/mm/vmalloc.c
> > > > > > > > +++ b/mm/vmalloc.c
> > > > > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > > > static struct xarray *
> > > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > > {
> > > > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > > > >
> > > > > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > > }
> > > > > > > >
> > > > > > > The problem i see is about not-initializing of the:
> > > > > > > <snip>
> > > > > > > for_each_possible_cpu(i) {
> > > > > > > struct vmap_block_queue *vbq;
> > > > > > > struct vfree_deferred *p;
> > > > > > >
> > > > > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > > > > spin_lock_init(&vbq->lock);
> > > > > > > INIT_LIST_HEAD(&vbq->free);
> > > > > > > p = &per_cpu(vfree_deferred, i);
> > > > > > > init_llist_head(&p->list);
> > > > > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > > > > xa_init(&vbq->vmap_blocks);
> > > > > > > }
> > > > > > > <snip>
> > > > > > >
> > > > > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > > > > can be non sequential :-/
> > > > > > >
> > > > > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > > > > >
> > > > > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > > > > are different with my understanding.
> > > > > >
> > > > > > kernel/smp.c
> > > > > > void __init setup_nr_cpu_ids(void)
> > > > > > {
> > > > > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > > > > }
> > > > > >
> > > > > I see that it is not a weak function, so it is generic, thus the
> > > > > behavior can not be overwritten, which is great. This does what we
> > > > > need.
> > > > >
> > > > > Thank you for checking this you are right!
> > > >
> > > > Thanks for confirming this.
> > > >
> > > > >
> > > > > Then it is just a matter of proper initialization of the hash:
> > > > >
> > > > > <snip>
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 5d3aa2dc88a8..1733946f7a12 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > > > > */
> > > > > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> > > > >
> > > > > - for_each_possible_cpu(i) {
> > > > > + /*
> > > > > + * We use "nr_cpu_ids" here because some architectures
> > > > > + * may have "gaps" in cpu-possible-mask. It is OK for
> > > > > + * per-cpu approaches but is not OK for cases where it
> > > > > + * can be used as hashes also.
> > > > > + */
> > > > > + for (i = 0; i < nr_cpu_ids; i++) {
> > > >
> > > > I was wrong about earlier comments. Percpu variables are only available
> > > > on possible CPUs. For those nonexistent possible CPUs of static percpu
> > > > variable vmap_block_queue, there isn't memory allocated and mapped for
> > > > them. So accessing into them will cause problem.
> > > >
> > > > In Nick's case, there are only CPU0, CPU2. If you access
> > > > &per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
> > > > change to take other way for vbq. E.g:
> > > > 1) Storing the vb in the nearest neighbouring vbq on possible CPU as
> > > > below draft patch;
> > > > 2) create an normal array to store vbq of size nr_cpu_ids, then we can
> > > > store/fetch each vbq on non-possible CPU?
> > > >
> > > A correct way, i think, is to create a normal array. A quick fix can be
> > > to stick to a next possible CPU.
> > >
> > > > The way 1) is simpler, the existing code can be adapted a little just as
> > > > below.
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 633363997dec..59a8951cc6c0 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > static struct xarray *
> > > > addr_to_vb_xa(unsigned long addr)
> > > > {
> > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > +
> > > > + if (!cpu_possible(index))
> > > > + index = cpumask_next(index, cpu_possible_mask);
> > > >
> > > cpumask_next() can return nr_cpu_ids if no next bits set.
> >
> > It won't. nr_cpu_ids is the largest index + 1, the hashed index will
> > be: 0 =< index <= (nr_cpu_ids - 1) e.g cpu_possible_mask is
> > b10001111, the nr_cpu_ids is 8, the largest bit is cpu7.
> > cpu_possible(index) will check that. So the largest bit of cpumask_next()
> > returns is (nr_cpu_ids - 1).
> >
> /**
> * cpumask_next - get the next cpu in a cpumask
> * @n: the cpu prior to the place to search (i.e. return will be > @n)
> * @srcp: the cpumask pointer
> *
> * Return: >= nr_cpu_ids if no further cpus set.
Ah, I got what you mean. In the vbq case, it may not have a chance to
return nr_cpu_ids. Because the hashing limits the index to the range
[0, nr_cpu_ids-1], and the cpu_possible(index) check guarantees that
cpumask_next() is never called with the highest cpu number [nr_cpu_ids-1],
since CPU[nr_cpu_ids-1] must be a possible CPU.
Do I miss some corner cases?
> */
> static inline
> unsigned int cpumask_next(int n, const struct cpumask *srcp)
> {
> /* -1 is a legal arg here. */
> if (n != -1)
> cpumask_check(n);
> return find_next_bit(cpumask_bits(srcp), small_cpumask_bits, n + 1);
> }
>
> --
> Uladzislau Rezki
>
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 13:02 ` Baoquan He
@ 2024-06-25 15:33 ` Uladzislau Rezki
2024-06-25 15:49 ` Baoquan He
0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-25 15:33 UTC (permalink / raw)
To: Baoquan He, Hailong Liu
Cc: Uladzislau Rezki, Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Tue, Jun 25, 2024 at 09:02:43PM +0800, Baoquan He wrote:
> On 06/25/24 at 02:40pm, Uladzislau Rezki wrote:
> > On Tue, Jun 25, 2024 at 07:40:21PM +0800, Baoquan He wrote:
> > > On 06/25/24 at 12:32pm, Uladzislau Rezki wrote:
> > > > On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> > > > > On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > > > > > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > > > > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > > > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > > > > > ......
> > > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > > > > > --- a/mm/vmalloc.c
> > > > > > > > > +++ b/mm/vmalloc.c
> > > > > > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > > > > static struct xarray *
> > > > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > > > {
> > > > > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > > > > >
> > > > > > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > The problem i see is about not-initializing of the:
> > > > > > > > <snip>
> > > > > > > > for_each_possible_cpu(i) {
> > > > > > > > struct vmap_block_queue *vbq;
> > > > > > > > struct vfree_deferred *p;
> > > > > > > >
> > > > > > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > > > > > spin_lock_init(&vbq->lock);
> > > > > > > > INIT_LIST_HEAD(&vbq->free);
> > > > > > > > p = &per_cpu(vfree_deferred, i);
> > > > > > > > init_llist_head(&p->list);
> > > > > > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > > > > > xa_init(&vbq->vmap_blocks);
> > > > > > > > }
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > > > > > can be non sequential :-/
> > > > > > > >
> > > > > > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > > > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > > > > > >
> > > > > > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > > > > > are different with my understanding.
> > > > > > >
> > > > > > > kernel/smp.c
> > > > > > > void __init setup_nr_cpu_ids(void)
> > > > > > > {
> > > > > > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > > > > > }
> > > > > > >
> > > > > > I see that it is not a weak function, so it is generic, thus the
> > > > > > behavior can not be overwritten, which is great. This does what we
> > > > > > need.
> > > > > >
> > > > > > Thank you for checking this you are right!
> > > > >
> > > > > Thanks for confirming this.
> > > > >
> > > > > >
> > > > > > Then it is just a matter of proper initialization of the hash:
> > > > > >
> > > > > > <snip>
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 5d3aa2dc88a8..1733946f7a12 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > > > > > */
> > > > > > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> > > > > >
> > > > > > - for_each_possible_cpu(i) {
> > > > > > + /*
> > > > > > + * We use "nr_cpu_ids" here because some architectures
> > > > > > + * may have "gaps" in cpu-possible-mask. It is OK for
> > > > > > + * per-cpu approaches but is not OK for cases where it
> > > > > > + * can be used as hashes also.
> > > > > > + */
> > > > > > + for (i = 0; i < nr_cpu_ids; i++) {
> > > > >
> > > > > I was wrong about earlier comments. Percpu variables are only available
> > > > > on possible CPUs. For those nonexistent possible CPUs of static percpu
> > > > > variable vmap_block_queue, there isn't memory allocated and mapped for
> > > > > them. So accessing into them will cause problem.
> > > > >
> > > > > In Nick's case, there are only CPU0, CPU2. If you access
> > > > > &per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
> > > > > change to take other way for vbq. E.g:
> > > > > 1) Storing the vb in the nearest neighbouring vbq on possible CPU as
> > > > > below draft patch;
> > > > > 2) create an normal array to store vbq of size nr_cpu_ids, then we can
> > > > > store/fetch each vbq on non-possible CPU?
> > > > >
> > > > A correct way, i think, is to create a normal array. A quick fix can be
> > > > to stick to a next possible CPU.
> > > >
> > > > > The way 1) is simpler, the existing code can be adapted a little just as
> > > > > below.
> > > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 633363997dec..59a8951cc6c0 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > +
> > > > > + if (!cpu_possible(index))
> > > > > + index = cpumask_next(index, cpu_possible_mask);
> > > > >
> > > > cpumask_next() can return nr_cpu_ids if no next bits set.
> > >
> > > It won't. nr_cpu_ids is the largest index + 1, the hashed index will
> > > be: 0 =< index <= (nr_cpu_ids - 1) e.g cpu_possible_mask is
> > > b10001111, the nr_cpu_ids is 8, the largest bit is cpu7.
> > > cpu_possible(index) will check that. So the largest bit of cpumask_next()
> > > returns is (nr_cpu_ids - 1).
> > >
> > /**
> > * cpumask_next - get the next cpu in a cpumask
> > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > * @srcp: the cpumask pointer
> > *
> > * Return: >= nr_cpu_ids if no further cpus set.
>
> Ah, I got what you mean. In the vbq case, it may not have chance to get
> a return number as nr_cpu_ids. Becuase the hashed index limits the
> range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> be possible CPU.
>
> Do I miss some corner cases?
>
Right. We guarantee that the highest CPU is available by doing: % nr_cpu_ids.
So we do not need to use a *next_wrap() variant. You do not miss anything :)
Hailong Liu has proposed a simpler version:
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 11fe5ea208aa..e1e63ffb9c57 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1994,8 +1994,9 @@ static struct xarray *
 addr_to_vb_xa(unsigned long addr)
 {
 	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+	int cpu = cpumask_nth(index, cpu_possible_mask);
 
-	return &per_cpu(vmap_block_queue, index).vmap_blocks;
+	return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
<snip>
which just takes the next CPU if an index is not set in the cpu_possible_mask.
The only thing that can be updated in the patch is to replace num_possible_cpus()
by nr_cpu_ids.
Any thoughts? I think we need to fix it with a minor change so it is
easier to back-port to stable kernels.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 15:33 ` Uladzislau Rezki
@ 2024-06-25 15:49 ` Baoquan He
2024-06-25 16:49 ` Uladzislau Rezki
0 siblings, 1 reply; 39+ messages in thread
From: Baoquan He @ 2024-06-25 15:49 UTC (permalink / raw)
To: Uladzislau Rezki, Hailong Liu
Cc: Nick Bowler, linux-kernel, Linux regressions mailing list,
linux-mm, sparclinux, Andrew Morton
On 06/25/24 at 05:33pm, Uladzislau Rezki wrote:
> On Tue, Jun 25, 2024 at 09:02:43PM +0800, Baoquan He wrote:
> > On 06/25/24 at 02:40pm, Uladzislau Rezki wrote:
> > > On Tue, Jun 25, 2024 at 07:40:21PM +0800, Baoquan He wrote:
> > > > On 06/25/24 at 12:32pm, Uladzislau Rezki wrote:
> > > > > On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> > > > > > On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > > > > > > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > > > > > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > > > > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > > > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > > > > > > ......
> > > > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > > > > > > --- a/mm/vmalloc.c
> > > > > > > > > > +++ b/mm/vmalloc.c
> > > > > > > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > > > > > static struct xarray *
> > > > > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > > > > {
> > > > > > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > > > > > >
> > > > > > > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > The problem i see is about not-initializing of the:
> > > > > > > > > <snip>
> > > > > > > > > for_each_possible_cpu(i) {
> > > > > > > > > struct vmap_block_queue *vbq;
> > > > > > > > > struct vfree_deferred *p;
> > > > > > > > >
> > > > > > > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > > > > > > spin_lock_init(&vbq->lock);
> > > > > > > > > INIT_LIST_HEAD(&vbq->free);
> > > > > > > > > p = &per_cpu(vfree_deferred, i);
> > > > > > > > > init_llist_head(&p->list);
> > > > > > > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > > > > > > xa_init(&vbq->vmap_blocks);
> > > > > > > > > }
> > > > > > > > > <snip>
> > > > > > > > >
> > > > > > > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > > > > > > can be non sequential :-/
> > > > > > > > >
> > > > > > > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > > > > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > > > > > > >
> > > > > > > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > > > > > > are different with my understanding.
> > > > > > > >
> > > > > > > > kernel/smp.c
> > > > > > > > void __init setup_nr_cpu_ids(void)
> > > > > > > > {
> > > > > > > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > > > > > > }
> > > > > > > >
> > > > > > > I see that it is not a weak function, so it is generic, thus the
> > > > > > > behavior can not be overwritten, which is great. This does what we
> > > > > > > need.
> > > > > > >
> > > > > > > Thank you for checking this you are right!
> > > > > >
> > > > > > Thanks for confirming this.
> > > > > >
> > > > > > >
> > > > > > > Then it is just a matter of proper initialization of the hash:
> > > > > > >
> > > > > > > <snip>
> > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > index 5d3aa2dc88a8..1733946f7a12 100644
> > > > > > > --- a/mm/vmalloc.c
> > > > > > > +++ b/mm/vmalloc.c
> > > > > > > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > > > > > > */
> > > > > > > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> > > > > > >
> > > > > > > - for_each_possible_cpu(i) {
> > > > > > > + /*
> > > > > > > + * We use "nr_cpu_ids" here because some architectures
> > > > > > > + * may have "gaps" in cpu-possible-mask. It is OK for
> > > > > > > + * per-cpu approaches but is not OK for cases where it
> > > > > > > + * can be used as hashes also.
> > > > > > > + */
> > > > > > > + for (i = 0; i < nr_cpu_ids; i++) {
> > > > > >
> > > > > > I was wrong about earlier comments. Percpu variables are only available
> > > > > > on possible CPUs. For those nonexistent possible CPUs of static percpu
> > > > > > variable vmap_block_queue, there isn't memory allocated and mapped for
> > > > > > them. So accessing into them will cause problem.
> > > > > >
> > > > > > In Nick's case, there are only CPU0, CPU2. If you access
> > > > > > &per_cpu(vmap_block_queue, 1), problem occurs. So I think we may need to
> > > > > > change to take other way for vbq. E.g:
> > > > > > 1) Storing the vb in the nearest neighbouring vbq on possible CPU as
> > > > > > below draft patch;
> > > > > > 2) create an normal array to store vbq of size nr_cpu_ids, then we can
> > > > > > store/fetch each vbq on non-possible CPU?
> > > > > >
> > > > > A correct way, i think, is to create a normal array. A quick fix can be
> > > > > to stick to a next possible CPU.
> > > > >
> > > > > > The way 1) is simpler, the existing code can be adapted a little just as
> > > > > > below.
> > > > > >
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 633363997dec..59a8951cc6c0 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > static struct xarray *
> > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > {
> > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > > +
> > > > > > + if (!cpu_possible(index))
> > > > > > + index = cpumask_next(index, cpu_possible_mask);
> > > > > >
> > > > > cpumask_next() can return nr_cpu_ids if no next bits set.
> > > >
> > > > It won't. nr_cpu_ids is the largest index + 1, the hashed index will
> > > > be: 0 =< index <= (nr_cpu_ids - 1) e.g cpu_possible_mask is
> > > > b10001111, the nr_cpu_ids is 8, the largest bit is cpu7.
> > > > cpu_possible(index) will check that. So the largest bit of cpumask_next()
> > > > returns is (nr_cpu_ids - 1).
> > > >
> > > /**
> > > * cpumask_next - get the next cpu in a cpumask
> > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > * @srcp: the cpumask pointer
> > > *
> > > * Return: >= nr_cpu_ids if no further cpus set.
> >
> > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > be possible CPU.
> >
> > Do I miss some corner cases?
> >
> Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> So we do not need to use *next_wrap() variant. You do not miss anything :)
>
> Hailong Liu has proposed more simpler version:
>
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 11fe5ea208aa..e1e63ffb9c57 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1994,8 +1994,9 @@ static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int cpu = cpumask_nth(index, cpu_possible_mask);
>
> - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> <snip>
>
> which just takes a next CPU if an index is not set in the cpu_possible_mask.
>
> The only thing that can be updated in the patch is to replace num_possible_cpu()
> by the nr_cpu_ids.
>
> Any thoughts? I think we need to fix it by a minor change so it is
> easier to back-port on stable kernels.
Yeah, sounds good since the regression commit was merged in v6.3.
Please feel free to post this and the hash array patch separately for
formal review.
By the way, while replying to this mail I checked cpumask_nth()
again. I suspect it may cost more than cpu_possible(), given that most
systems don't have gaps in cpu_possible_mask. I could be dizzy at
this moment.
static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
{
return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
}
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 15:49 ` Baoquan He
@ 2024-06-25 16:49 ` Uladzislau Rezki
2024-06-25 20:05 ` Uladzislau Rezki
0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-25 16:49 UTC (permalink / raw)
To: Nick Bowler
Cc: Uladzislau Rezki, Hailong Liu, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
> > > > /**
> > > > * cpumask_next - get the next cpu in a cpumask
> > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > * @srcp: the cpumask pointer
> > > > *
> > > > * Return: >= nr_cpu_ids if no further cpus set.
> > >
> > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > be possible CPU.
> > >
> > > Do I miss some corner cases?
> > >
> > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > So we do not need to use *next_wrap() variant. You do not miss anything :)
> >
> > Hailong Liu has proposed more simpler version:
> >
> > <snip>
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 11fe5ea208aa..e1e63ffb9c57 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1994,8 +1994,9 @@ static struct xarray *
> > addr_to_vb_xa(unsigned long addr)
> > {
> > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > + int cpu = cpumask_nth(index, cpu_possible_mask);
> >
> > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > <snip>
> >
> > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> >
> > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > by the nr_cpu_ids.
> >
> > Any thoughts? I think we need to fix it by a minor change so it is
> > easier to back-port on stable kernels.
>
> Yeah, sounds good since the regresson commit is merged in v6.3.
> Please feel free to post this and the hash array patch separately for
> formal reviewing.
>
Agreed! I will post the hash array patch later.
> By the way, when I am replying this mail, I check the cpumask_nth()
> again. I doubt it may take more checking then cpu_possible(), given most
> of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> this moment.
>
> static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> {
> return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> }
>
Yep, I do not think it is a big problem, based on the fact you noted.
Nick, could you please test your machine with the below change?
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 45e1506d58c3..5458fd2290cf 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,9 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
 static struct xarray *
 addr_to_vb_xa(unsigned long addr)
 {
-	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+	int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+	int cpu = cpumask_nth(index, cpu_possible_mask);
 
-	return &per_cpu(vmap_block_queue, index).vmap_blocks;
+	return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
 }
 
 /*
<snip>
Thank you in advance!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 16:49 ` Uladzislau Rezki
@ 2024-06-25 20:05 ` Uladzislau Rezki
2024-06-26 0:38 ` Baoquan He
2024-06-26 5:12 ` Hailong Liu
0 siblings, 2 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-25 20:05 UTC (permalink / raw)
To: Baoquan He
Cc: Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
> > > > > /**
> > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > * @srcp: the cpumask pointer
> > > > > *
> > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > >
> > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > be possible CPU.
> > > >
> > > > Do I miss some corner cases?
> > > >
> > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > >
> > > Hailong Liu has proposed more simpler version:
> > >
> > > <snip>
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > addr_to_vb_xa(unsigned long addr)
> > > {
> > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > >
> > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > <snip>
> > >
> > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > >
> > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > by the nr_cpu_ids.
> > >
> > > Any thoughts? I think we need to fix it by a minor change so it is
> > > easier to back-port on stable kernels.
> >
> > Yeah, sounds good since the regresson commit is merged in v6.3.
> > Please feel free to post this and the hash array patch separately for
> > formal reviewing.
> >
> Agreed! The patch about hash array i will post later.
>
> > By the way, when I am replying this mail, I check the cpumask_nth()
> > again. I doubt it may take more checking then cpu_possible(), given most
> > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > this moment.
> >
> > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > {
> > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > }
> >
> Yep, i do not think it is a big problem based on your noted fact.
>
Checked. There is a difference:
1. Default
<snip>
...
+ 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
+ 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
+ 15.13% 12.05% [kernel] [k] vunmap_p4d_range
+ 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
+ 10.62% 0.00% [kernel] [k] ret_from_fork_asm
+ 10.62% 0.00% [kernel] [k] ret_from_fork
+ 10.62% 0.00% [kernel] [k] kthread
...
<snip>
2. Check if cpu_possible() and then fallback to cpumask_nth() if not
<snip>
...
+ 6.84% 0.29% [kernel] [k] alloc_vmap_area
+ 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
+ 4.24% 0.09% [kernel] [k] free_vmap_block
+ 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
+ 1.94% 1.91% [kernel] [k] xas_start
...
<snip>
It is _worth_ checking whether an index is in the possible mask:
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 45e1506d58c3..af20f78c2cbf 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
 static struct xarray *
 addr_to_vb_xa(unsigned long addr)
 {
-	int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+	int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+
+	if (!cpu_possible(index))
+		index = cpumask_nth(index, cpu_possible_mask);
 
 	return &per_cpu(vmap_block_queue, index).vmap_blocks;
 }
cpumask_nth() is not cheap. My measurements are based on a synthetic
tight test and they detect a difference. In real workloads it should
not be visible. Having gaps is not a common case, plus the "slow path"
is mitigated by the check against the possible mask.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 20:05 ` Uladzislau Rezki
@ 2024-06-26 0:38 ` Baoquan He
2024-06-26 5:12 ` Hailong Liu
1 sibling, 0 replies; 39+ messages in thread
From: Baoquan He @ 2024-06-26 0:38 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On 06/25/24 at 10:05pm, Uladzislau Rezki wrote:
> > > > > > /**
> > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > * @srcp: the cpumask pointer
> > > > > > *
> > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > >
> > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > be possible CPU.
> > > > >
> > > > > Do I miss some corner cases?
> > > > >
> > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > >
> > > > Hailong Liu has proposed more simpler version:
> > > >
> > > > <snip>
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > addr_to_vb_xa(unsigned long addr)
> > > > {
> > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > >
> > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > <snip>
> > > >
> > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > >
> > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > by the nr_cpu_ids.
> > > >
> > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > easier to back-port on stable kernels.
> > >
> > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > Please feel free to post this and the hash array patch separately for
> > > formal reviewing.
> > >
> > Agreed! The patch about hash array i will post later.
> >
> > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > again. I doubt it may take more checking then cpu_possible(), given most
> > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > this moment.
> > >
> > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > {
> > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > }
> > >
> > Yep, i do not think it is a big problem based on your noted fact.
> >
> Checked. There is a difference:
>
> 1. Default
>
> <snip>
> ...
> + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> + 10.62% 0.00% [kernel] [k] ret_from_fork
> + 10.62% 0.00% [kernel] [k] kthread
> ...
> <snip>
>
> 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
>
> <snip>
> ...
> + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> + 4.24% 0.09% [kernel] [k] free_vmap_block
> + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> + 1.94% 1.91% [kernel] [k] xas_start
> ...
> <snip>
>
> It is _worth_ to check if an index is in possible mask:
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 45e1506d58c3..af20f78c2cbf 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> +
> + if (!cpu_possible(index))
> + index = cpumask_nth(index, cpu_possible_mask);
>
> return &per_cpu(vmap_block_queue, index).vmap_blocks;
> }
>
> cpumask_nth() is not cheap. My measurements are based on a synthetic
> tight test and it detects a difference. In a real workloads it should
> not be visible. Having gaps is not a common case plus a "slow path"
> will be mitigated by the hit against possible mask.
Ah, this is consistent with my understanding from the code, thanks
for confirming by testing.
^ permalink raw reply [flat|nested] 39+ messages in thread

* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 20:05 ` Uladzislau Rezki
2024-06-26 0:38 ` Baoquan He
@ 2024-06-26 5:12 ` Hailong Liu
2024-06-26 9:15 ` Uladzislau Rezki
1 sibling, 1 reply; 39+ messages in thread
From: Hailong Liu @ 2024-06-26 5:12 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Baoquan He, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > /**
> > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > * @srcp: the cpumask pointer
> > > > > > *
> > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > >
> > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > be possible CPU.
> > > > >
> > > > > Do I miss some corner cases?
> > > > >
> > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > >
> > > > Hailong Liu has proposed more simpler version:
> > > >
> > > > <snip>
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > addr_to_vb_xa(unsigned long addr)
> > > > {
> > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > >
> > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > <snip>
> > > >
> > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > >
> > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > by the nr_cpu_ids.
> > > >
> > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > easier to back-port on stable kernels.
> > >
> > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > Please feel free to post this and the hash array patch separately for
> > > formal reviewing.
> > >
> > Agreed! The patch about hash array i will post later.
> >
> > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > again. I doubt it may take more checking then cpu_possible(), given most
> > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > this moment.
> > >
> > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > {
> > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > }
> > >
> > Yep, i do not think it is a big problem based on your noted fact.
> >
> Checked. There is a difference:
>
> 1. Default
>
> <snip>
> ...
> + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> + 10.62% 0.00% [kernel] [k] ret_from_fork
> + 10.62% 0.00% [kernel] [k] kthread
> ...
> <snip>
>
> 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
>
> <snip>
> ...
> + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> + 4.24% 0.09% [kernel] [k] free_vmap_block
> + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> + 1.94% 1.91% [kernel] [k] xas_start
> ...
> <snip>
>
> It is _worth_ to check if an index is in possible mask:
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 45e1506d58c3..af20f78c2cbf 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
IIUC, using nr_cpu_ids here may be incorrect.
Take b101 as an example: nr_cpu_ids is 3. If index is 2, cpumask_nth(2, cpu_possible_mask)
might return 64.
/**
* cpumask_nth_and - get the first cpu in 2 cpumasks
* @srcp1: the cpumask pointer
* @srcp2: the cpumask pointer
* @cpu: the N'th cpu to find, starting from 0 <--- N'th cpu
*
* Returns >= nr_cpu_ids if such cpu doesn't exist. <-----
*/
static inline
unsigned int cpumask_nth_and(unsigned int cpu, const struct cpumask *srcp1,
const struct cpumask *srcp2)
{
return find_nth_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2),
nr_cpumask_bits, cpumask_check(cpu));
}
I use num_possible_cpus() and cpumask_nth() here to distribute the addresses
evenly across the possible CPUs.
If we use cpumask_next(index) or cpumask_nth(index, cpu_possible_mask) instead,
the distribution becomes as follows:

 CPU_0  CPU_2  CPU_2
   |      |      |
   V      V      V
   0     10     20     30     40     50     60
   |------|------|------|------|------|------|..
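A small userspace sketch of the two mappings for the b101 case may help make this concrete. It is plain C with the possible mask reduced to a single 64-bit word, and nth_set_bit()/next_set_bit() are made-up stand-ins for cpumask_nth()/cpumask_next(), so it only illustrates the arithmetic, not the kernel code paths:
<snip>
/* Illustration only: index -> CPU mapping for possible mask b101 (CPU0, CPU2). */
#include <stdio.h>
#include <stdint.h>

/* stand-in for cpumask_nth(): n-th (0-based) set bit, or 64 if it does not exist */
static int nth_set_bit(uint64_t mask, int n)
{
	for (int bit = 0; bit < 64; bit++)
		if ((mask & (1ULL << bit)) && n-- == 0)
			return bit;
	return 64;
}

/* stand-in for a cpumask_next()-style fallback: first set bit at or above n */
static int next_set_bit(uint64_t mask, int n)
{
	for (int bit = n; bit < 64; bit++)
		if (mask & (1ULL << bit))
			return bit;
	return 64;
}

int main(void)
{
	uint64_t possible = 0x5;	/* b101 */
	int num_possible = 2;		/* popcount of the mask */
	int nr_cpu_ids = 3;		/* last set bit + 1 */

	for (int block = 0; block < 6; block++) {
		/* scheme (a): % num_possible_cpus(), then the n-th possible CPU */
		int a = nth_set_bit(possible, block % num_possible);
		/* scheme (b): % nr_cpu_ids, then fall back to the next possible CPU */
		int b = block % nr_cpu_ids;

		if (!(possible & (1ULL << b)))
			b = next_set_bit(possible, b);
		printf("block %d: nth -> CPU%d, next-fallback -> CPU%d\n", block, a, b);
	}
	return 0;
}
<snip>
For blocks 0..5 the first scheme alternates CPU0/CPU2, while the second one sends two out of every three blocks to CPU2, which is the skew drawn above.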
> +
> + if (!cpu_possible(index))
> + index = cpumask_nth(index, cpu_possible_mask);
>
> return &per_cpu(vmap_block_queue, index).vmap_blocks;
> }
>
> cpumask_nth() is not cheap. My measurements are based on a synthetic
> tight test and it detects a difference. In a real workloads it should
> not be visible. Having gaps is not a common case plus a "slow path"
> will be mitigated by the hit against possible mask.
If cpumask_nth() is not cheap or causes a performance regression, perhaps we
can use the solution suggested by Baoquan. I've drafted it as follows:
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 11fe5ea208aa..355dbfdf51f1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -92,6 +92,7 @@ struct vfree_deferred {
struct work_struct wq;
};
static DEFINE_PER_CPU(struct vfree_deferred, vfree_deferred);
+static unsigned int *table_non_seq_cpu;
/*** Page table manipulation functions ***/
static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
@@ -1995,6 +1996,10 @@ addr_to_vb_xa(unsigned long addr)
{
int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+ /* recalculate the cpuid if cpumask is not full. */
+ if (table_non_seq_cpu)
+ index = table_non_seq_cpu[index];
+
return &per_cpu(vmap_block_queue, index).vmap_blocks;
}
@@ -4473,17 +4478,25 @@ void __init vmalloc_init(void)
{
struct vmap_area *va;
struct vm_struct *tmp;
- int i;
+ int i, inx = 0;
/*
* Create the cache for vmap_area objects.
*/
vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
+ if (!cpumask_full(cpu_possible_mask)) {
+ table_non_seq_cpu = kzalloc(num_possible_cpus() * sizeof(unsigned int),
+ GFP_NOWAIT);
+ BUG_ON(!table_non_seq_cpu);
+ }
+
for_each_possible_cpu(i) {
struct vmap_block_queue *vbq;
struct vfree_deferred *p;
+ if (table_non_seq_cpu)
+ table_non_seq_cpu[inx++] = i;
vbq = &per_cpu(vmap_block_queue, i);
spin_lock_init(&vbq->lock);
INIT_LIST_HEAD(&vbq->free);
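The same remap-table idea can be tried out in plain userspace C. The names below (possible, table) are made up for the sketch and the possible mask is just one 64-bit word, so this only mirrors the shape of the draft, not the per-cpu machinery:
<snip>
/* Illustration only: dense-index -> possible-CPU remap table, as in the draft above. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
	uint64_t possible = 0x5;	/* b101: CPU0 and CPU2 are possible */
	int num_possible = __builtin_popcountll(possible);
	unsigned int *table = calloc(num_possible, sizeof(*table));
	int inx = 0;

	if (!table)
		return 1;

	/* same shape as the for_each_possible_cpu() loop in the draft */
	for (int cpu = 0; cpu < 64; cpu++)
		if (possible & (1ULL << cpu))
			table[inx++] = cpu;

	/* lookup side: hash to a dense index, then remap to a real possible CPU */
	for (unsigned long block = 0; block < 6; block++) {
		int index = block % num_possible;
		printf("block %lu -> CPU%u\n", block, table[index]);
	}

	free(table);
	return 0;
}
<snip>
For b101 this maps blocks 0, 2, 4 to CPU0 and 1, 3, 5 to CPU2, i.e. an even split, at the cost of one extra table lookup per call and the init-time allocation.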
>
> --
> Uladzislau Rezki
--
help you, help me,
Hailong.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 5:12 ` Hailong Liu
@ 2024-06-26 9:15 ` Uladzislau Rezki
2024-06-26 10:03 ` Hailong Liu
0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-26 9:15 UTC (permalink / raw)
To: Hailong Liu
Cc: Uladzislau Rezki, Baoquan He, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > /**
> > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > * @srcp: the cpumask pointer
> > > > > > > *
> > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > >
> > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > be possible CPU.
> > > > > >
> > > > > > Do I miss some corner cases?
> > > > > >
> > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > >
> > > > > Hailong Liu has proposed more simpler version:
> > > > >
> > > > > <snip>
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > >
> > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > <snip>
> > > > >
> > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > >
> > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > by the nr_cpu_ids.
> > > > >
> > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > easier to back-port on stable kernels.
> > > >
> > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > Please feel free to post this and the hash array patch separately for
> > > > formal reviewing.
> > > >
> > > Agreed! The patch about hash array i will post later.
> > >
> > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > this moment.
> > > >
> > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > {
> > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > }
> > > >
> > > Yep, i do not think it is a big problem based on your noted fact.
> > >
> > Checked. There is a difference:
> >
> > 1. Default
> >
> > <snip>
> > ...
> > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > + 10.62% 0.00% [kernel] [k] kthread
> > ...
> > <snip>
> >
> > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> >
> > <snip>
> > ...
> > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > + 1.94% 1.91% [kernel] [k] xas_start
> > ...
> > <snip>
> >
> > It is _worth_ to check if an index is in possible mask:
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 45e1506d58c3..af20f78c2cbf 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > static struct xarray *
> > addr_to_vb_xa(unsigned long addr)
> > {
> > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> IIUC, use nr_cpu_ids here maybe incorrect.
>
> take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> might return 64.
>
But in that case CPU2 is possible, no? Cutting by % nr_cpu_ids generates values < nr_cpu_ids,
so the last CPU is always possible and we never do cpumask_nth() on the last possible CPU.
What do I miss here?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 9:15 ` Uladzislau Rezki
@ 2024-06-26 10:03 ` Hailong Liu
2024-06-26 10:51 ` Baoquan He
2024-06-26 10:51 ` Uladzislau Rezki
0 siblings, 2 replies; 39+ messages in thread
From: Hailong Liu @ 2024-06-26 10:03 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Baoquan He, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > /**
> > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > *
> > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > >
> > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > be possible CPU.
> > > > > > >
> > > > > > > Do I miss some corner cases?
> > > > > > >
> > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > >
> > > > > > Hailong Liu has proposed more simpler version:
> > > > > >
> > > > > > <snip>
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > {
> > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > >
> > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > <snip>
> > > > > >
> > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > >
> > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > by the nr_cpu_ids.
> > > > > >
> > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > easier to back-port on stable kernels.
> > > > >
> > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > Please feel free to post this and the hash array patch separately for
> > > > > formal reviewing.
> > > > >
> > > > Agreed! The patch about hash array i will post later.
> > > >
> > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > this moment.
> > > > >
> > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > {
> > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > }
> > > > >
> > > > Yep, i do not think it is a big problem based on your noted fact.
> > > >
> > > Checked. There is a difference:
> > >
> > > 1. Default
> > >
> > > <snip>
> > > ...
> > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > + 10.62% 0.00% [kernel] [k] kthread
> > > ...
> > > <snip>
> > >
> > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > >
> > > <snip>
> > > ...
> > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > + 1.94% 1.91% [kernel] [k] xas_start
> > > ...
> > > <snip>
> > >
> > > It is _worth_ to check if an index is in possible mask:
> > >
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 45e1506d58c3..af20f78c2cbf 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > static struct xarray *
> > > addr_to_vb_xa(unsigned long addr)
> > > {
> > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > IIUC, use nr_cpu_ids here maybe incorrect.
> >
> > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > might return 64.
> >
> But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
>
> What i miss here?
>
Sorry, I forgot to reply to all :). I wrote a demo to test this, as follows:
static int cpumask_init(void)
{
struct cpumask mask;
unsigned int cpu_id, i = 0;
cpumask_clear(&mask);
cpumask_set_cpu(1, &mask);
cpumask_set_cpu(3, &mask);
cpumask_set_cpu(5, &mask);
cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
pr_info("cpu_id:%d\n", cpu_id);
for (; i < nr_cpu_ids; i++) {
pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
}
return 0;
}
[ 1.337020][ T1] cpu_id:6
[ 1.337338][ T1] 0: cpu_1
[ 1.337558][ T1] 1: cpu_3
[ 1.337751][ T1] 2: cpu_5
[ 1.337960][ T1] 3: cpu_64
[ 1.338183][ T1] 4: cpu_64
[ 1.338387][ T1] 5: cpu_64
[ 1.338594][ T1] 6: cpu_64
In summary, nr_cpu_ids = last_bit + 1, and cpumask_nth() returns the nth cpu_id.
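A rough userspace equivalent of the demo, for anyone who wants to try it without building a kernel module; mask_nth() is a made-up helper mimicking cpumask_nth() on a single 64-bit word:
<snip>
/* Illustration only: userspace analogue of the cpumask_nth() demo above. */
#include <stdio.h>
#include <stdint.h>

/* mimic cpumask_nth(): n-th (0-based) set bit, or 64 if it does not exist */
static int mask_nth(uint64_t mask, int n)
{
	for (int bit = 0; bit < 64; bit++)
		if ((mask & (1ULL << bit)) && n-- == 0)
			return bit;
	return 64;
}

int main(void)
{
	uint64_t mask = (1ULL << 1) | (1ULL << 3) | (1ULL << 5);
	int last_plus_one = 64 - __builtin_clzll(mask);	/* find_last_bit() + 1 */

	printf("cpu_id:%d\n", last_plus_one);
	/* iterate a bit past the number of set bits, like the dmesg output above */
	for (int i = 0; i <= last_plus_one; i++)
		printf("%d: cpu_%d\n", i, mask_nth(mask, i));
	return 0;
}
<snip>
It prints cpu_id:6, then cpu_1, cpu_3, cpu_5 for the first three indices and cpu_64 for the rest, matching the output above.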
--
help you, help me,
Hailong.
> --
> Uladzislau Rezki
--
help you, help me,
Hailong.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 10:03 ` Hailong Liu
@ 2024-06-26 10:51 ` Baoquan He
2024-06-26 10:53 ` Uladzislau Rezki
2024-06-26 11:30 ` Hailong Liu
2024-06-26 10:51 ` Uladzislau Rezki
1 sibling, 2 replies; 39+ messages in thread
From: Baoquan He @ 2024-06-26 10:51 UTC (permalink / raw)
To: Hailong Liu
Cc: Uladzislau Rezki, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On 06/26/24 at 06:03pm, Hailong Liu wrote:
> On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > /**
> > > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > > *
> > > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > >
> > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > be possible CPU.
> > > > > > > >
> > > > > > > > Do I miss some corner cases?
> > > > > > > >
> > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > >
> > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > >
> > > > > > > <snip>
> > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > --- a/mm/vmalloc.c
> > > > > > > +++ b/mm/vmalloc.c
> > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > {
> > > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > >
> > > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > <snip>
> > > > > > >
> > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > >
> > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > by the nr_cpu_ids.
> > > > > > >
> > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > easier to back-port on stable kernels.
> > > > > >
> > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > formal reviewing.
> > > > > >
> > > > > Agreed! The patch about hash array i will post later.
> > > > >
> > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > this moment.
> > > > > >
> > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > {
> > > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > }
> > > > > >
> > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > >
> > > > Checked. There is a difference:
> > > >
> > > > 1. Default
> > > >
> > > > <snip>
> > > > ...
> > > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > > + 10.62% 0.00% [kernel] [k] kthread
> > > > ...
> > > > <snip>
> > > >
> > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > >
> > > > <snip>
> > > > ...
> > > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > > + 1.94% 1.91% [kernel] [k] xas_start
> > > > ...
> > > > <snip>
> > > >
> > > > It is _worth_ to check if an index is in possible mask:
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > static struct xarray *
> > > > addr_to_vb_xa(unsigned long addr)
> > > > {
> > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > IIUC, use nr_cpu_ids here maybe incorrect.
> > >
> > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > might return 64.
> > >
> > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> >
> > What i miss here?
> >
> Sorry, I forget to reply to all :), I write a demo to test as follows:
>
> static int cpumask_init(void)
> {
> struct cpumask mask;
> unsigned int cpu_id;
> cpumask_clear(&mask);
>
> cpumask_set_cpu(1, &mask);
> cpumask_set_cpu(3, &mask);
> cpumask_set_cpu(5, &mask);
>
> cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
> pr_info("cpu_id:%d\n", cpu_id);
>
> for (; i < nr_cpu_ids; i++) {
> pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
> }
>
> return 0;
> }
>
> [ 1.337020][ T1] cpu_id:6
> [ 1.337338][ T1] 0: cpu_1
> [ 1.337558][ T1] 1: cpu_3
> [ 1.337751][ T1] 2: cpu_5
> [ 1.337960][ T1] 3: cpu_64
> [ 1.338183][ T1] 4: cpu_64
> [ 1.338387][ T1] 5: cpu_64
> [ 1.338594][ T1] 6: cpu_64
>
> In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
I think just using the below change for a quick fix is enough. It doesn't
have the issue cpumask_nth() has and is very simple. For most systems,
it only adds an extra cpu_possible(index) check.
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 633363997dec..59a8951cc6c0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
static struct xarray *
addr_to_vb_xa(unsigned long addr)
{
- int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+ int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+
+ if (!cpu_possible(index))
+ index = cpumask_next(index, cpu_possible_mask);
return &per_cpu(vmap_block_queue, index).vmap_blocks;
}
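A quick userspace check of the property this quick fix relies on, i.e. that the highest possible CPU is always nr_cpu_ids - 1, so the fallback can never run past the end of the mask. The mask is again simplified to one 64-bit word and next_set_bit() is a made-up stand-in for cpumask_next():
<snip>
/* Illustration only: the cpu_possible()/next fallback always lands on a possible CPU. */
#include <assert.h>
#include <stdio.h>
#include <stdint.h>

/* stand-in for a cpumask_next()-style fallback: first set bit at or above n */
static int next_set_bit(uint64_t mask, int n)
{
	for (int bit = n; bit < 64; bit++)
		if (mask & (1ULL << bit))
			return bit;
	return 64;
}

int main(void)
{
	/* a few gappy masks; bit N set means CPU N is possible */
	uint64_t masks[] = { 0x5, 0x2a, 0x41, 0x1 };

	for (unsigned long m = 0; m < sizeof(masks) / sizeof(masks[0]); m++) {
		uint64_t possible = masks[m];
		int nr_cpu_ids = 64 - __builtin_clzll(possible);	/* last set bit + 1 */

		for (unsigned long block = 0; block < 1000; block++) {
			int index = block % nr_cpu_ids;

			if (!(possible & (1ULL << index)))
				index = next_set_bit(possible, index);

			/* never out of range and always a possible CPU */
			assert(index < nr_cpu_ids);
			assert(possible & (1ULL << index));
		}
	}
	printf("ok\n");
	return 0;
}
<snip>
The asserts encode the two properties the one-liner depends on: the final index stays below nr_cpu_ids and always names a possible CPU.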
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 10:51 ` Baoquan He
@ 2024-06-26 10:53 ` Uladzislau Rezki
2024-06-26 11:30 ` Hailong Liu
1 sibling, 0 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-26 10:53 UTC (permalink / raw)
To: Baoquan He
Cc: Hailong Liu, Uladzislau Rezki, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Wed, Jun 26, 2024 at 06:51:32PM +0800, Baoquan He wrote:
> On 06/26/24 at 06:03pm, Hailong Liu wrote:
> > On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > > /**
> > > > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > > > *
> > > > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > > >
> > > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > > be possible CPU.
> > > > > > > > >
> > > > > > > > > Do I miss some corner cases?
> > > > > > > > >
> > > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > > >
> > > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > > --- a/mm/vmalloc.c
> > > > > > > > +++ b/mm/vmalloc.c
> > > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > > {
> > > > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > > >
> > > > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > > >
> > > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > > by the nr_cpu_ids.
> > > > > > > >
> > > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > > easier to back-port on stable kernels.
> > > > > > >
> > > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > > formal reviewing.
> > > > > > >
> > > > > > Agreed! The patch about hash array i will post later.
> > > > > >
> > > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > > this moment.
> > > > > > >
> > > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > > {
> > > > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > > }
> > > > > > >
> > > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > > >
> > > > > Checked. There is a difference:
> > > > >
> > > > > 1. Default
> > > > >
> > > > > <snip>
> > > > > ...
> > > > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > > > + 10.62% 0.00% [kernel] [k] kthread
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > > >
> > > > > <snip>
> > > > > ...
> > > > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > > > + 1.94% 1.91% [kernel] [k] xas_start
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > It is _worth_ to check if an index is in possible mask:
> > > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > IIUC, use nr_cpu_ids here maybe incorrect.
> > > >
> > > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > > might return 64.
> > > >
> > > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> > >
> > > What i miss here?
> > >
> > Sorry, I forget to reply to all :), I write a demo to test as follows:
> >
> > static int cpumask_init(void)
> > {
> > struct cpumask mask;
> > unsigned int cpu_id;
> > cpumask_clear(&mask);
> >
> > cpumask_set_cpu(1, &mask);
> > cpumask_set_cpu(3, &mask);
> > cpumask_set_cpu(5, &mask);
> >
> > cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
> > pr_info("cpu_id:%d\n", cpu_id);
> >
> > for (; i < nr_cpu_ids; i++) {
> > pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
> > }
> >
> > return 0;
> > }
> >
> > [ 1.337020][ T1] cpu_id:6
> > [ 1.337338][ T1] 0: cpu_1
> > [ 1.337558][ T1] 1: cpu_3
> > [ 1.337751][ T1] 2: cpu_5
> > [ 1.337960][ T1] 3: cpu_64
> > [ 1.338183][ T1] 4: cpu_64
> > [ 1.338387][ T1] 5: cpu_64
> > [ 1.338594][ T1] 6: cpu_64
> >
> > In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
>
> I think just using below change for a quick fix is enough. It doesn't
> have the issue cpumask_nth() has and very simple. For most of systems,
> it only adds an extra cpu_possible(idex) checking.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 633363997dec..59a8951cc6c0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> +
> + if (!cpu_possible(idex))
> + index = cpumask_next(index, cpu_possible_mask);
>
> return &per_cpu(vmap_block_queue, index).vmap_blocks;
> }
>
Just submitted the same :) I will send out the patch.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 10:51 ` Baoquan He
2024-06-26 10:53 ` Uladzislau Rezki
@ 2024-06-26 11:30 ` Hailong Liu
2024-06-26 11:45 ` Uladzislau Rezki
1 sibling, 1 reply; 39+ messages in thread
From: Hailong Liu @ 2024-06-26 11:30 UTC (permalink / raw)
To: Baoquan He, Uladzislau Rezki
Cc: Uladzislau Rezki, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Wed, 26. Jun 18:51, Baoquan He wrote:
> On 06/26/24 at 06:03pm, Hailong Liu wrote:
> > On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > > /**
> > > > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > > > *
> > > > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > > >
> > > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > > be possible CPU.
> > > > > > > > >
> > > > > > > > > Do I miss some corner cases?
> > > > > > > > >
> > > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > > >
> > > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > > --- a/mm/vmalloc.c
> > > > > > > > +++ b/mm/vmalloc.c
> > > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > > {
> > > > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > > >
> > > > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > > >
> > > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > > by the nr_cpu_ids.
> > > > > > > >
> > > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > > easier to back-port on stable kernels.
> > > > > > >
> > > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > > formal reviewing.
> > > > > > >
> > > > > > Agreed! The patch about hash array i will post later.
> > > > > >
> > > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > > this moment.
> > > > > > >
> > > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > > {
> > > > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > > }
> > > > > > >
> > > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > > >
> > > > > Checked. There is a difference:
> > > > >
> > > > > 1. Default
> > > > >
> > > > > <snip>
> > > > > ...
> > > > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > > > + 10.62% 0.00% [kernel] [k] kthread
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > > >
> > > > > <snip>
> > > > > ...
> > > > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > > > + 1.94% 1.91% [kernel] [k] xas_start
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > It is _worth_ to check if an index is in possible mask:
> > > > >
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > IIUC, use nr_cpu_ids here maybe incorrect.
> > > >
> > > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > > might return 64.
> > > >
> > > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> > >
> > > What i miss here?
> > >
> > Sorry, I forget to reply to all :), I write a demo to test as follows:
> >
> > static int cpumask_init(void)
> > {
> > struct cpumask mask;
> > unsigned int cpu_id;
> > cpumask_clear(&mask);
> >
> > cpumask_set_cpu(1, &mask);
> > cpumask_set_cpu(3, &mask);
> > cpumask_set_cpu(5, &mask);
> >
> > cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
> > pr_info("cpu_id:%d\n", cpu_id);
> >
> > for (; i < nr_cpu_ids; i++) {
> > pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
> > }
> >
> > return 0;
> > }
> >
> > [ 1.337020][ T1] cpu_id:6
> > [ 1.337338][ T1] 0: cpu_1
> > [ 1.337558][ T1] 1: cpu_3
> > [ 1.337751][ T1] 2: cpu_5
> > [ 1.337960][ T1] 3: cpu_64
> > [ 1.338183][ T1] 4: cpu_64
> > [ 1.338387][ T1] 5: cpu_64
> > [ 1.338594][ T1] 6: cpu_64
> >
> > In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
>
> I think just using below change for a quick fix is enough. It doesn't
> have the issue cpumask_nth() has and very simple. For most of systems,
> it only adds an extra cpu_possible(idex) checking.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 633363997dec..59a8951cc6c0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> +
> + if (!cpu_possible(idex))
> + index = cpumask_next(index, cpu_possible_mask);
>
> return &per_cpu(vmap_block_queue, index).vmap_blocks;
> }
>
Agreed! This is a very simple solution.
If the cpumask is b1000001, the way addresses are distributed across the
possible CPUs could theoretically lead to the situation below, but it has not
been encountered in practice. I'm just pointing out the possibility here:
 CPU_0  CPU_6  CPU_6  CPU_6  CPU_6  CPU_6
   |      |      |      |      |      |
   V      V      V      V      V      V
   0     10     20     30     40     50     60
   |------|------|------|------|------|------|..
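To put rough numbers on that skew, a small sketch of the same hashing over the b1000001 mask (one 64-bit word, next_set_bit() standing in for cpumask_next()):
<snip>
/* Illustration only: per-CPU share of block indices for possible mask b1000001. */
#include <stdio.h>
#include <stdint.h>

static int next_set_bit(uint64_t mask, int n)
{
	for (int bit = n; bit < 64; bit++)
		if (mask & (1ULL << bit))
			return bit;
	return 64;
}

int main(void)
{
	uint64_t possible = 0x41;		/* b1000001: CPU0 and CPU6 */
	int nr_cpu_ids = 7;			/* last set bit + 1 */
	unsigned long hits[64] = { 0 }, blocks = 7000;

	for (unsigned long block = 0; block < blocks; block++) {
		int index = block % nr_cpu_ids;

		if (!(possible & (1ULL << index)))
			index = next_set_bit(possible, index);
		hits[index]++;
	}

	for (int cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (hits[cpu])
			printf("CPU%d: %lu of %lu blocks\n", cpu, hits[cpu], blocks);
	return 0;
}
<snip>
It prints CPU0: 1000 of 7000 blocks and CPU6: 6000 of 7000 blocks, i.e. the 1/7 versus 6/7 split sketched above, so CPU6's vmap_blocks xarray ends up holding most of the blocks.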
Thanks again for your reply, I learned a lot.
--
help you, help me,
Hailong.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 11:30 ` Hailong Liu
@ 2024-06-26 11:45 ` Uladzislau Rezki
0 siblings, 0 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-26 11:45 UTC (permalink / raw)
To: Hailong Liu
Cc: Baoquan He, Uladzislau Rezki, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Wed, Jun 26, 2024 at 07:30:39PM +0800, Hailong Liu wrote:
> On Wed, 26. Jun 18:51, Baoquan He wrote:
> > On 06/26/24 at 06:03pm, Hailong Liu wrote:
> > > On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > > > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > > > /**
> > > > > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > > > > *
> > > > > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > > > >
> > > > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > > > be possible CPU.
> > > > > > > > > >
> > > > > > > > > > Do I miss some corner cases?
> > > > > > > > > >
> > > > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > > > >
> > > > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > > > >
> > > > > > > > > <snip>
> > > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > > > --- a/mm/vmalloc.c
> > > > > > > > > +++ b/mm/vmalloc.c
> > > > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > > > {
> > > > > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > > > >
> > > > > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > > > <snip>
> > > > > > > > >
> > > > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > > > >
> > > > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > > > by the nr_cpu_ids.
> > > > > > > > >
> > > > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > > > easier to back-port on stable kernels.
> > > > > > > >
> > > > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > > > formal reviewing.
> > > > > > > >
> > > > > > > Agreed! The patch about hash array i will post later.
> > > > > > >
> > > > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > > > this moment.
> > > > > > > >
> > > > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > > > {
> > > > > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > > > }
> > > > > > > >
> > > > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > > > >
> > > > > > Checked. There is a difference:
> > > > > >
> > > > > > 1. Default
> > > > > >
> > > > > > <snip>
> > > > > > ...
> > > > > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > > > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > > > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > > > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > > > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > > > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > > > > + 10.62% 0.00% [kernel] [k] kthread
> > > > > > ...
> > > > > > <snip>
> > > > > >
> > > > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > > > >
> > > > > > <snip>
> > > > > > ...
> > > > > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > > > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > > > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > > > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > > > > + 1.94% 1.91% [kernel] [k] xas_start
> > > > > > ...
> > > > > > <snip>
> > > > > >
> > > > > > It is _worth_ to check if an index is in possible mask:
> > > > > >
> > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > > > --- a/mm/vmalloc.c
> > > > > > +++ b/mm/vmalloc.c
> > > > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > > static struct xarray *
> > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > {
> > > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > > IIUC, use nr_cpu_ids here maybe incorrect.
> > > > >
> > > > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > > > might return 64.
> > > > >
> > > > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > > > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> > > >
> > > > What i miss here?
> > > >
> > > Sorry, I forget to reply to all :), I write a demo to test as follows:
> > >
> > > static int cpumask_init(void)
> > > {
> > > struct cpumask mask;
> > > unsigned int cpu_id;
> > > cpumask_clear(&mask);
> > >
> > > cpumask_set_cpu(1, &mask);
> > > cpumask_set_cpu(3, &mask);
> > > cpumask_set_cpu(5, &mask);
> > >
> > > cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
> > > pr_info("cpu_id:%d\n", cpu_id);
> > >
> > > for (; i < nr_cpu_ids; i++) {
> > > pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
> > > }
> > >
> > > return 0;
> > > }
> > >
> > > [ 1.337020][ T1] cpu_id:6
> > > [ 1.337338][ T1] 0: cpu_1
> > > [ 1.337558][ T1] 1: cpu_3
> > > [ 1.337751][ T1] 2: cpu_5
> > > [ 1.337960][ T1] 3: cpu_64
> > > [ 1.338183][ T1] 4: cpu_64
> > > [ 1.338387][ T1] 5: cpu_64
> > > [ 1.338594][ T1] 6: cpu_64
> > >
> > > In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
> >
> > I think just using below change for a quick fix is enough. It doesn't
> > have the issue cpumask_nth() has and very simple. For most of systems,
> > it only adds an extra cpu_possible(idex) checking.
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 633363997dec..59a8951cc6c0 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > static struct xarray *
> > addr_to_vb_xa(unsigned long addr)
> > {
> > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > +
> > + if (!cpu_possible(idex))
> > + index = cpumask_next(index, cpu_possible_mask);
> >
> > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > }
> >
> Agreed! This is a very simple solution.
>
> If cpumask is b1000001, addresses being distributed across different
> CPUs could theoretically lead to such a situation, but it has not been
> encountered in practice. I’m just pointing out the possibility here.
>
> CPU_0 CPU_6 CPU_6 CPU_6 CPU_6 CPU_6
> | | | | | |
> V V V V V V
> 0 10 20 30 40 50 60
> |------|------|------|------|------|------|..
>
Right!
>
> Thanks again for your reply, I learned a lot.
>
Thank you for helping :)
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 10:03 ` Hailong Liu
2024-06-26 10:51 ` Baoquan He
@ 2024-06-26 10:51 ` Uladzislau Rezki
2024-06-26 13:34 ` Nick Bowler
1 sibling, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-26 10:51 UTC (permalink / raw)
To: Hailong Liu
Cc: Uladzislau Rezki, Baoquan He, Nick Bowler, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Wed, Jun 26, 2024 at 06:03:42PM +0800, Hailong Liu wrote:
> On Wed, 26. Jun 11:15, Uladzislau Rezki wrote:
> > On Wed, Jun 26, 2024 at 01:12:06PM +0800, Hailong Liu wrote:
> > > On Tue, 25. Jun 22:05, Uladzislau Rezki wrote:
> > > > > > > > > /**
> > > > > > > > > * cpumask_next - get the next cpu in a cpumask
> > > > > > > > > * @n: the cpu prior to the place to search (i.e. return will be > @n)
> > > > > > > > > * @srcp: the cpumask pointer
> > > > > > > > > *
> > > > > > > > > * Return: >= nr_cpu_ids if no further cpus set.
> > > > > > > >
> > > > > > > > Ah, I got what you mean. In the vbq case, it may not have chance to get
> > > > > > > > a return number as nr_cpu_ids. Becuase the hashed index limits the
> > > > > > > > range to [0, nr_cpu_ids-1], and cpu_possible(index) will guarantee it
> > > > > > > > won't be the highest cpu number [nr_cpu_ids-1] since CPU[nr_cpu_ids-1] must
> > > > > > > > be possible CPU.
> > > > > > > >
> > > > > > > > Do I miss some corner cases?
> > > > > > > >
> > > > > > > Right. We guarantee that a highest CPU is available by doing: % nr_cpu_ids.
> > > > > > > So we do not need to use *next_wrap() variant. You do not miss anything :)
> > > > > > >
> > > > > > > Hailong Liu has proposed more simpler version:
> > > > > > >
> > > > > > > <snip>
> > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > > > index 11fe5ea208aa..e1e63ffb9c57 100644
> > > > > > > --- a/mm/vmalloc.c
> > > > > > > +++ b/mm/vmalloc.c
> > > > > > > @@ -1994,8 +1994,9 @@ static struct xarray *
> > > > > > > addr_to_vb_xa(unsigned long addr)
> > > > > > > {
> > > > > > > int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > > > + int cpu = cpumask_nth(index, cpu_possible_mask);
> > > > > > >
> > > > > > > - return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > > > + return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
> > > > > > > <snip>
> > > > > > >
> > > > > > > which just takes a next CPU if an index is not set in the cpu_possible_mask.
> > > > > > >
> > > > > > > The only thing that can be updated in the patch is to replace num_possible_cpu()
> > > > > > > by the nr_cpu_ids.
> > > > > > >
> > > > > > > Any thoughts? I think we need to fix it by a minor change so it is
> > > > > > > easier to back-port on stable kernels.
> > > > > >
> > > > > > Yeah, sounds good since the regresson commit is merged in v6.3.
> > > > > > Please feel free to post this and the hash array patch separately for
> > > > > > formal reviewing.
> > > > > >
> > > > > Agreed! The patch about hash array i will post later.
> > > > >
> > > > > > By the way, when I am replying this mail, I check the cpumask_nth()
> > > > > > again. I doubt it may take more checking then cpu_possible(), given most
> > > > > > of systems don't have gaps in cpu_possible_mask. I could be dizzy at
> > > > > > this moment.
> > > > > >
> > > > > > static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp)
> > > > > > {
> > > > > > return find_nth_bit(cpumask_bits(srcp), small_cpumask_bits, cpumask_check(cpu));
> > > > > > }
> > > > > >
> > > > > Yep, i do not think it is a big problem based on your noted fact.
> > > > >
> > > > Checked. There is a difference:
> > > >
> > > > 1. Default
> > > >
> > > > <snip>
> > > > ...
> > > > + 15.95% 6.05% [kernel] [k] __vmap_pages_range_noflush
> > > > + 15.91% 1.74% [kernel] [k] addr_to_vb_xa <---------------
> > > > + 15.13% 12.05% [kernel] [k] vunmap_p4d_range
> > > > + 14.17% 13.38% [kernel] [k] __find_nth_bit <--------------
> > > > + 10.62% 0.00% [kernel] [k] ret_from_fork_asm
> > > > + 10.62% 0.00% [kernel] [k] ret_from_fork
> > > > + 10.62% 0.00% [kernel] [k] kthread
> > > > ...
> > > > <snip>
> > > >
> > > > 2. Check if cpu_possible() and then fallback to cpumask_nth() if not
> > > >
> > > > <snip>
> > > > ...
> > > > + 6.84% 0.29% [kernel] [k] alloc_vmap_area
> > > > + 6.80% 6.70% [kernel] [k] native_queued_spin_lock_slowpath
> > > > + 4.24% 0.09% [kernel] [k] free_vmap_block
> > > > + 2.41% 2.38% [kernel] [k] addr_to_vb_xa <-----------
> > > > + 1.94% 1.91% [kernel] [k] xas_start
> > > > ...
> > > > <snip>
> > > >
> > > > It is _worth_ to check if an index is in possible mask:
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 45e1506d58c3..af20f78c2cbf 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > static struct xarray *
> > > > addr_to_vb_xa(unsigned long addr)
> > > > {
> > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > IIUC, use nr_cpu_ids here maybe incorrect.
> > >
> > > take b101 as example, nr_cpu_ids is 3. if index is 2 cpumask_nth(2, cpu_possible_mask);
> > > might return 64.
> > >
> > But then a CPU2 becomes possible? Cutting by % nr_cpu_ids generates values < nr_cpu_ids.
> > So, last CPU is always possible and we never do cpumask_nth() on a last possible CPU.
> >
> > What i miss here?
> >
> Sorry, I forget to reply to all :), I write a demo to test as follows:
>
> static int cpumask_init(void)
> {
> struct cpumask mask;
> unsigned int cpu_id;
> cpumask_clear(&mask);
>
> cpumask_set_cpu(1, &mask);
> cpumask_set_cpu(3, &mask);
> cpumask_set_cpu(5, &mask);
>
> cpu_id = find_last_bit(cpumask_bits(&mask), NR_CPUS) + 1;
> pr_info("cpu_id:%d\n", cpu_id);
>
> for (; i < nr_cpu_ids; i++) {
> pr_info("%d: cpu_%d\n", i, cpumask_nth(i, &mask));
> }
>
> return 0;
> }
>
> [ 1.337020][ T1] cpu_id:6
> [ 1.337338][ T1] 0: cpu_1
> [ 1.337558][ T1] 1: cpu_3
> [ 1.337751][ T1] 2: cpu_5
> [ 1.337960][ T1] 3: cpu_64
> [ 1.338183][ T1] 4: cpu_64
> [ 1.338387][ T1] 5: cpu_64
> [ 1.338594][ T1] 6: cpu_64
>
> In summary, the nr_cpu_ids = last_bit + 1, and cpumask_nth() return the nth cpu_id.
>
OK, I misread cpumask_nth(). We should go with the *_next() variant instead.
Thank you for pointing this out. Below is the updated version with an extra comment:
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 45e1506d58c3..03b82fb8ecd3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2542,7 +2542,15 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
static struct xarray *
addr_to_vb_xa(unsigned long addr)
{
- int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+ int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
+
+ /*
+ * Please note, nr_cpu_ids is the highest set bit in the
+ * possible mask plus one, i.e. cpu_possible(nr_cpu_ids - 1) is
+ * always true, so we never invoke cpumask_next() for the
+ * highest index this hash can produce.
+ */
+ if (!cpu_possible(index))
+ index = cpumask_next(index, cpu_possible_mask);
return &per_cpu(vmap_block_queue, index).vmap_blocks;
}
<snip>
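Out of curiosity about the cost difference in the perf profiles quoted earlier, a crude userspace timing sketch of the two lookups; the helpers are naive bit loops rather than the kernel's optimized find_*_bit() code and the mask is one 64-bit word, so the absolute numbers mean very little, only the shape of the comparison:
<snip>
/* Illustration only: bit-test-plus-fallback vs always-scan-for-the-nth-bit. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

static int nth_set_bit(uint64_t mask, int n)
{
	for (int bit = 0; bit < 64; bit++)
		if ((mask & (1ULL << bit)) && n-- == 0)
			return bit;
	return 64;
}

static int next_set_bit(uint64_t mask, int n)
{
	for (int bit = n; bit < 64; bit++)
		if (mask & (1ULL << bit))
			return bit;
	return 64;
}

static double seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	uint64_t possible = ~0ULL;	/* the common case: no gaps in the mask */
	int nr_cpu_ids = 64, num_possible = 64;
	unsigned long iters = 50UL * 1000 * 1000, sum = 0;
	double t;

	t = seconds();
	for (unsigned long i = 0; i < iters; i++)
		sum += nth_set_bit(possible, i % num_possible);
	printf("always nth-bit scan: %.3fs (sum %lu)\n", seconds() - t, sum);

	t = seconds();
	for (unsigned long i = 0; i < iters; i++) {
		int index = i % nr_cpu_ids;

		if (!(possible & (1ULL << index)))
			index = next_set_bit(possible, index);
		sum += index;
	}
	printf("test + fallback:     %.3fs (sum %lu)\n", seconds() - t, sum);
	return 0;
}
<snip>
On a dense mask the second loop is essentially a single bit test per lookup, which is the direction the addr_to_vb_xa()/__find_nth_bit perf numbers earlier in the thread point in as well.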
Thanks!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 10:51 ` Uladzislau Rezki
@ 2024-06-26 13:34 ` Nick Bowler
2024-06-26 13:38 ` Uladzislau Rezki
0 siblings, 1 reply; 39+ messages in thread
From: Nick Bowler @ 2024-06-26 13:34 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Baoquan He, linux-kernel, Linux regressions mailing list,
linux-mm, sparclinux, Andrew Morton, Hailong Liu
On 2024-06-26 06:51, Uladzislau Rezki wrote:
[...]
> Thank you for pointing this. Below is updated version with extra comment:
I checked that I can still reproduce the problem on 6.10-rc5 and with
this patch applied on top, xfsdump doesn't crash anymore.
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 45e1506d58c3..03b82fb8ecd3 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,15 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> +
> + /*
> + * Please note, nr_cpu_ids points on a highest set
> + * possible bit, i.e. we never invoke cpumask_next()
> + * if an index points on it which is nr_cpu_ids - 1.
> + */
> + if (!cpu_possible(index))
> + index = cpumask_next(index, cpu_possible_mask);
>
> return &per_cpu(vmap_block_queue, index).vmap_blocks;
> }
Thanks,
Nick
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-26 13:34 ` Nick Bowler
@ 2024-06-26 13:38 ` Uladzislau Rezki
0 siblings, 0 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-26 13:38 UTC (permalink / raw)
To: Nick Bowler
Cc: Uladzislau Rezki, Baoquan He, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton, Hailong Liu
> On 2024-06-26 06:51, Uladzislau Rezki wrote:
> [...]
> > Thank you for pointing this. Below is updated version with extra comment:
>
> I checked that I can still reproduce the problem on 6.10-rc5 and with
> this patch applied on top, xfsdump doesn't crash anymore.
>
Thank you for helping and testing. I appreciate it!
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 3:30 ` Baoquan He
2024-06-25 10:32 ` Uladzislau Rezki
@ 2024-06-25 11:19 ` Hailong Liu
2024-06-25 12:41 ` Uladzislau Rezki
2 siblings, 0 replies; 39+ messages in thread
From: Hailong Liu @ 2024-06-25 11:19 UTC (permalink / raw)
To: Baoquan He, Uladzislau Rezki
Cc: Nick Bowler, linux-kernel, Linux regressions mailing list,
linux-mm, sparclinux, Andrew Morton
On Tue, 25. Jun 11:30, Baoquan He wrote:
> On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > ......
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > >
> > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > }
> > > > >
> > > > The problem i see is about not-initializing of the:
> > > > <snip>
> > > > for_each_possible_cpu(i) {
> > > > struct vmap_block_queue *vbq;
> > > > struct vfree_deferred *p;
> > > >
> > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > spin_lock_init(&vbq->lock);
> > > > INIT_LIST_HEAD(&vbq->free);
> > > > p = &per_cpu(vfree_deferred, i);
> > > > init_llist_head(&p->list);
> > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > xa_init(&vbq->vmap_blocks);
> > > > }
> > > > <snip>
> > > >
> > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > can be non sequential :-/
> > > >
> > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > >
> > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > are different with my understanding.
> > >
> > > kernel/smp.c
> > > void __init setup_nr_cpu_ids(void)
> > > {
> > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > }
> > >
> > I see that it is not a weak function, so it is generic, thus the
> > behavior can not be overwritten, which is great. This does what we
> > need.
> >
> > Thank you for checking this you are right!
>
> Thanks for confirming this.
>
> >
> > Then it is just a matter of proper initialization of the hash:
> >
> > <snip>
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 5d3aa2dc88a8..1733946f7a12 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > */
> > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> >
> > - for_each_possible_cpu(i) {
> > + /*
> > + * We use "nr_cpu_ids" here because some architectures
> > + * may have "gaps" in cpu-possible-mask. It is OK for
> > + * per-cpu approaches but is not OK for cases where it
> > + * can be used as hashes also.
> > + */
> > + for (i = 0; i < nr_cpu_ids; i++) {
>
> I was wrong about earlier comments. Percpu variables are only available
> on possible CPUs. For the nonexistent possible CPUs of the static percpu
> variable vmap_block_queue, there is no memory allocated and mapped for
> them, so accessing them will cause problems.
Thanks for pointing this out. Indeed, I have very limited knowledge about
CPU-related details :)
>
> In Nick's case, there are only CPU0 and CPU2. If you access
> &per_cpu(vmap_block_queue, 1), a problem occurs. So I think we may need
> to take a different approach for vbq, e.g.:
> 1) store the vb in the nearest neighbouring vbq on a possible CPU, as in
> the draft patch below;
> 2) create a normal array of size nr_cpu_ids to store the vbqs, so we can
> store/fetch a vbq even for a non-possible CPU.
>
> Way 1) is simpler; the existing code only needs to be adapted a little,
> as below.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 633363997dec..59a8951cc6c0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2542,7 +2542,10 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> static struct xarray *
> addr_to_vb_xa(unsigned long addr)
> {
> - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> +
> + if (!cpu_possible(index))
> + index = cpumask_next(index, cpu_possible_mask);
If the cpumask is 0b1001, more addresses would end up hashed to cpu3, so the
distribution across possible CPUs becomes uneven.
>
> return &per_cpu(vmap_block_queue, index).vmap_blocks;
> }
> @@ -2556,9 +2559,15 @@ addr_to_vb_xa(unsigned long addr)
>
> static unsigned long addr_to_vb_idx(unsigned long addr)
> {
> + int id = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> + int id_dest = id;
> +
> + if (!cpu_possible(id))
> + id_dest = cpumask_next(id, cpu_possible_mask);
> +
> addr -= VMALLOC_START & ~(VMAP_BLOCK_SIZE-1);
> addr /= VMAP_BLOCK_SIZE;
> - return addr;
> + return addr + (id_dest - id);
> }
How about using cpumask_nth() to get the cpu id? It looks simpler and more
straightforward, though it may waste some CPU time :)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 11fe5ea208aa..e1e63ffb9c57 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1994,8 +1994,9 @@ static struct xarray *
addr_to_vb_xa(unsigned long addr)
{
int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+ int cpu = cpumask_nth(index, cpu_possible_mask);
- return &per_cpu(vmap_block_queue, index).vmap_blocks;
+ return &per_cpu(vmap_block_queue, cpu).vmap_blocks;
}
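A quick user-space sketch of the distribution difference described above
(the 0b1001 mask, the block count and the helper names are made up;
cpumask_next()/cpumask_nth() are modelled with plain loops):

<snip>
#include <stdio.h>

static const unsigned long possible_mask = 0x9;	/* CPUs 0 and 3, i.e. 0b1001 */
static const int nr_cpu_ids = 4;
static const int num_possible = 2;

static int possible(int cpu)
{
	return (possible_mask >> cpu) & 1;
}

/* first possible CPU strictly after 'cpu', like cpumask_next() */
static int next_possible(int cpu)
{
	for (int i = cpu + 1; i < nr_cpu_ids; i++)
		if (possible(i))
			return i;
	return nr_cpu_ids;
}

/* n'th (0-based) possible CPU, like cpumask_nth() */
static int nth_possible(int n)
{
	for (int i = 0; i < nr_cpu_ids; i++)
		if (possible(i) && n-- == 0)
			return i;
	return nr_cpu_ids;
}

int main(void)
{
	int hits_next[4] = { 0 }, hits_nth[4] = { 0 };

	for (int block = 0; block < 1000; block++) {
		/* nr_cpu_ids modulo plus cpumask_next() fallback */
		int a = block % nr_cpu_ids;
		if (!possible(a))
			a = next_possible(a);
		hits_next[a]++;

		/* num_possible_cpus() modulo plus cpumask_nth() */
		hits_nth[nth_possible(block % num_possible)]++;
	}

	for (int cpu = 0; cpu < nr_cpu_ids; cpu++)
		if (possible(cpu))
			printf("cpu%d: next-fallback %d, nth %d\n",
			       cpu, hits_next[cpu], hits_nth[cpu]);

	return 0;
}
<snip>

With the 0b1001 mask the fallback variant puts three quarters of the
blocks on cpu3, while the cpumask_nth() variant splits them evenly.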
>
--
help you, help me,
Hailong.
* Re: PROBLEM: kernel crashes when running xfsdump since ~6.4
2024-06-25 3:30 ` Baoquan He
2024-06-25 10:32 ` Uladzislau Rezki
2024-06-25 11:19 ` Hailong Liu
@ 2024-06-25 12:41 ` Uladzislau Rezki
2 siblings, 0 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2024-06-25 12:41 UTC (permalink / raw)
To: Baoquan He
Cc: Uladzislau Rezki, Nick Bowler, Hailong Liu, linux-kernel,
Linux regressions mailing list, linux-mm, sparclinux,
Andrew Morton
On Tue, Jun 25, 2024 at 11:30:33AM +0800, Baoquan He wrote:
> On 06/24/24 at 02:16pm, Uladzislau Rezki wrote:
> > On Fri, Jun 21, 2024 at 10:02:50PM +0800, Baoquan He wrote:
> > > On 06/21/24 at 11:44am, Uladzislau Rezki wrote:
> > > > On Fri, Jun 21, 2024 at 03:07:16PM +0800, Baoquan He wrote:
> > > > > On 06/21/24 at 11:30am, Hailong Liu wrote:
> > > > > > On Thu, 20. Jun 14:02, Nick Bowler wrote:
> > > > > > > On 2024-06-20 02:19, Nick Bowler wrote:
> > > ......
> > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > > index be2dd281ea76..18e87cafbaf2 100644
> > > > > --- a/mm/vmalloc.c
> > > > > +++ b/mm/vmalloc.c
> > > > > @@ -2542,7 +2542,7 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
> > > > > static struct xarray *
> > > > > addr_to_vb_xa(unsigned long addr)
> > > > > {
> > > > > - int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
> > > > > + int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
> > > > >
> > > > > return &per_cpu(vmap_block_queue, index).vmap_blocks;
> > > > > }
> > > > >
> > > > The problem i see is about not-initializing of the:
> > > > <snip>
> > > > for_each_possible_cpu(i) {
> > > > struct vmap_block_queue *vbq;
> > > > struct vfree_deferred *p;
> > > >
> > > > vbq = &per_cpu(vmap_block_queue, i);
> > > > spin_lock_init(&vbq->lock);
> > > > INIT_LIST_HEAD(&vbq->free);
> > > > p = &per_cpu(vfree_deferred, i);
> > > > init_llist_head(&p->list);
> > > > INIT_WORK(&p->wq, delayed_vfree_work);
> > > > xa_init(&vbq->vmap_blocks);
> > > > }
> > > > <snip>
> > > >
> > > > correctly or fully. It is my bad i did not think that CPUs in a possible mask
> > > > can be non sequential :-/
> > > >
> > > > nr_cpu_ids - is not the max possible CPU. For example, in Nick case,
> > > > when he has two CPUs, num_possible_cpus() and nr_cpu_ids are the same.
> > >
> > > I checked the generic version of setup_nr_cpu_ids(), from codes, they
> > > are different with my understanding.
> > >
> > > kernel/smp.c
> > > void __init setup_nr_cpu_ids(void)
> > > {
> > > set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
> > > }
> > >
> > I see that it is not a weak function, so it is generic, thus the
> > behavior can not be overwritten, which is great. This does what we
> > need.
> >
> > Thank you for checking this you are right!
>
> Thanks for confirming this.
>
> >
> > Then it is just a matter of proper initialization of the hash:
> >
> > <snip>
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 5d3aa2dc88a8..1733946f7a12 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -5087,7 +5087,13 @@ void __init vmalloc_init(void)
> > */
> > vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
> >
> > - for_each_possible_cpu(i) {
> > + /*
> > + * We use "nr_cpu_ids" here because some architectures
> > + * may have "gaps" in cpu-possible-mask. It is OK for
> > + * per-cpu approaches but is not OK for cases where it
> > + * can be used as hashes also.
> > + */
> > + for (i = 0; i < nr_cpu_ids; i++) {
>
> I was wrong about earlier comments. Percpu variables are only available
> on possible CPUs. For the nonexistent possible CPUs of the static percpu
> variable vmap_block_queue, there is no memory allocated and mapped for
> them, so accessing them will cause problems.
>
> In Nick's case, there are only CPU0 and CPU2. If you access
> &per_cpu(vmap_block_queue, 1), a problem occurs. So I think we may need
> to take a different approach for vbq, e.g.:
> 1) store the vb in the nearest neighbouring vbq on a possible CPU, as in
> the draft patch below;
> 2) create a normal array of size nr_cpu_ids to store the vbqs, so we can
> store/fetch a vbq even for a non-possible CPU.
>
See below how the patch would look if we switch to a hash array:
<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 45e1506d58c3..a8bcd9ceec2d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2501,7 +2501,8 @@ struct vmap_block {
};
/* Queue of free and dirty vmap blocks, for allocation and flushing purposes */
-static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
+static struct vmap_block_queue *vmap_block_queue;
+static bool vmap_block_queue_initialized;
/*
* In order to fast access to any "vmap_block" associated with a
@@ -2542,9 +2543,9 @@ static DEFINE_PER_CPU(struct vmap_block_queue, vmap_block_queue);
static struct xarray *
addr_to_vb_xa(unsigned long addr)
{
- int index = (addr / VMAP_BLOCK_SIZE) % num_possible_cpus();
+ int index = (addr / VMAP_BLOCK_SIZE) % nr_cpu_ids;
- return &per_cpu(vmap_block_queue, index).vmap_blocks;
+ return &vmap_block_queue[index].vmap_blocks;
}
/*
@@ -2626,7 +2627,7 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
return ERR_PTR(err);
}
- vbq = raw_cpu_ptr(&vmap_block_queue);
+ vbq = &vmap_block_queue[raw_smp_processor_id()];
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
spin_unlock(&vbq->lock);
@@ -2657,6 +2658,9 @@ static bool purge_fragmented_block(struct vmap_block *vb,
struct vmap_block_queue *vbq, struct list_head *purge_list,
bool force_purge)
{
+ if (!vmap_block_queue_initialized)
+ return false;
+
if (vb->free + vb->dirty != VMAP_BBMAP_BITS ||
vb->dirty == VMAP_BBMAP_BITS)
return false;
@@ -2692,7 +2696,12 @@ static void purge_fragmented_blocks(int cpu)
{
LIST_HEAD(purge);
struct vmap_block *vb;
- struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
+ struct vmap_block_queue *vbq;
+
+ if (!vmap_block_queue_initialized)
+ return;
+
+ vbq = &vmap_block_queue[cpu];
rcu_read_lock();
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
@@ -2715,7 +2724,7 @@ static void purge_fragmented_blocks_allcpus(void)
{
int cpu;
- for_each_possible_cpu(cpu)
+ for (cpu = 0; cpu < nr_cpu_ids; cpu++)
purge_fragmented_blocks(cpu);
}
@@ -2739,7 +2748,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
order = get_order(size);
rcu_read_lock();
- vbq = raw_cpu_ptr(&vmap_block_queue);
+ vbq = &vmap_block_queue[raw_smp_processor_id()];
list_for_each_entry_rcu(vb, &vbq->free, free_list) {
unsigned long pages_off;
@@ -2822,13 +2831,13 @@ static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
LIST_HEAD(purge_list);
int cpu;
- if (unlikely(!vmap_initialized))
+ if (unlikely(!vmap_initialized || !vmap_block_queue_initialized))
return;
mutex_lock(&vmap_purge_lock);
- for_each_possible_cpu(cpu) {
- struct vmap_block_queue *vbq = &per_cpu(vmap_block_queue, cpu);
+ for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+ struct vmap_block_queue *vbq = &vmap_block_queue[cpu];
struct vmap_block *vb;
unsigned long idx;
@@ -2910,7 +2919,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
kasan_poison_vmalloc(mem, size);
- if (likely(count <= VMAP_MAX_ALLOC)) {
+ if (likely(count <= VMAP_MAX_ALLOC) && vmap_block_queue_initialized) {
debug_check_no_locks_freed(mem, size);
vb_free(addr, size);
return;
@@ -2946,7 +2955,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
unsigned long addr;
void *mem;
- if (likely(count <= VMAP_MAX_ALLOC)) {
+ if (likely(count <= VMAP_MAX_ALLOC && vmap_block_queue_initialized)) {
mem = vb_alloc(size, GFP_KERNEL);
if (IS_ERR(mem))
return NULL;
@@ -5087,17 +5096,28 @@ void __init vmalloc_init(void)
*/
vmap_area_cachep = KMEM_CACHE(vmap_area, SLAB_PANIC);
+ vmap_block_queue = kmalloc_array(
+ nr_cpu_ids, sizeof(struct vmap_block_queue), GFP_NOWAIT);
+
+ if (vmap_block_queue) {
+ for (i = 0; i < nr_cpu_ids; i++) {
+ struct vmap_block_queue *vbq =
+ &vmap_block_queue[i];
+
+ spin_lock_init(&vbq->lock);
+ INIT_LIST_HEAD(&vbq->free);
+ xa_init(&vbq->vmap_blocks);
+ }
+ } else {
+ pr_err("Failed to allocate vmap_block_queue array, use fallback path!\n");
+ }
+
for_each_possible_cpu(i) {
- struct vmap_block_queue *vbq;
- struct vfree_deferred *p;
+ struct vfree_deferred *p =
+ &per_cpu(vfree_deferred, i);
- vbq = &per_cpu(vmap_block_queue, i);
- spin_lock_init(&vbq->lock);
- INIT_LIST_HEAD(&vbq->free);
- p = &per_cpu(vfree_deferred, i);
init_llist_head(&p->list);
INIT_WORK(&p->wq, delayed_vfree_work);
- xa_init(&vbq->vmap_blocks);
}
/*
@@ -5125,6 +5145,9 @@ void __init vmalloc_init(void)
vmap_init_free_space();
vmap_initialized = true;
+ if (vmap_block_queue)
+ vmap_block_queue_initialized = true;
+
vmap_node_shrinker = shrinker_alloc(0, "vmap-node");
if (!vmap_node_shrinker) {
pr_err("Failed to allocate vmap-node shrinker!\n");
<snip>
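For what it is worth, a tiny user-space model of the idea (not kernel
code; the struct and names are invented for illustration): every index
0 .. nr_cpu_ids - 1 is backed by real memory, so the hash needs no
cpu_possible() check, at the cost of a few unused slots when the
possible mask has holes.

<snip>
#include <stdio.h>
#include <stdlib.h>

struct fake_vbq {
	int nr_blocks;		/* stand-in for the lock/list/xarray */
};

static struct fake_vbq *vbq_array;
static const int nr_cpu_ids = 3;	/* e.g. only CPU0 and CPU2 possible */

static struct fake_vbq *addr_to_vbq(unsigned long addr, unsigned long block_size)
{
	/* every slot exists, even the ones matching non-possible CPUs */
	return &vbq_array[(addr / block_size) % nr_cpu_ids];
}

int main(void)
{
	const unsigned long block_size = 64 * 1024;

	vbq_array = calloc(nr_cpu_ids, sizeof(*vbq_array));
	if (!vbq_array)
		return 1;

	for (unsigned long a = 0; a < 6 * block_size; a += block_size)
		addr_to_vbq(a, block_size)->nr_blocks++;

	for (int i = 0; i < nr_cpu_ids; i++)
		printf("slot %d: %d blocks\n", i, vbq_array[i].nr_blocks);

	free(vbq_array);
	return 0;
}
<snip>

Compared with the cpumask_next()/cpumask_nth() variants, the trade-off is
a little wasted memory for the hole slots in exchange for keeping
addr_to_vb_xa() a single modulo.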
Any thoughts?
--
Uladzislau Rezki