[PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
@ 2024-12-21  3:33 Koichiro Den
  2025-01-03 23:33 ` Lorenzo Stoakes
  0 siblings, 1 reply; 22+ messages in thread
From: Koichiro Den @ 2024-12-21  3:33 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, linux-kernel

Even after mm/vmstat:online teardown, shepherd may still queue work for
the dying cpu until the cpu is removed from online mask. While it's
quite rare, this means that after unbind_workers() unbinds a per-cpu
kworker, it potentially runs vmstat_update for the dying CPU on an
irrelevant cpu before entering atomic AP states.
When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
backtrace.

  BUG: using smp_processor_id() in preemptible [00000000] code: \
                                               kworker/7:3/1702
  caller is refresh_cpu_vm_stats+0x235/0x5f0
  CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
  Tainted: [N]=TEST
  Workqueue: mm_percpu_wq vmstat_update
  Call Trace:
   <TASK>
   dump_stack_lvl+0x8d/0xb0
   check_preemption_disabled+0xce/0xe0
   refresh_cpu_vm_stats+0x235/0x5f0
   vmstat_update+0x17/0xa0
   process_one_work+0x869/0x1aa0
   worker_thread+0x5e5/0x1100
   kthread+0x29e/0x380
   ret_from_fork+0x2d/0x70
   ret_from_fork_asm+0x1a/0x30
   </TASK>

So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
symmetrically enable it on startup.

Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
---
 mm/vmstat.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4d016314a56c..0889b75cef14 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
 	if (!node_state(cpu_to_node(cpu), N_CPU)) {
 		node_set_state(cpu_to_node(cpu), N_CPU);
 	}
+	enable_delayed_work(&per_cpu(vmstat_work, cpu));
 
 	return 0;
 }
 
 static int vmstat_cpu_down_prep(unsigned int cpu)
 {
-	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
 	return 0;
 }
 
-- 
2.43.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2024-12-21  3:33 [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() Koichiro Den
@ 2025-01-03 23:33 ` Lorenzo Stoakes
  2025-01-04  4:00   ` Koichiro Den
  0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-01-03 23:33 UTC (permalink / raw)
  To: Koichiro Den; +Cc: linux-mm, akpm, linux-kernel

On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> Even after mm/vmstat:online teardown, shepherd may still queue work for
> the dying cpu until the cpu is removed from online mask. While it's
> quite rare, this means that after unbind_workers() unbinds a per-cpu
> kworker, it potentially runs vmstat_update for the dying CPU on an
> irrelevant cpu before entering atomic AP states.
> When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> backtrace.
>
>   BUG: using smp_processor_id() in preemptible [00000000] code: \
>                                                kworker/7:3/1702
>   caller is refresh_cpu_vm_stats+0x235/0x5f0
>   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
>   Tainted: [N]=TEST
>   Workqueue: mm_percpu_wq vmstat_update
>   Call Trace:
>    <TASK>
>    dump_stack_lvl+0x8d/0xb0
>    check_preemption_disabled+0xce/0xe0
>    refresh_cpu_vm_stats+0x235/0x5f0
>    vmstat_update+0x17/0xa0
>    process_one_work+0x869/0x1aa0
>    worker_thread+0x5e5/0x1100
>    kthread+0x29e/0x380
>    ret_from_fork+0x2d/0x70
>    ret_from_fork_asm+0x1a/0x30
>    </TASK>
>
> So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> symmetrically enable it on startup.
>
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>

Hi,

I observed a warning in my qemu and real hardware, which I bisected to this commit:

[    0.087733] ------------[ cut here ]------------
[    0.087733] workqueue: work disable count underflowed
[    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

This is:

static void work_offqd_enable(struct work_offq_data *offqd)
{
	if (likely(offqd->disable > 0))
		offqd->disable--;
	else
		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
}

So (based on this code) presumably an enable is only required if previously
disabled, and this code is being called on startup unconditionally without
the work having been disabled previously? I'm not hugely familiar with
delayed workqueue implementation details.

[    0.087733] Modules linked in:
[    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
[    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
[    0.087733] RIP: 0010:enable_work+0xb5/0xc0
[    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
[    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
[    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
[    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
[    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
[    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
[    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
[    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
[    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
[    0.087733] PKRU: 55555554
[    0.087733] Call Trace:
[    0.087733]  <TASK>
[    0.087733]  ? enable_work+0xb5/0xc0
[    0.087733]  ? __warn.cold+0x93/0xf2
[    0.087733]  ? enable_work+0xb5/0xc0
[    0.087733]  ? report_bug+0xff/0x140
[    0.087733]  ? handle_bug+0x54/0x90
[    0.087733]  ? exc_invalid_op+0x17/0x70
[    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
[    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
[    0.087733]  ? enable_work+0xb5/0xc0
[    0.087733]  vmstat_cpu_online+0x5c/0x70
[    0.087733]  cpuhp_invoke_callback+0x133/0x440
[    0.087733]  cpuhp_thread_fun+0x95/0x150
[    0.087733]  smpboot_thread_fn+0xd5/0x1d0
[    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
[    0.087735]  kthread+0xc8/0xf0
[    0.087737]  ? __pfx_kthread+0x10/0x10
[    0.087738]  ret_from_fork+0x2c/0x50
[    0.087739]  ? __pfx_kthread+0x10/0x10
[    0.087740]  ret_from_fork_asm+0x1a/0x30
[    0.087742]  </TASK>
[    0.087742] ---[ end trace 0000000000000000 ]---


> ---
> v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
> ---
>  mm/vmstat.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4d016314a56c..0889b75cef14 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
>  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
>  		node_set_state(cpu_to_node(cpu), N_CPU);
>  	}
> +	enable_delayed_work(&per_cpu(vmstat_work, cpu));

Probably needs to be 'if disabled' here, as this is invoked on normal
startup when the work won't have been disabled?

Had a brief look at code and couldn't see how that could be done
however... and one would need to be careful about races... Tricky!

>
>  	return 0;
>  }
>
>  static int vmstat_cpu_down_prep(unsigned int cpu)
>  {
> -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>  	return 0;
>  }
>
> --
> 2.43.0
>
>

Let me know if you need any more details, .config etc.

I noticed this warning on a real box too (in both cases running akpm's
mm-unstable branch), FWIW.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-03 23:33 ` Lorenzo Stoakes
@ 2025-01-04  4:00   ` Koichiro Den
  2025-01-06  2:18     ` Charalampos Mitrodimas
                       ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Koichiro Den @ 2025-01-04  4:00 UTC (permalink / raw)
  To: Lorenzo Stoakes; +Cc: linux-mm, akpm, linux-kernel

On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > the dying cpu until the cpu is removed from online mask. While it's
> > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > kworker, it potentially runs vmstat_update for the dying CPU on an
> > irrelevant cpu before entering atomic AP states.
> > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > backtrace.
> >
> >   BUG: using smp_processor_id() in preemptible [00000000] code: \
> >                                                kworker/7:3/1702
> >   caller is refresh_cpu_vm_stats+0x235/0x5f0
> >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> >   Tainted: [N]=TEST
> >   Workqueue: mm_percpu_wq vmstat_update
> >   Call Trace:
> >    <TASK>
> >    dump_stack_lvl+0x8d/0xb0
> >    check_preemption_disabled+0xce/0xe0
> >    refresh_cpu_vm_stats+0x235/0x5f0
> >    vmstat_update+0x17/0xa0
> >    process_one_work+0x869/0x1aa0
> >    worker_thread+0x5e5/0x1100
> >    kthread+0x29e/0x380
> >    ret_from_fork+0x2d/0x70
> >    ret_from_fork_asm+0x1a/0x30
> >    </TASK>
> >
> > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > symmetrically enable it on startup.
> >
> > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> 
> Hi,
> 
> I observed a warning in my qemu and real hardware, which I bisected to this commit:
> 
> [    0.087733] ------------[ cut here ]------------
> [    0.087733] workqueue: work disable count underflowed
> [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0
> 
> This is:
> 
> static void work_offqd_enable(struct work_offq_data *offqd)
> {
> 	if (likely(offqd->disable > 0))
> 		offqd->disable--;
> 	else
> 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
> }
> 
> So (based on this code) presumably an enable is only required if previously
> disabled, and this code is being called on startup unconditionally without
> the work having been disabled previously? I'm not hugely familiar with
> delayed workqueue implementation details.
> 
> [    0.087733] Modules linked in:
> [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
> [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
> [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
> [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
> [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
> [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
> [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
> [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
> [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
> [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
> [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
> [    0.087733] PKRU: 55555554
> [    0.087733] Call Trace:
> [    0.087733]  <TASK>
> [    0.087733]  ? enable_work+0xb5/0xc0
> [    0.087733]  ? __warn.cold+0x93/0xf2
> [    0.087733]  ? enable_work+0xb5/0xc0
> [    0.087733]  ? report_bug+0xff/0x140
> [    0.087733]  ? handle_bug+0x54/0x90
> [    0.087733]  ? exc_invalid_op+0x17/0x70
> [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
> [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
> [    0.087733]  ? enable_work+0xb5/0xc0
> [    0.087733]  vmstat_cpu_online+0x5c/0x70
> [    0.087733]  cpuhp_invoke_callback+0x133/0x440
> [    0.087733]  cpuhp_thread_fun+0x95/0x150
> [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
> [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
> [    0.087735]  kthread+0xc8/0xf0
> [    0.087737]  ? __pfx_kthread+0x10/0x10
> [    0.087738]  ret_from_fork+0x2c/0x50
> [    0.087739]  ? __pfx_kthread+0x10/0x10
> [    0.087740]  ret_from_fork_asm+0x1a/0x30
> [    0.087742]  </TASK>
> [    0.087742] ---[ end trace 0000000000000000 ]---
> 
> 
> > ---
> > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
> > ---
> >  mm/vmstat.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 4d016314a56c..0889b75cef14 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
> >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
> >  		node_set_state(cpu_to_node(cpu), N_CPU);
> >  	}
> > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
> 
> Probably needs to be 'if disabled' here, as this is invoked on normal
> startup when the work won't have been disabled?
> 
> Had a brief look at code and couldn't see how that could be done
> however... and one would need to be careful about races... Tricky!
> 
> >
> >  	return 0;
> >  }
> >
> >  static int vmstat_cpu_down_prep(unsigned int cpu)
> >  {
> > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> >  	return 0;
> >  }
> >
> > --
> > 2.43.0
> >
> >
> 
> Let me know if you need any more details, .config etc.
> 
> I noticed this warning on a real box too (in both cases running akpm's
> mm-unstable branch), FWIW.

Thank you for the report. I was able to reproduce the warning and now
wonder how I missed it.. My oversight, apologies.

In my current view, the simplest solution would be to make sure a local
vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
during boot-up. The following patch suppresses the warning:

  diff --git a/mm/vmstat.c b/mm/vmstat.c
  index 0889b75cef14..19ceed5d34bf 100644
  --- a/mm/vmstat.c
  +++ b/mm/vmstat.c
  @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
   {
          int cpu;
  
  -       for_each_possible_cpu(cpu)
  +       for_each_possible_cpu(cpu) {
                  INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
                          vmstat_update);
  
  +               /* will be enabled on vmstat_cpu_online */
  +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
  +       }
  +
          schedule_delayed_work(&shepherd,
                  round_jiffies_relative(sysctl_stat_interval));
   }

If you think of a better solution later, please let me know. Otherwise,
I'll submit a follow-up fix patch with the above diff.

Thanks.

-Koichiro


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-04  4:00   ` Koichiro Den
@ 2025-01-06  2:18     ` Charalampos Mitrodimas
  2025-01-06 10:04     ` Mark Rutland
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Charalampos Mitrodimas @ 2025-01-06  2:18 UTC (permalink / raw)
  To: Koichiro Den; +Cc: Lorenzo Stoakes, linux-mm, akpm, linux-kernel

Koichiro Den <koichiro.den@canonical.com> writes:

> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
>> On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
>> > Even after mm/vmstat:online teardown, shepherd may still queue work for
>> > the dying cpu until the cpu is removed from online mask. While it's
>> > quite rare, this means that after unbind_workers() unbinds a per-cpu
>> > kworker, it potentially runs vmstat_update for the dying CPU on an
>> > irrelevant cpu before entering atomic AP states.
>> > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
>> > backtrace.
>> >
>> >   BUG: using smp_processor_id() in preemptible [00000000] code: \
>> >                                                kworker/7:3/1702
>> >   caller is refresh_cpu_vm_stats+0x235/0x5f0
>> >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
>> >   Tainted: [N]=TEST
>> >   Workqueue: mm_percpu_wq vmstat_update
>> >   Call Trace:
>> >    <TASK>
>> >    dump_stack_lvl+0x8d/0xb0
>> >    check_preemption_disabled+0xce/0xe0
>> >    refresh_cpu_vm_stats+0x235/0x5f0
>> >    vmstat_update+0x17/0xa0
>> >    process_one_work+0x869/0x1aa0
>> >    worker_thread+0x5e5/0x1100
>> >    kthread+0x29e/0x380
>> >    ret_from_fork+0x2d/0x70
>> >    ret_from_fork_asm+0x1a/0x30
>> >    </TASK>
>> >
>> > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
>> > symmetrically enable it on startup.
>> >
>> > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
>>
>> Hi,
>>
>> I observed a warning in my qemu and real hardware, which I bisected to this commit:
>>
>> [    0.087733] ------------[ cut here ]------------
>> [    0.087733] workqueue: work disable count underflowed
>> [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

I also encountered this in my QEMU Debian installation.

>>
>> This is:
>>
>> static void work_offqd_enable(struct work_offq_data *offqd)
>> {
>> 	if (likely(offqd->disable > 0))
>> 		offqd->disable--;
>> 	else
>> 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
>> }
>>
>> So (based on this code) presumably an enable is only required if previously
>> disabled, and this code is being called on startup unconditionally without
>> the work having been disabled previously? I'm not hugely familiar with
>> delayed workqueue implementation details.
>>
>> [    0.087733] Modules linked in:
>> [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
>> [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
>> [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
>> [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
>> [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
>> [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
>> [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
>> [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
>> [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
>> [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
>> [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
>> [    0.087733] PKRU: 55555554
>> [    0.087733] Call Trace:
>> [    0.087733]  <TASK>
>> [    0.087733]  ? enable_work+0xb5/0xc0
>> [    0.087733]  ? __warn.cold+0x93/0xf2
>> [    0.087733]  ? enable_work+0xb5/0xc0
>> [    0.087733]  ? report_bug+0xff/0x140
>> [    0.087733]  ? handle_bug+0x54/0x90
>> [    0.087733]  ? exc_invalid_op+0x17/0x70
>> [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
>> [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
>> [    0.087733]  ? enable_work+0xb5/0xc0
>> [    0.087733]  vmstat_cpu_online+0x5c/0x70
>> [    0.087733]  cpuhp_invoke_callback+0x133/0x440
>> [    0.087733]  cpuhp_thread_fun+0x95/0x150
>> [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
>> [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
>> [    0.087735]  kthread+0xc8/0xf0
>> [    0.087737]  ? __pfx_kthread+0x10/0x10
>> [    0.087738]  ret_from_fork+0x2c/0x50
>> [    0.087739]  ? __pfx_kthread+0x10/0x10
>> [    0.087740]  ret_from_fork_asm+0x1a/0x30
>> [    0.087742]  </TASK>
>> [    0.087742] ---[ end trace 0000000000000000 ]---
>>
>>
>> > ---
>> > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
>> > ---
>> >  mm/vmstat.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/vmstat.c b/mm/vmstat.c
>> > index 4d016314a56c..0889b75cef14 100644
>> > --- a/mm/vmstat.c
>> > +++ b/mm/vmstat.c
>> > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
>> >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
>> >  		node_set_state(cpu_to_node(cpu), N_CPU);
>> >  	}
>> > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
>>
>> Probably needs to be 'if disabled' here, as this is invoked on normal
>> startup when the work won't have been disabled?
>>
>> Had a brief look at code and couldn't see how that could be done
>> however... and one would need to be careful about races... Tricky!
>>
>> >
>> >  	return 0;
>> >  }
>> >
>> >  static int vmstat_cpu_down_prep(unsigned int cpu)
>> >  {
>> > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> >  	return 0;
>> >  }
>> >
>> > --
>> > 2.43.0
>> >
>> >
>>
>> Let me know if you need any more details, .config etc.
>>
>> I noticed this warning on a real box too (in both cases running akpm's
>> mm-unstable branch), FWIW.
>
> Thank you for the report. I was able to reproduce the warning and now
> wonder how I missed it.. My oversight, apologies.
>
> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
>
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..19ceed5d34bf 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>    {
>           int cpu;
>
>   -       for_each_possible_cpu(cpu)
>   +       for_each_possible_cpu(cpu) {
>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                           vmstat_update);
>
>   +               /* will be enabled on vmstat_cpu_online */
>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       }
>   +
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>    }
>
> If you think of a better solution later, please let me know. Otherwise,
> I'll submit a follow-up fix patch with the above diff.

Can't think of a better solution myself but this fixes the issue. Thanks!


>
> Thanks.
>
> -Koichiro

C. Mitrodimas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-04  4:00   ` Koichiro Den
  2025-01-06  2:18     ` Charalampos Mitrodimas
@ 2025-01-06 10:04     ` Mark Rutland
  2025-01-06 10:18     ` Geert Uytterhoeven
  2025-01-06 10:52     ` Lorenzo Stoakes
  3 siblings, 0 replies; 22+ messages in thread
From: Mark Rutland @ 2025-01-06 10:04 UTC (permalink / raw)
  To: Koichiro Den; +Cc: Lorenzo Stoakes, linux-mm, akpm, linux-kernel

On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote:
> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > Hi,
> > 
> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> > 
> > [    0.087733] ------------[ cut here ]------------
> > [    0.087733] workqueue: work disable count underflowed
> > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

FWIW, I hit similar when testing v6.13-rc6 defconfig on arm64, when
booting secondaries I always get a splat (trimmed):

| ------------[ cut here ]------------
| workqueue: work disable count underflowed
| WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4317 enable_work+0xfc/0x108
| Modules linked in:
| CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc6 #1
| Hardware name: linux,dummy-virt (DT)
| [...]
| Call trace:
|  enable_work+0xfc/0x108 (P)
|  enable_delayed_work+0x10/0x1c
|  vmstat_cpu_online+0x88/0xbc
|  cpuhp_invoke_callback+0x10c/0x208
|  cpuhp_thread_fun+0xb0/0x1a0
|  smpboot_thread_fn+0x20c/0x234
|  kthread+0x110/0x114
|  ret_from_fork+0x10/0x20
| ---[ end trace 0000000000000000 ]---

[...]

> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
> 
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..19ceed5d34bf 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>    {
>           int cpu;
>   
>   -       for_each_possible_cpu(cpu)
>   +       for_each_possible_cpu(cpu) {
>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                           vmstat_update);
>   
>   +               /* will be enabled on vmstat_cpu_online */
>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       }
>   +
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>    }

FWIW, the above solves the warning for me.

Mark.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-04  4:00   ` Koichiro Den
  2025-01-06  2:18     ` Charalampos Mitrodimas
  2025-01-06 10:04     ` Mark Rutland
@ 2025-01-06 10:18     ` Geert Uytterhoeven
  2025-01-06 10:52     ` Lorenzo Stoakes
  3 siblings, 0 replies; 22+ messages in thread
From: Geert Uytterhoeven @ 2025-01-06 10:18 UTC (permalink / raw)
  To: Koichiro Den; +Cc: Lorenzo Stoakes, linux-mm, akpm, linux-kernel, Linux-Renesas

Hi Koichiro,

On Sat, Jan 4, 2025 at 5:00 AM Koichiro Den <koichiro.den@canonical.com> wrote:
> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > > the dying cpu until the cpu is removed from online mask. While it's
> > > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > > kworker, it potentially runs vmstat_update for the dying CPU on an
> > > irrelevant cpu before entering atomic AP states.
> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > > backtrace.
> > >
> > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
> > >                                                kworker/7:3/1702
> > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
> > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> > >   Tainted: [N]=TEST
> > >   Workqueue: mm_percpu_wq vmstat_update
> > >   Call Trace:
> > >    <TASK>
> > >    dump_stack_lvl+0x8d/0xb0
> > >    check_preemption_disabled+0xce/0xe0
> > >    refresh_cpu_vm_stats+0x235/0x5f0
> > >    vmstat_update+0x17/0xa0
> > >    process_one_work+0x869/0x1aa0
> > >    worker_thread+0x5e5/0x1100
> > >    kthread+0x29e/0x380
> > >    ret_from_fork+0x2d/0x70
> > >    ret_from_fork_asm+0x1a/0x30
> > >    </TASK>
> > >
> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > > symmetrically enable it on startup.
> > >
> > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> >
> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> >
> > [    0.087733] ------------[ cut here ]------------
> > [    0.087733] workqueue: work disable count underflowed
> > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0

I am seeing the same on arm32 (R-Car M2-W) and arm64 (R-Car H3 ES2.0).

> Thank you for the report. I was able to reproduce the warning and now
> wonder how I missed it.. My oversight, apologies.
>
> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
>
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..19ceed5d34bf 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>    {
>           int cpu;
>
>   -       for_each_possible_cpu(cpu)
>   +       for_each_possible_cpu(cpu) {
>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                           vmstat_update);
>
>   +               /* will be enabled on vmstat_cpu_online */
>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       }
>   +
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>    }
>
> If you think of a better solution later, please let me know. Otherwise,
> I'll submit a follow-up fix patch with the above diff.

Thank you, that fixes the warnings for me!
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-04  4:00   ` Koichiro Den
                       ` (2 preceding siblings ...)
  2025-01-06 10:18     ` Geert Uytterhoeven
@ 2025-01-06 10:52     ` Lorenzo Stoakes
  2025-01-06 12:53       ` Charalampos Mitrodimas
  2025-01-06 13:03       ` Koichiro Den
  3 siblings, 2 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-01-06 10:52 UTC (permalink / raw)
  To: Koichiro Den
  Cc: linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra

+cc tglx, peterz for insight on CPU hot plug

On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote:
> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > > the dying cpu until the cpu is removed from online mask. While it's
> > > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > > kworker, it potentially runs vmstat_update for the dying CPU on an
> > > irrelevant cpu before entering atomic AP states.
> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > > backtrace.
> > >
> > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
> > >                                                kworker/7:3/1702
> > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
> > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> > >   Tainted: [N]=TEST
> > >   Workqueue: mm_percpu_wq vmstat_update
> > >   Call Trace:
> > >    <TASK>
> > >    dump_stack_lvl+0x8d/0xb0
> > >    check_preemption_disabled+0xce/0xe0
> > >    refresh_cpu_vm_stats+0x235/0x5f0
> > >    vmstat_update+0x17/0xa0
> > >    process_one_work+0x869/0x1aa0
> > >    worker_thread+0x5e5/0x1100
> > >    kthread+0x29e/0x380
> > >    ret_from_fork+0x2d/0x70
> > >    ret_from_fork_asm+0x1a/0x30
> > >    </TASK>
> > >
> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > > symmetrically enable it on startup.
> > >
> > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> >
> > Hi,
> >
> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> >
> > [    0.087733] ------------[ cut here ]------------
> > [    0.087733] workqueue: work disable count underflowed
> > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0
> >
> > This is:
> >
> > static void work_offqd_enable(struct work_offq_data *offqd)
> > {
> > 	if (likely(offqd->disable > 0))
> > 		offqd->disable--;
> > 	else
> > 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
> > }
> >
> > So (based on this code) presumably an enable is only required if previously
> > disabled, and this code is being called on startup unconditionally without
> > the work having been disabled previously? I'm not hugely familiar with
> > delayed workqueue implementation details.
> >
> > [    0.087733] Modules linked in:
> > [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
> > [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> > [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
> > [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
> > [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
> > [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
> > [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
> > [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
> > [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
> > [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
> > [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
> > [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
> > [    0.087733] PKRU: 55555554
> > [    0.087733] Call Trace:
> > [    0.087733]  <TASK>
> > [    0.087733]  ? enable_work+0xb5/0xc0
> > [    0.087733]  ? __warn.cold+0x93/0xf2
> > [    0.087733]  ? enable_work+0xb5/0xc0
> > [    0.087733]  ? report_bug+0xff/0x140
> > [    0.087733]  ? handle_bug+0x54/0x90
> > [    0.087733]  ? exc_invalid_op+0x17/0x70
> > [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
> > [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
> > [    0.087733]  ? enable_work+0xb5/0xc0
> > [    0.087733]  vmstat_cpu_online+0x5c/0x70
> > [    0.087733]  cpuhp_invoke_callback+0x133/0x440
> > [    0.087733]  cpuhp_thread_fun+0x95/0x150
> > [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
> > [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
> > [    0.087735]  kthread+0xc8/0xf0
> > [    0.087737]  ? __pfx_kthread+0x10/0x10
> > [    0.087738]  ret_from_fork+0x2c/0x50
> > [    0.087739]  ? __pfx_kthread+0x10/0x10
> > [    0.087740]  ret_from_fork_asm+0x1a/0x30
> > [    0.087742]  </TASK>
> > [    0.087742] ---[ end trace 0000000000000000 ]---
> >
> >
> > > ---
> > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
> > > ---
> > >  mm/vmstat.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > index 4d016314a56c..0889b75cef14 100644
> > > --- a/mm/vmstat.c
> > > +++ b/mm/vmstat.c
> > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
> > >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
> > >  		node_set_state(cpu_to_node(cpu), N_CPU);
> > >  	}
> > > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
> >
> > Probably needs to be 'if disabled' here, as this is invoked on normal
> > startup when the work won't have been disabled?
> >
> > Had a brief look at code and couldn't see how that could be done
> > however... and one would need to be careful about races... Tricky!
> >
> > >
> > >  	return 0;
> > >  }
> > >
> > >  static int vmstat_cpu_down_prep(unsigned int cpu)
> > >  {
> > > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > >  	return 0;
> > >  }
> > >
> > > --
> > > 2.43.0
> > >
> > >
> >
> > Let me know if you need any more details, .config etc.
> >
> > I noticed this warning on a real box too (in both cases running akpm's
> > mm-unstable branch), FWIW.
>
> Thank you for the report. I was able to reproduce the warning and now
> wonder how I missed it.. My oversight, apologies.
>
> In my current view, the simplest solution would be to make sure a local
> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> during boot-up. The following patch suppresses the warning:
>
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..19ceed5d34bf 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>    {
>           int cpu;
>
>   -       for_each_possible_cpu(cpu)
>   +       for_each_possible_cpu(cpu) {
>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                           vmstat_update);
>
>   +               /* will be enabled on vmstat_cpu_online */
>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       }
>   +
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>    }
>
> If you think of a better solution later, please let me know. Otherwise,
> I'll submit a follow-up fix patch with the above diff.

Thanks, this resolves the problem, but are we sure that _all_ CPUs will
definitely call vmstat_cpu_online()?

I did a bit of printk output and it seems like this _didn't_ online CPU 0,
presumably the boot CPU which calls this function in the first instance?

I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls()
explicitly which does _not_ call the callback, though even if it did this
would be too early as it calls start_shepherd_timer() _after_ this anyway.

So yeah, unless I'm missing something, I think this patch is broken.

I have added Thomas and Peter to give some insight on the CPU hotplug side.

It feels like the patch really needs an 'enable if not already enabled'
call in vmstat_cpu_online().

>
> Thanks.
>
> -Koichiro


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-06 10:52     ` Lorenzo Stoakes
@ 2025-01-06 12:53       ` Charalampos Mitrodimas
  2025-01-06 12:58         ` Lorenzo Stoakes
  2025-01-06 13:03       ` Koichiro Den
  1 sibling, 1 reply; 22+ messages in thread
From: Charalampos Mitrodimas @ 2025-01-06 12:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Koichiro Den, linux-mm, akpm, linux-kernel, Thomas Gleixner,
	Peter Zijlstra

Lorenzo Stoakes <lorenzo.stoakes@oracle.com> writes:

> +cc tglx, peterz for insight on CPU hot plug
>
> On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote:
>> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
>> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
>> > > Even after mm/vmstat:online teardown, shepherd may still queue work for
>> > > the dying cpu until the cpu is removed from online mask. While it's
>> > > quite rare, this means that after unbind_workers() unbinds a per-cpu
>> > > kworker, it potentially runs vmstat_update for the dying CPU on an
>> > > irrelevant cpu before entering atomic AP states.
>> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
>> > > backtrace.
>> > >
>> > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
>> > >                                                kworker/7:3/1702
>> > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
>> > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
>> > >   Tainted: [N]=TEST
>> > >   Workqueue: mm_percpu_wq vmstat_update
>> > >   Call Trace:
>> > >    <TASK>
>> > >    dump_stack_lvl+0x8d/0xb0
>> > >    check_preemption_disabled+0xce/0xe0
>> > >    refresh_cpu_vm_stats+0x235/0x5f0
>> > >    vmstat_update+0x17/0xa0
>> > >    process_one_work+0x869/0x1aa0
>> > >    worker_thread+0x5e5/0x1100
>> > >    kthread+0x29e/0x380
>> > >    ret_from_fork+0x2d/0x70
>> > >    ret_from_fork_asm+0x1a/0x30
>> > >    </TASK>
>> > >
>> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
>> > > symmetrically enable it on startup.
>> > >
>> > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
>> >
>> > Hi,
>> >
>> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
>> >
>> > [    0.087733] ------------[ cut here ]------------
>> > [    0.087733] workqueue: work disable count underflowed
>> > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0
>> >
>> > This is:
>> >
>> > static void work_offqd_enable(struct work_offq_data *offqd)
>> > {
>> > 	if (likely(offqd->disable > 0))
>> > 		offqd->disable--;
>> > 	else
>> > 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
>> > }
>> >
>> > So (based on this code) presumably an enable is only required if previously
>> > disabled, and this code is being called on startup unconditionally without
>> > the work having been disabled previously? I'm not hugely familiar with
>> > delayed workqueue implementation details.
>> >
>> > [    0.087733] Modules linked in:
>> > [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
>> > [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> > [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
>> > [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
>> > [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
>> > [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
>> > [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
>> > [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
>> > [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
>> > [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
>> > [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
>> > [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
>> > [    0.087733] PKRU: 55555554
>> > [    0.087733] Call Trace:
>> > [    0.087733]  <TASK>
>> > [    0.087733]  ? enable_work+0xb5/0xc0
>> > [    0.087733]  ? __warn.cold+0x93/0xf2
>> > [    0.087733]  ? enable_work+0xb5/0xc0
>> > [    0.087733]  ? report_bug+0xff/0x140
>> > [    0.087733]  ? handle_bug+0x54/0x90
>> > [    0.087733]  ? exc_invalid_op+0x17/0x70
>> > [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
>> > [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
>> > [    0.087733]  ? enable_work+0xb5/0xc0
>> > [    0.087733]  vmstat_cpu_online+0x5c/0x70
>> > [    0.087733]  cpuhp_invoke_callback+0x133/0x440
>> > [    0.087733]  cpuhp_thread_fun+0x95/0x150
>> > [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
>> > [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
>> > [    0.087735]  kthread+0xc8/0xf0
>> > [    0.087737]  ? __pfx_kthread+0x10/0x10
>> > [    0.087738]  ret_from_fork+0x2c/0x50
>> > [    0.087739]  ? __pfx_kthread+0x10/0x10
>> > [    0.087740]  ret_from_fork_asm+0x1a/0x30
>> > [    0.087742]  </TASK>
>> > [    0.087742] ---[ end trace 0000000000000000 ]---
>> >
>> >
>> > > ---
>> > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
>> > > ---
>> > >  mm/vmstat.c | 3 ++-
>> > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
>> > > index 4d016314a56c..0889b75cef14 100644
>> > > --- a/mm/vmstat.c
>> > > +++ b/mm/vmstat.c
>> > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
>> > >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
>> > >  		node_set_state(cpu_to_node(cpu), N_CPU);
>> > >  	}
>> > > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
>> >
>> > Probably needs to be 'if disabled' here, as this is invoked on normal
>> > startup when the work won't have been disabled?
>> >
>> > Had a brief look at code and couldn't see how that could be done
>> > however... and one would need to be careful about races... Tricky!
>> >
>> > >
>> > >  	return 0;
>> > >  }
>> > >
>> > >  static int vmstat_cpu_down_prep(unsigned int cpu)
>> > >  {
>> > > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> > > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> > >  	return 0;
>> > >  }
>> > >
>> > > --
>> > > 2.43.0
>> > >
>> > >
>> >
>> > Let me know if you need any more details, .config etc.
>> >
>> > I noticed this warning on a real box too (in both cases running akpm's
>> > mm-unstable branch), FWIW.
>>
>> Thank you for the report. I was able to reproduce the warning and now
>> wonder how I missed it.. My oversight, apologies.
>>
>> In my current view, the simplest solution would be to make sure a local
>> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
>> during boot-up. The following patch suppresses the warning:
>>
>>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>>   index 0889b75cef14..19ceed5d34bf 100644
>>   --- a/mm/vmstat.c
>>   +++ b/mm/vmstat.c
>>   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>>    {
>>           int cpu;
>>
>>   -       for_each_possible_cpu(cpu)
>>   +       for_each_possible_cpu(cpu) {
>>                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>>                           vmstat_update);
>>
>>   +               /* will be enabled on vmstat_cpu_online */
>>   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>>   +       }
>>   +
>>           schedule_delayed_work(&shepherd,
>>                   round_jiffies_relative(sysctl_stat_interval));
>>    }
>>
>> If you think of a better solution later, please let me know. Otherwise,
>> I'll submit a follow-up fix patch with the above diff.
>
> Thanks, this resolves the problem, but are we sure that _all_ CPUs will
> definitely call vmstat_cpu_online()?
>
> I did a bit of printk output and it seems like this _didn't_ online CPU 0,
> presumably the boot CPU which calls this function in the first instance?

FWIW with the proposed fix I can see that all CPUs are online,
  grep -H . /sys/devices/system/cpu/cpu*/online
  /sys/devices/system/cpu/cpu0/online:1
  /sys/devices/system/cpu/cpu1/online:1
  /sys/devices/system/cpu/cpu2/online:1
  /sys/devices/system/cpu/cpu3/online:1
  /sys/devices/system/cpu/cpu4/online:1
  /sys/devices/system/cpu/cpu5/online:1
  /sys/devices/system/cpu/cpu6/online:1
  /sys/devices/system/cpu/cpu7/online:1

>
> I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls()
> explicitly which does _not_ call the callback, though even if it did this
> would be too early as it calls start_shepherd_timer() _after_ this anyway.
>
> So yeah, unless I'm missing something, I think this patch is broken.
>
> I have added Thomas and Peter to give some insight on the CPU hotplug side.
>
> It feels like the patch really needs an 'enable if not already enabled'
> call in vmstat_cpu_online().
>
>>
>> Thanks.
>>
>> -Koichiro


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-06 12:53       ` Charalampos Mitrodimas
@ 2025-01-06 12:58         ` Lorenzo Stoakes
  0 siblings, 0 replies; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-01-06 12:58 UTC (permalink / raw)
  To: Charalampos Mitrodimas
  Cc: Koichiro Den, linux-mm, akpm, linux-kernel, Thomas Gleixner,
	Peter Zijlstra

On Mon, Jan 06, 2025 at 12:53:36PM +0000, Charalampos Mitrodimas wrote:
> > I did a bit of printk output and it seems like this _didn't_ online CPU 0,
> > presumably the boot CPU which calls this function in the first instance?
>
> FWIW with the proposed fix I can see that all CPUs are online,
>   grep -H . /sys/devices/system/cpu/cpu*/online
>   /sys/devices/system/cpu/cpu0/online:1
>   /sys/devices/system/cpu/cpu1/online:1
>   /sys/devices/system/cpu/cpu2/online:1
>   /sys/devices/system/cpu/cpu3/online:1
>   /sys/devices/system/cpu/cpu4/online:1
>   /sys/devices/system/cpu/cpu5/online:1
>   /sys/devices/system/cpu/cpu6/online:1
>   /sys/devices/system/cpu/cpu7/online:1
>

Sorry maybe I phrased this badly, I'm not suggesting CPUs aren't coming
online, I'm saying it doesn't look like vmstat_cpu_online() will be called
for the boot CPU, which breaks the proposed fix (the delayed work for this
CPU will simply never be enabled in this case).

Naturally, the boot CPU is _already_ online at this point, which is I
imagine why this is the case.

I had wondered if the function would be invoked on the boot CPU _anyway_
but it appears not.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-06 10:52     ` Lorenzo Stoakes
  2025-01-06 12:53       ` Charalampos Mitrodimas
@ 2025-01-06 13:03       ` Koichiro Den
  2025-01-06 13:53         ` Koichiro Den
  2025-01-07  1:18         ` [PATCH] Simple fix Huacai Chen
  1 sibling, 2 replies; 22+ messages in thread
From: Koichiro Den @ 2025-01-06 13:03 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra

On Mon, Jan 06, 2025 at 10:52:37AM +0000, Lorenzo Stoakes wrote:
> +cc tglx, peterz for insight on CPU hot plug
> 
> On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote:
> > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > > > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > > > the dying cpu until the cpu is removed from online mask. While it's
> > > > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > > > kworker, it potentially runs vmstat_update for the dying CPU on an
> > > > irrelevant cpu before entering atomic AP states.
> > > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > > > backtrace.
> > > >
> > > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
> > > >                                                kworker/7:3/1702
> > > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
> > > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> > > >   Tainted: [N]=TEST
> > > >   Workqueue: mm_percpu_wq vmstat_update
> > > >   Call Trace:
> > > >    <TASK>
> > > >    dump_stack_lvl+0x8d/0xb0
> > > >    check_preemption_disabled+0xce/0xe0
> > > >    refresh_cpu_vm_stats+0x235/0x5f0
> > > >    vmstat_update+0x17/0xa0
> > > >    process_one_work+0x869/0x1aa0
> > > >    worker_thread+0x5e5/0x1100
> > > >    kthread+0x29e/0x380
> > > >    ret_from_fork+0x2d/0x70
> > > >    ret_from_fork_asm+0x1a/0x30
> > > >    </TASK>
> > > >
> > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > > > symmetrically enable it on startup.
> > > >
> > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> > >
> > > Hi,
> > >
> > > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> > >
> > > [    0.087733] ------------[ cut here ]------------
> > > [    0.087733] workqueue: work disable count underflowed
> > > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0
> > >
> > > This is:
> > >
> > > static void work_offqd_enable(struct work_offq_data *offqd)
> > > {
> > > 	if (likely(offqd->disable > 0))
> > > 		offqd->disable--;
> > > 	else
> > > 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
> > > }
> > >
> > > So (based on this code) presumably an enable is only required if previously
> > > disabled, and this code is being called on startup unconditionally without
> > > the work having been disabled previously? I'm not hugely familiar with
> > > delayed workqueue implementation details.
> > >
> > > [    0.087733] Modules linked in:
> > > [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
> > > [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> > > [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
> > > [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
> > > [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
> > > [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
> > > [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
> > > [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
> > > [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
> > > [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
> > > [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
> > > [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
> > > [    0.087733] PKRU: 55555554
> > > [    0.087733] Call Trace:
> > > [    0.087733]  <TASK>
> > > [    0.087733]  ? enable_work+0xb5/0xc0
> > > [    0.087733]  ? __warn.cold+0x93/0xf2
> > > [    0.087733]  ? enable_work+0xb5/0xc0
> > > [    0.087733]  ? report_bug+0xff/0x140
> > > [    0.087733]  ? handle_bug+0x54/0x90
> > > [    0.087733]  ? exc_invalid_op+0x17/0x70
> > > [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
> > > [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
> > > [    0.087733]  ? enable_work+0xb5/0xc0
> > > [    0.087733]  vmstat_cpu_online+0x5c/0x70
> > > [    0.087733]  cpuhp_invoke_callback+0x133/0x440
> > > [    0.087733]  cpuhp_thread_fun+0x95/0x150
> > > [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
> > > [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
> > > [    0.087735]  kthread+0xc8/0xf0
> > > [    0.087737]  ? __pfx_kthread+0x10/0x10
> > > [    0.087738]  ret_from_fork+0x2c/0x50
> > > [    0.087739]  ? __pfx_kthread+0x10/0x10
> > > [    0.087740]  ret_from_fork_asm+0x1a/0x30
> > > [    0.087742]  </TASK>
> > > [    0.087742] ---[ end trace 0000000000000000 ]---
> > >
> > >
> > > > ---
> > > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
> > > > ---
> > > >  mm/vmstat.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > index 4d016314a56c..0889b75cef14 100644
> > > > --- a/mm/vmstat.c
> > > > +++ b/mm/vmstat.c
> > > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
> > > >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
> > > >  		node_set_state(cpu_to_node(cpu), N_CPU);
> > > >  	}
> > > > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
> > >
> > > Probably needs to be 'if disabled' here, as this is invoked on normal
> > > startup when the work won't have been disabled?
> > >
> > > Had a brief look at code and couldn't see how that could be done
> > > however... and one would need to be careful about races... Tricky!
> > >
> > > >
> > > >  	return 0;
> > > >  }
> > > >
> > > >  static int vmstat_cpu_down_prep(unsigned int cpu)
> > > >  {
> > > > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > >  	return 0;
> > > >  }
> > > >
> > > > --
> > > > 2.43.0
> > > >
> > > >
> > >
> > > Let me know if you need any more details, .config etc.
> > >
> > > I noticed this warning on a real box too (in both cases running akpm's
> > > mm-unstable branch), FWIW.
> >
> > Thank you for the report. I was able to reproduce the warning and now
> > wonder how I missed it.. My oversight, apologies.
> >
> > In my current view, the simplest solution would be to make sure a local
> > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> > during boot-up. The following patch suppresses the warning:
> >
> >   diff --git a/mm/vmstat.c b/mm/vmstat.c
> >   index 0889b75cef14..19ceed5d34bf 100644
> >   --- a/mm/vmstat.c
> >   +++ b/mm/vmstat.c
> >   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
> >    {
> >           int cpu;
> >
> >   -       for_each_possible_cpu(cpu)
> >   +       for_each_possible_cpu(cpu) {
> >                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> >                           vmstat_update);
> >
> >   +               /* will be enabled on vmstat_cpu_online */
> >   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> >   +       }
> >   +
> >           schedule_delayed_work(&shepherd,
> >                   round_jiffies_relative(sysctl_stat_interval));
> >    }
> >
> > If you think of a better solution later, please let me know. Otherwise,
> > I'll submit a follow-up fix patch with the above diff.
> 
> Thanks, this resolves the problem, but are we sure that _all_ CPUs will
> definitely call vmstat_cpu_online()?
> 
> I did a bit of printk output and it seems like this _didn't_ online CPU 0,
> presumably the boot CPU which calls this function in the first instance?
> 
> I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls()
> explicitly which does _not_ call the callback, though even if it did this
> would be too early as it calls start_shepherd_timer() _after_ this anyway.
> 
> So yeah, unless I'm missing something, I think this patch is broken.

You're absolutely right, thanks a lot.
I also appreciate you testing it, thanks everyone.

> 
> I have added Thomas and Peter to give some insight on the CPU hotplug side.
> 
> It feels like the patch really needs an 'enable if not already enabled'
> call in vmstat_cpu_online().

Right. While not fully polished yet, I've tested the following diff:

  diff --git a/mm/vmstat.c b/mm/vmstat.c
  index 0889b75cef14..f967aa22392f 100644
  --- a/mm/vmstat.c
  +++ b/mm/vmstat.c
  @@ -1937,7 +1937,11 @@ static const struct seq_operations vmstat_op = {
   #endif /* CONFIG_PROC_FS */
  
   #ifdef CONFIG_SMP
  -static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
  +struct vmstat_work {
  +       struct delayed_work dwork;
  +       bool enabled;
  +};
  +static DEFINE_PER_CPU(struct vmstat_work, vmstat_work);
   int sysctl_stat_interval __read_mostly = HZ;
   static int vmstat_late_init_done;
  
  @@ -2015,7 +2019,7 @@ static void vmstat_update(struct work_struct *w)
                   * update worker thread.
                   */
                  queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
  -                               this_cpu_ptr(&vmstat_work),
  +                               &this_cpu_ptr(&vmstat_work)->dwork,
                                  round_jiffies_relative(sysctl_stat_interval));
          }
   }
  @@ -2059,7 +2063,7 @@ void quiet_vmstat(void)
          if (system_state != SYSTEM_RUNNING)
                  return;
  
  -       if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
  +       if (!delayed_work_pending(&this_cpu_ptr(&vmstat_work)->dwork))
                  return;
  
          if (!need_update(smp_processor_id()))
  @@ -2091,7 +2095,7 @@ static void vmstat_shepherd(struct work_struct *w)
          cpus_read_lock();
          /* Check processors whose vmstat worker threads have been disabled */
          for_each_online_cpu(cpu) {
  -               struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
  +               struct delayed_work *dw = &per_cpu(vmstat_work, cpu).dwork;
  
                  /*
                   * In kernel users of vmstat counters either require the precise value and
  @@ -2120,11 +2124,14 @@ static void vmstat_shepherd(struct work_struct *w)
  
   static void __init start_shepherd_timer(void)
   {
  +       struct vmstat_work *vmstat;
          int cpu;
  
  -       for_each_possible_cpu(cpu)
  -               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
  -                       vmstat_update);
  +       for_each_possible_cpu(cpu) {
  +               vmstat = per_cpu_ptr(&vmstat_work, cpu);
  +               INIT_DEFERRABLE_WORK(&vmstat->dwork, vmstat_update);
  +               vmstat->enabled = true;
  +       }
  
          schedule_delayed_work(&shepherd,
                  round_jiffies_relative(sysctl_stat_interval));
  @@ -2142,20 +2149,28 @@ static void __init init_cpu_node_state(void)
  
   static int vmstat_cpu_online(unsigned int cpu)
   {
  +       struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu);
  +
          if (vmstat_late_init_done)
                  refresh_zone_stat_thresholds();
  
          if (!node_state(cpu_to_node(cpu), N_CPU)) {
                  node_set_state(cpu_to_node(cpu), N_CPU);
          }
  -       enable_delayed_work(&per_cpu(vmstat_work, cpu));
  +
  +       if (!vmstat->enabled) {
  +               enable_delayed_work(&vmstat->dwork);
  +               vmstat->enabled = true;
  +       }
  
          return 0;
   }
  
   static int vmstat_cpu_down_prep(unsigned int cpu)
   {
  -       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
  +       struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu);
  +
  +       disable_delayed_work_sync(&vmstat->dwork);
          return 0;
   }

Lorenzo, and Thomas and Peter if you're available, I'd greatly appreciate
any thoughts or feedback on this.

Thanks.

-Koichiro Den


> 
> >
> > Thanks.
> >
> > -Koichiro


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
  2025-01-06 13:03       ` Koichiro Den
@ 2025-01-06 13:53         ` Koichiro Den
  2025-01-07  1:18         ` [PATCH] Simple fix Huacai Chen
  1 sibling, 0 replies; 22+ messages in thread
From: Koichiro Den @ 2025-01-06 13:53 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra

On Mon, Jan 06, 2025 at 10:03:57PM +0900, Koichiro Den wrote:
> On Mon, Jan 06, 2025 at 10:52:37AM +0000, Lorenzo Stoakes wrote:
> > +cc tglx, peterz for insight on CPU hot plug
> > 
> > On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote:
> > > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
> > > > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
> > > > > Even after mm/vmstat:online teardown, shepherd may still queue work for
> > > > > the dying cpu until the cpu is removed from online mask. While it's
> > > > > quite rare, this means that after unbind_workers() unbinds a per-cpu
> > > > > kworker, it potentially runs vmstat_update for the dying CPU on an
> > > > > irrelevant cpu before entering atomic AP states.
> > > > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> > > > > backtrace.
> > > > >
> > > > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
> > > > >                                                kworker/7:3/1702
> > > > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
> > > > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> > > > >   Tainted: [N]=TEST
> > > > >   Workqueue: mm_percpu_wq vmstat_update
> > > > >   Call Trace:
> > > > >    <TASK>
> > > > >    dump_stack_lvl+0x8d/0xb0
> > > > >    check_preemption_disabled+0xce/0xe0
> > > > >    refresh_cpu_vm_stats+0x235/0x5f0
> > > > >    vmstat_update+0x17/0xa0
> > > > >    process_one_work+0x869/0x1aa0
> > > > >    worker_thread+0x5e5/0x1100
> > > > >    kthread+0x29e/0x380
> > > > >    ret_from_fork+0x2d/0x70
> > > > >    ret_from_fork_asm+0x1a/0x30
> > > > >    </TASK>
> > > > >
> > > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
> > > > > symmetrically enable it on startup.
> > > > >
> > > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> > > >
> > > > Hi,
> > > >
> > > > I observed a warning in my qemu and real hardware, which I bisected to this commit:
> > > >
> > > > [    0.087733] ------------[ cut here ]------------
> > > > [    0.087733] workqueue: work disable count underflowed
> > > > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0
> > > >
> > > > This is:
> > > >
> > > > static void work_offqd_enable(struct work_offq_data *offqd)
> > > > {
> > > > 	if (likely(offqd->disable > 0))
> > > > 		offqd->disable--;
> > > > 	else
> > > > 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
> > > > }
> > > >
> > > > So (based on this code) presumably an enable is only required if previously
> > > > disabled, and this code is being called on startup unconditionally without
> > > > the work having been disabled previously? I'm not hugely familiar with
> > > > delayed workqueue implementation details.
> > > >
> > > > [    0.087733] Modules linked in:
> > > > [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
> > > > [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
> > > > [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
> > > > [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
> > > > [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
> > > > [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
> > > > [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
> > > > [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
> > > > [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
> > > > [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
> > > > [    0.087733] FS:  0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
> > > > [    0.087733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
> > > > [    0.087733] PKRU: 55555554
> > > > [    0.087733] Call Trace:
> > > > [    0.087733]  <TASK>
> > > > [    0.087733]  ? enable_work+0xb5/0xc0
> > > > [    0.087733]  ? __warn.cold+0x93/0xf2
> > > > [    0.087733]  ? enable_work+0xb5/0xc0
> > > > [    0.087733]  ? report_bug+0xff/0x140
> > > > [    0.087733]  ? handle_bug+0x54/0x90
> > > > [    0.087733]  ? exc_invalid_op+0x17/0x70
> > > > [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
> > > > [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
> > > > [    0.087733]  ? enable_work+0xb5/0xc0
> > > > [    0.087733]  vmstat_cpu_online+0x5c/0x70
> > > > [    0.087733]  cpuhp_invoke_callback+0x133/0x440
> > > > [    0.087733]  cpuhp_thread_fun+0x95/0x150
> > > > [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
> > > > [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
> > > > [    0.087735]  kthread+0xc8/0xf0
> > > > [    0.087737]  ? __pfx_kthread+0x10/0x10
> > > > [    0.087738]  ret_from_fork+0x2c/0x50
> > > > [    0.087739]  ? __pfx_kthread+0x10/0x10
> > > > [    0.087740]  ret_from_fork_asm+0x1a/0x30
> > > > [    0.087742]  </TASK>
> > > > [    0.087742] ---[ end trace 0000000000000000 ]---
> > > >
> > > >
> > > > > ---
> > > > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
> > > > > ---
> > > > >  mm/vmstat.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > > index 4d016314a56c..0889b75cef14 100644
> > > > > --- a/mm/vmstat.c
> > > > > +++ b/mm/vmstat.c
> > > > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
> > > > >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
> > > > >  		node_set_state(cpu_to_node(cpu), N_CPU);
> > > > >  	}
> > > > > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
> > > >
> > > > Probably needs to be 'if disabled' here, as this is invoked on normal
> > > > startup when the work won't have been disabled?
> > > >
> > > > Had a brief look at code and couldn't see how that could be done
> > > > however... and one would need to be careful about races... Tricky!
> > > >
> > > > >
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > >  static int vmstat_cpu_down_prep(unsigned int cpu)
> > > > >  {
> > > > > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > --
> > > > > 2.43.0
> > > > >
> > > > >
> > > >
> > > > Let me know if you need any more details, .config etc.
> > > >
> > > > I noticed this warning on a real box too (in both cases running akpm's
> > > > mm-unstable branch), FWIW.
> > >
> > > Thank you for the report. I was able to reproduce the warning and now
> > > wonder how I missed it.. My oversight, apologies.
> > >
> > > In my current view, the simplest solution would be to make sure a local
> > > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
> > > during boot-up. The following patch suppresses the warning:
> > >
> > >   diff --git a/mm/vmstat.c b/mm/vmstat.c
> > >   index 0889b75cef14..19ceed5d34bf 100644
> > >   --- a/mm/vmstat.c
> > >   +++ b/mm/vmstat.c
> > >   @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
> > >    {
> > >           int cpu;
> > >
> > >   -       for_each_possible_cpu(cpu)
> > >   +       for_each_possible_cpu(cpu) {
> > >                   INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > >                           vmstat_update);
> > >
> > >   +               /* will be enabled on vmstat_cpu_online */
> > >   +               disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > >   +       }
> > >   +
> > >           schedule_delayed_work(&shepherd,
> > >                   round_jiffies_relative(sysctl_stat_interval));
> > >    }
> > >
> > > If you think of a better solution later, please let me know. Otherwise,
> > > I'll submit a follow-up fix patch with the above diff.
> > 
> > Thanks, this resolves the problem, but are we sure that _all_ CPUs will
> > definitely call vmstat_cpu_online()?
> > 
> > I did a bit of printk output and it seems like this _didn't_ online CPU 0,
> > presumably the boot CPU which calls this function in the first instance?
> > 
> > I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls()
> > explicitly which does _not_ call the callback, though even if it did this
> > would be too early as it calls start_shepherd_timer() _after_ this anyway.
> > 
> > So yeah, unless I'm missing something, I think this patch is broken.
> 
> You're absolutely right, thanks a lot.
> I also appreciate you testing it, thanks everyone.
> 
> > 
> > I have added Thomas and Peter to give some insight on the CPU hotplug side.
> > 
> > It feels like the patch really needs an 'enable if not already enabled'
> > call in vmstat_cpu_online().
> 
> Right. While not fully polished yet, I've tested the following diff:
> 
>   diff --git a/mm/vmstat.c b/mm/vmstat.c
>   index 0889b75cef14..f967aa22392f 100644
>   --- a/mm/vmstat.c
>   +++ b/mm/vmstat.c
>   @@ -1937,7 +1937,11 @@ static const struct seq_operations vmstat_op = {
>    #endif /* CONFIG_PROC_FS */
>   
>    #ifdef CONFIG_SMP
>   -static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
>   +struct vmstat_work {
>   +       struct delayed_work dwork;
>   +       bool enabled;
>   +};
>   +static DEFINE_PER_CPU(struct vmstat_work, vmstat_work);
>    int sysctl_stat_interval __read_mostly = HZ;
>    static int vmstat_late_init_done;
>   
>   @@ -2015,7 +2019,7 @@ static void vmstat_update(struct work_struct *w)
>                    * update worker thread.
>                    */
>                   queue_delayed_work_on(smp_processor_id(), mm_percpu_wq,
>   -                               this_cpu_ptr(&vmstat_work),
>   +                               &this_cpu_ptr(&vmstat_work)->dwork,
>                                   round_jiffies_relative(sysctl_stat_interval));
>           }
>    }
>   @@ -2059,7 +2063,7 @@ void quiet_vmstat(void)
>           if (system_state != SYSTEM_RUNNING)
>                   return;
>   
>   -       if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
>   +       if (!delayed_work_pending(&this_cpu_ptr(&vmstat_work)->dwork))
>                   return;
>   
>           if (!need_update(smp_processor_id()))
>   @@ -2091,7 +2095,7 @@ static void vmstat_shepherd(struct work_struct *w)
>           cpus_read_lock();
>           /* Check processors whose vmstat worker threads have been disabled */
>           for_each_online_cpu(cpu) {
>   -               struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
>   +               struct delayed_work *dw = &per_cpu(vmstat_work, cpu).dwork;
>   
>                   /*
>                    * In kernel users of vmstat counters either require the precise value and
>   @@ -2120,11 +2124,14 @@ static void vmstat_shepherd(struct work_struct *w)
>   
>    static void __init start_shepherd_timer(void)
>    {
>   +       struct vmstat_work *vmstat;
>           int cpu;
>   
>   -       for_each_possible_cpu(cpu)
>   -               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>   -                       vmstat_update);
>   +       for_each_possible_cpu(cpu) {
>   +               vmstat = per_cpu_ptr(&vmstat_work, cpu);
>   +               INIT_DEFERRABLE_WORK(&vmstat->dwork, vmstat_update);
>   +               vmstat->enabled = true;
>   +       }
>   
>           schedule_delayed_work(&shepherd,
>                   round_jiffies_relative(sysctl_stat_interval));
>   @@ -2142,20 +2149,28 @@ static void __init init_cpu_node_state(void)
>   
>    static int vmstat_cpu_online(unsigned int cpu)
>    {
>   +       struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu);
>   +
>           if (vmstat_late_init_done)
>                   refresh_zone_stat_thresholds();
>   
>           if (!node_state(cpu_to_node(cpu), N_CPU)) {
>                   node_set_state(cpu_to_node(cpu), N_CPU);
>           }
>   -       enable_delayed_work(&per_cpu(vmstat_work, cpu));
>   +
>   +       if (!vmstat->enabled) {
>   +               enable_delayed_work(&vmstat->dwork);
>   +               vmstat->enabled = true;
>   +       }
>   
>           return 0;
>    }
>   
>    static int vmstat_cpu_down_prep(unsigned int cpu)
>    {
>   -       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>   +       struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu);
>   +
>   +       disable_delayed_work_sync(&vmstat->dwork);

The line `vmstat->enabled = false;` was missing here.

With that fixed, I've conducted the following tests:
* Boot-up check: confirmed that vmstat_update() runs on all cpus after boot-up.
* Cpuhp transitions: verified resilience during random CPU hotplug
  transitions using the steps below:
  1. enabled dynamic ftrace on vmstat_update()
  2. ran the following simple silly script for a while
     #!/bin/bash
     while :; do
       now=$(cat /sys/devices/system/cpu/cpu7/hotplug/state)
       next=$(shuf --random-source=/dev/urandom -i 0-238 -n 1)
       echo "${now} -> ${next}"
       echo $next > /sys/devices/system/cpu/cpu7/hotplug/target 2>/dev/null
       sleep 0.001
     done

  3. after stopping the script, ensured that the cpu was fully online again
     by writing the largest value to the 'state' file.
  4. verified that vmstat_update() was working on the cpu.

Let me know if you have any suggestions or concerns.

Thanks.

-Koichiro Den

>           return 0;
>    }
> 
> Lorenzo, and Thomas and Peter if you're available, I'd greatly appreciate
> any thoughts or feedback on this.
> 
> Thanks.
> 
> -Koichiro Den
> 
> 
> > 
> > >
> > > Thanks.
> > >
> > > -Koichiro


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] Simple fix
  2025-01-06 13:03       ` Koichiro Den
  2025-01-06 13:53         ` Koichiro Den
@ 2025-01-07  1:18         ` Huacai Chen
  2025-01-07  3:35           ` Matthew Wilcox
  2025-01-07  8:47           ` Lorenzo Stoakes
  1 sibling, 2 replies; 22+ messages in thread
From: Huacai Chen @ 2025-01-07  1:18 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior,
	Lorenzo Stoakes, Mark Rutland, Charalampos Mitrodimas,
	Huacai Chen

Hi, all, I like simple fixes, so is this acceptable (based on an early
version from Koichiro Den)?
---
 mm/vmstat.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 0889b75cef14..1badc24a21ff 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
 {
 	int cpu;
 
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
 			vmstat_update);
 
+		/* Will be enabled on vmstat_cpu_online() */
+		if (!cpu_online(cpu))
+			disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+	}
+
 	schedule_delayed_work(&shepherd,
 		round_jiffies_relative(sysctl_stat_interval));
 }
-- 
2.43.5



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-07  1:18         ` [PATCH] Simple fix Huacai Chen
@ 2025-01-07  3:35           ` Matthew Wilcox
  2025-01-07  3:58             ` Huacai Chen
  2025-01-07  8:47           ` Lorenzo Stoakes
  1 sibling, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2025-01-07  3:35 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den,
	Sebastian Andrzej Siewior, Lorenzo Stoakes, Mark Rutland,
	Charalampos Mitrodimas

On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> Hi, all, I like simple fixes, so is this acceptable (based on an early
> version from Koichiro Den)?

This is an unacceptable commit message.

>  mm/vmstat.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 0889b75cef14..1badc24a21ff 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
>  {
>  	int cpu;
>  
> -	for_each_possible_cpu(cpu)
> +	for_each_possible_cpu(cpu) {
>  		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>  			vmstat_update);
>  
> +		/* Will be enabled on vmstat_cpu_online() */
> +		if (!cpu_online(cpu))
> +			disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +	}
> +
>  	schedule_delayed_work(&shepherd,
>  		round_jiffies_relative(sysctl_stat_interval));
>  }
> -- 
> 2.43.5
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-07  3:35           ` Matthew Wilcox
@ 2025-01-07  3:58             ` Huacai Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Huacai Chen @ 2025-01-07  3:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den,
	Sebastian Andrzej Siewior, Lorenzo Stoakes, Mark Rutland,
	Charalampos Mitrodimas

On Tue, Jan 7, 2025 at 11:35 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > version from Koichiro Den)?
>
> This is an unacceptable commit message.
Of course, this is just for discussion, not a proper patch.

Huacai

>
> >  mm/vmstat.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 0889b75cef14..1badc24a21ff 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> >  {
> >       int cpu;
> >
> > -     for_each_possible_cpu(cpu)
> > +     for_each_possible_cpu(cpu) {
> >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> >                       vmstat_update);
> >
> > +             /* Will be enabled on vmstat_cpu_online() */
> > +             if (!cpu_online(cpu))
> > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > +     }
> > +
> >       schedule_delayed_work(&shepherd,
> >               round_jiffies_relative(sysctl_stat_interval));
> >  }
> > --
> > 2.43.5
> >
> >


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-07  1:18         ` [PATCH] Simple fix Huacai Chen
  2025-01-07  3:35           ` Matthew Wilcox
@ 2025-01-07  8:47           ` Lorenzo Stoakes
  2025-01-07 10:29             ` Huacai Chen
  1 sibling, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-01-07  8:47 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> Hi, all, I like simple fixes, so is this acceptable (based on an early
> version from Koichiro Den)?

No not at all. This is bizarre - in the mail you are replying to Koichiro
agrees with me that the approach of his code that you've sent here (I don't
know why) is fundamentally broken and suggest another.

I am at a loss as to why you've sent this? Perhaps a miscommunication
somewhere? :)

In any case, please don't send '[PATCH] xxx' mails that aren't intended to
be patches, a better form of this would be to say 'oh can we just do...'
then to put this code in the mail underneath, without any '[PATCH]' prefix.

But please do review the discussion - the below is insufficient as simple
as it is (sadly) because the boot CPU's delayed work will never be
executed.

I will take a look at Koichiro's new approach as soon as I am able.

Cheers!

> ---
>  mm/vmstat.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 0889b75cef14..1badc24a21ff 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
>  {
>  	int cpu;
>
> -	for_each_possible_cpu(cpu)
> +	for_each_possible_cpu(cpu) {
>  		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>  			vmstat_update);
>
> +		/* Will be enabled on vmstat_cpu_online() */
> +		if (!cpu_online(cpu))
> +			disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +	}
> +
>  	schedule_delayed_work(&shepherd,
>  		round_jiffies_relative(sysctl_stat_interval));
>  }
> --
> 2.43.5
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-07  8:47           ` Lorenzo Stoakes
@ 2025-01-07 10:29             ` Huacai Chen
  2025-01-07 11:00               ` Lorenzo Stoakes
  0 siblings, 1 reply; 22+ messages in thread
From: Huacai Chen @ 2025-01-07 10:29 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

Hi, Lorenzo,

On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > version from Koichiro Den)?
>
> No not at all. This is bizarre - in the mail you are replying to Koichiro
> agrees with me that the approach of his code that you've sent here (I don't
> know why) is fundamentally broken and suggest another.
>
> I am at a loss as to why you've sent this? Perhaps a miscommunication
> somewhere? :)
>
> In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> be patches, a better form of this would be to say 'oh can we just do...'
> then to put this code in the mail underneath, without any '[PATCH]' prefix.
I wasn't in the CC list, and I also found the bug yesterday, so I can
only reply to this email with "git send-email --in-reply-to". This is
the reason why my reply looks so stranne.

>
> But please do review the discussion - the below is insufficient as simple
> as it is (sadly) because the boot CPU's delayed work will never be
> executed.
Koichiro's simple fix causes the boot CPU's delayed work to never be
executed, this is obvious, and of course I have read the early
discussion. And so I improve it, with a "cpu_online()" checking, then
the boot CPU is unaffected.

Huacai

>
> I will take a look at Koichiro's new approach as soon as I am able.
>
> Cheers!
>
> > ---
> >  mm/vmstat.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 0889b75cef14..1badc24a21ff 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> >  {
> >       int cpu;
> >
> > -     for_each_possible_cpu(cpu)
> > +     for_each_possible_cpu(cpu) {
> >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> >                       vmstat_update);
> >
> > +             /* Will be enabled on vmstat_cpu_online() */
> > +             if (!cpu_online(cpu))
> > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > +     }
> > +
> >       schedule_delayed_work(&shepherd,
> >               round_jiffies_relative(sysctl_stat_interval));
> >  }
> > --
> > 2.43.5
> >


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-07 10:29             ` Huacai Chen
@ 2025-01-07 11:00               ` Lorenzo Stoakes
  2025-01-08  2:22                 ` Koichiro Den
  0 siblings, 1 reply; 22+ messages in thread
From: Lorenzo Stoakes @ 2025-01-07 11:00 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote:
> Hi, Lorenzo,
>
> On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > > version from Koichiro Den)?
> >
> > No not at all. This is bizarre - in the mail you are replying to Koichiro
> > agrees with me that the approach of his code that you've sent here (I don't
> > know why) is fundamentally broken and suggest another.
> >
> > I am at a loss as to why you've sent this? Perhaps a miscommunication
> > somewhere? :)
> >
> > In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> > be patches, a better form of this would be to say 'oh can we just do...'
> > then to put this code in the mail underneath, without any '[PATCH]' prefix.
> I wasn't in the CC list, and I also found the bug yesterday, so I can
> only reply to this email with "git send-email --in-reply-to". This is
> the reason why my reply looks so stranne.
>
> >
> > But please do review the discussion - the below is insufficient as simple
> > as it is (sadly) because the boot CPU's delayed work will never be
> > executed.
> Koichiro's simple fix causes the boot CPU's delayed work to never be
> executed, this is obvious, and of course I have read the early
> discussion. And so I improve it, with a "cpu_online()" checking, then
> the boot CPU is unaffected.

With respect Haucai, this is not how you engage in kernel discussion. You
could simply have replied to the mail or given more information, you now
have two people telling you this, please take it on board.

And it caused me to miss your actually quite valuable suggestion so this is
beneficial for all! :)

ANYWAY, moving on to the technical side:

>
> Huacai
>
> >
> > I will take a look at Koichiro's new approach as soon as I am able.
> >
> > Cheers!
> >
> > > ---
> > >  mm/vmstat.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > index 0889b75cef14..1badc24a21ff 100644
> > > --- a/mm/vmstat.c
> > > +++ b/mm/vmstat.c
> > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> > >  {
> > >       int cpu;
> > >
> > > -     for_each_possible_cpu(cpu)
> > > +     for_each_possible_cpu(cpu) {
> > >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > >                       vmstat_update);
> > >
> > > +             /* Will be enabled on vmstat_cpu_online() */
> > > +             if (!cpu_online(cpu))

This might actually be workable as something simpler, on assumption there
can be no race here (I don't think so right?).

Koichiro - could you check this and see whether it resolves the issue and
whether you feel this is sensible?

Might be worth expanding comment to say that we disable on offline, enable
on online and we're just providing symmetry here.

> > > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > +     }
> > > +
> > >       schedule_delayed_work(&shepherd,
> > >               round_jiffies_relative(sysctl_stat_interval));
> > >  }
> > > --
> > > 2.43.5
> > >
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-07 11:00               ` Lorenzo Stoakes
@ 2025-01-08  2:22                 ` Koichiro Den
  2025-01-08  2:26                   ` Koichiro Den
  2025-01-08  2:31                   ` Huacai Chen
  0 siblings, 2 replies; 22+ messages in thread
From: Koichiro Den @ 2025-01-08  2:22 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Huacai Chen, Huacai Chen, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote:
> On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote:
> > Hi, Lorenzo,
> >
> > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > > > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > > > version from Koichiro Den)?
> > >
> > > No not at all. This is bizarre - in the mail you are replying to Koichiro
> > > agrees with me that the approach of his code that you've sent here (I don't
> > > know why) is fundamentally broken and suggest another.
> > >
> > > I am at a loss as to why you've sent this? Perhaps a miscommunication
> > > somewhere? :)
> > >
> > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> > > be patches, a better form of this would be to say 'oh can we just do...'
> > > then to put this code in the mail underneath, without any '[PATCH]' prefix.
> > I wasn't in the CC list, and I also found the bug yesterday, so I can
> > only reply to this email with "git send-email --in-reply-to". This is
> > the reason why my reply looks so stranne.
> >
> > >
> > > But please do review the discussion - the below is insufficient as simple
> > > as it is (sadly) because the boot CPU's delayed work will never be
> > > executed.
> > Koichiro's simple fix causes the boot CPU's delayed work to never be
> > executed, this is obvious, and of course I have read the early
> > discussion. And so I improve it, with a "cpu_online()" checking, then
> > the boot CPU is unaffected.
> 
> With respect Haucai, this is not how you engage in kernel discussion. You
> could simply have replied to the mail or given more information, you now
> have two people telling you this, please take it on board.
> 
> And it caused me to miss your actually quite valuable suggestion so this is
> beneficial for all! :)
> 
> ANYWAY, moving on to the technical side:
> 
> >
> > Huacai
> >
> > >
> > > I will take a look at Koichiro's new approach as soon as I am able.
> > >
> > > Cheers!
> > >
> > > > ---
> > > >  mm/vmstat.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > index 0889b75cef14..1badc24a21ff 100644
> > > > --- a/mm/vmstat.c
> > > > +++ b/mm/vmstat.c
> > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> > > >  {
> > > >       int cpu;
> > > >
> > > > -     for_each_possible_cpu(cpu)
> > > > +     for_each_possible_cpu(cpu) {
> > > >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > > >                       vmstat_update);
> > > >
> > > > +             /* Will be enabled on vmstat_cpu_online() */
> > > > +             if (!cpu_online(cpu))
> 
> This might actually be workable as something simpler, on assumption there

Sorry about late response. And thank you very much, Huacai. Your suggestion
looks great. Much simpler and intuitive.

> can be no race here (I don't think so right?).

IIUC, there is no race since smp_init() always runs before init_mm_internals().
At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary
CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks
simple and the best.

> 
> Koichiro - could you check this and see whether it resolves the issue and
> whether you feel this is sensible?

I tested it and it seems to work well both on booting and cpuhp events.

(By the way, in my previous email comment [1], I forgot to mention that I also
applied other unmerged patches, [2] and [3], just to be able to run such random
transisioning test. This time here, I just tested by switching CPUHP_ONLINE <->
CPUHP_OFFLINE only.)

[1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55
[2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/
[3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/

> 
> Might be worth expanding comment to say that we disable on offline, enable
> on online and we're just providing symmetry here.

Sounds good. Like this?

    -       for_each_possible_cpu(cpu)
    +       for_each_possible_cpu(cpu) {
                    INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
                            vmstat_update);
    
    +               /*
    +                * For secondary CPUs during CPU hotplug scenarios,
    +                * vmstat_cpu_online() will enable the work.
    +                * mm/vmstat:online enables and disables vmstat_work
    +                * symmetrically during CPU hotplug events.
    +                */
    +               if (!cpu_online(cpu))
    +                       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
    +       }


Before submitting a revised patch, I'd like to confirm:
Huacai, would you be comfortable with me adding your Signed-off-by to the
commit, or would you prefer a Suggested-by tag instead?

Thanks.

-Koichiro Den

> 
> > > > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > +     }
> > > > +
> > > >       schedule_delayed_work(&shepherd,
> > > >               round_jiffies_relative(sysctl_stat_interval));
> > > >  }
> > > > --
> > > > 2.43.5
> > > >
> >
> 
> Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-08  2:22                 ` Koichiro Den
@ 2025-01-08  2:26                   ` Koichiro Den
  2025-01-08  2:31                   ` Huacai Chen
  1 sibling, 0 replies; 22+ messages in thread
From: Koichiro Den @ 2025-01-08  2:26 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Huacai Chen, Huacai Chen, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

On Wed, Jan 08, 2025 at 11:22:42AM +0900, Koichiro Den wrote:
> On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote:
> > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote:
> > > Hi, Lorenzo,
> > >
> > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > > > > version from Koichiro Den)?
> > > >
> > > > No not at all. This is bizarre - in the mail you are replying to Koichiro
> > > > agrees with me that the approach of his code that you've sent here (I don't
> > > > know why) is fundamentally broken and suggest another.
> > > >
> > > > I am at a loss as to why you've sent this? Perhaps a miscommunication
> > > > somewhere? :)
> > > >
> > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> > > > be patches, a better form of this would be to say 'oh can we just do...'
> > > > then to put this code in the mail underneath, without any '[PATCH]' prefix.
> > > I wasn't in the CC list, and I also found the bug yesterday, so I can
> > > only reply to this email with "git send-email --in-reply-to". This is
> > > the reason why my reply looks so stranne.
> > >
> > > >
> > > > But please do review the discussion - the below is insufficient as simple
> > > > as it is (sadly) because the boot CPU's delayed work will never be
> > > > executed.
> > > Koichiro's simple fix causes the boot CPU's delayed work to never be
> > > executed, this is obvious, and of course I have read the early
> > > discussion. And so I improve it, with a "cpu_online()" checking, then
> > > the boot CPU is unaffected.
> > 
> > With respect Haucai, this is not how you engage in kernel discussion. You
> > could simply have replied to the mail or given more information, you now
> > have two people telling you this, please take it on board.
> > 
> > And it caused me to miss your actually quite valuable suggestion so this is
> > beneficial for all! :)
> > 
> > ANYWAY, moving on to the technical side:
> > 
> > >
> > > Huacai
> > >
> > > >
> > > > I will take a look at Koichiro's new approach as soon as I am able.
> > > >
> > > > Cheers!
> > > >
> > > > > ---
> > > > >  mm/vmstat.c | 7 ++++++-
> > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > > index 0889b75cef14..1badc24a21ff 100644
> > > > > --- a/mm/vmstat.c
> > > > > +++ b/mm/vmstat.c
> > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> > > > >  {
> > > > >       int cpu;
> > > > >
> > > > > -     for_each_possible_cpu(cpu)
> > > > > +     for_each_possible_cpu(cpu) {
> > > > >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > > > >                       vmstat_update);
> > > > >
> > > > > +             /* Will be enabled on vmstat_cpu_online() */
> > > > > +             if (!cpu_online(cpu))
> > 
> > This might actually be workable as something simpler, on assumption there
> 
> Sorry about late response. And thank you very much, Huacai. Your suggestion
> looks great. Much simpler and intuitive.
> 
> > can be no race here (I don't think so right?).
> 
> IIUC, there is no race since smp_init() always runs before init_mm_internals().

Oops, critical typo. s/before/after/.

> At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary
> CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks
> simple and the best.
> 
> > 
> > Koichiro - could you check this and see whether it resolves the issue and
> > whether you feel this is sensible?
> 
> I tested it and it seems to work well both on booting and cpuhp events.
> 
> (By the way, in my previous email comment [1], I forgot to mention that I also
> applied other unmerged patches, [2] and [3], just to be able to run such random
> transisioning test. This time here, I just tested by switching CPUHP_ONLINE <->
> CPUHP_OFFLINE only.)
> 
> [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55
> [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/
> [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/
> 
> > 
> > Might be worth expanding comment to say that we disable on offline, enable
> > on online and we're just providing symmetry here.
> 
> Sounds good. Like this?
> 
>     -       for_each_possible_cpu(cpu)
>     +       for_each_possible_cpu(cpu) {
>                     INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                             vmstat_update);
>     
>     +               /*
>     +                * For secondary CPUs during CPU hotplug scenarios,
>     +                * vmstat_cpu_online() will enable the work.
>     +                * mm/vmstat:online enables and disables vmstat_work
>     +                * symmetrically during CPU hotplug events.
>     +                */
>     +               if (!cpu_online(cpu))
>     +                       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>     +       }
> 
> 
> Before submitting a revised patch, I'd like to confirm:
> Huacai, would you be comfortable with me adding your Signed-off-by to the
> commit, or would you prefer a Suggested-by tag instead?
> 
> Thanks.
> 
> -Koichiro Den
> 
> > 
> > > > > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > > +     }
> > > > > +
> > > > >       schedule_delayed_work(&shepherd,
> > > > >               round_jiffies_relative(sysctl_stat_interval));
> > > > >  }
> > > > > --
> > > > > 2.43.5
> > > > >
> > >
> > 
> > Cheers, Lorenzo
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-08  2:22                 ` Koichiro Den
  2025-01-08  2:26                   ` Koichiro Den
@ 2025-01-08  2:31                   ` Huacai Chen
  2025-01-08  3:41                     ` Koichiro Den
  1 sibling, 1 reply; 22+ messages in thread
From: Huacai Chen @ 2025-01-08  2:31 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Lorenzo Stoakes, Huacai Chen, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

Hi, Koichiro,

On Wed, Jan 8, 2025 at 10:22 AM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote:
> > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote:
> > > Hi, Lorenzo,
> > >
> > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > >
> > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > > > > version from Koichiro Den)?
> > > >
> > > > No not at all. This is bizarre - in the mail you are replying to Koichiro
> > > > agrees with me that the approach of his code that you've sent here (I don't
> > > > know why) is fundamentally broken and suggest another.
> > > >
> > > > I am at a loss as to why you've sent this? Perhaps a miscommunication
> > > > somewhere? :)
> > > >
> > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> > > > be patches, a better form of this would be to say 'oh can we just do...'
> > > > then to put this code in the mail underneath, without any '[PATCH]' prefix.
> > > I wasn't in the CC list, and I also found the bug yesterday, so I can
> > > only reply to this email with "git send-email --in-reply-to". This is
> > > the reason why my reply looks so stranne.
> > >
> > > >
> > > > But please do review the discussion - the below is insufficient as simple
> > > > as it is (sadly) because the boot CPU's delayed work will never be
> > > > executed.
> > > Koichiro's simple fix causes the boot CPU's delayed work to never be
> > > executed, this is obvious, and of course I have read the early
> > > discussion. And so I improve it, with a "cpu_online()" checking, then
> > > the boot CPU is unaffected.
> >
> > With respect Haucai, this is not how you engage in kernel discussion. You
> > could simply have replied to the mail or given more information, you now
> > have two people telling you this, please take it on board.
> >
> > And it caused me to miss your actually quite valuable suggestion so this is
> > beneficial for all! :)
> >
> > ANYWAY, moving on to the technical side:
> >
> > >
> > > Huacai
> > >
> > > >
> > > > I will take a look at Koichiro's new approach as soon as I am able.
> > > >
> > > > Cheers!
> > > >
> > > > > ---
> > > > >  mm/vmstat.c | 7 ++++++-
> > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > > index 0889b75cef14..1badc24a21ff 100644
> > > > > --- a/mm/vmstat.c
> > > > > +++ b/mm/vmstat.c
> > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> > > > >  {
> > > > >       int cpu;
> > > > >
> > > > > -     for_each_possible_cpu(cpu)
> > > > > +     for_each_possible_cpu(cpu) {
> > > > >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > > > >                       vmstat_update);
> > > > >
> > > > > +             /* Will be enabled on vmstat_cpu_online() */
> > > > > +             if (!cpu_online(cpu))
> >
> > This might actually be workable as something simpler, on assumption there
>
> Sorry about late response. And thank you very much, Huacai. Your suggestion
> looks great. Much simpler and intuitive.
>
> > can be no race here (I don't think so right?).
>
> IIUC, there is no race since smp_init() always runs before init_mm_internals().
> At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary
> CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks
> simple and the best.
>
> >
> > Koichiro - could you check this and see whether it resolves the issue and
> > whether you feel this is sensible?
>
> I tested it and it seems to work well both on booting and cpuhp events.
>
> (By the way, in my previous email comment [1], I forgot to mention that I also
> applied other unmerged patches, [2] and [3], just to be able to run such random
> transisioning test. This time here, I just tested by switching CPUHP_ONLINE <->
> CPUHP_OFFLINE only.)
>
> [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55
> [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/
> [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/
>
> >
> > Might be worth expanding comment to say that we disable on offline, enable
> > on online and we're just providing symmetry here.
>
> Sounds good. Like this?
>
>     -       for_each_possible_cpu(cpu)
>     +       for_each_possible_cpu(cpu) {
>                     INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>                             vmstat_update);
>
>     +               /*
>     +                * For secondary CPUs during CPU hotplug scenarios,
>     +                * vmstat_cpu_online() will enable the work.
>     +                * mm/vmstat:online enables and disables vmstat_work
>     +                * symmetrically during CPU hotplug events.
>     +                */
>     +               if (!cpu_online(cpu))
>     +                       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>     +       }
>
>
> Before submitting a revised patch, I'd like to confirm:
> Huacai, would you be comfortable with me adding your Signed-off-by to the
> commit, or would you prefer a Suggested-by tag instead?
Since the original V2 patch has been reverted in mainline, so I think
you will send a V3 which integrate my suggestion, right?

Both Signed-off-by and Suggested-by is OK for me.

Huacai

>
> Thanks.
>
> -Koichiro Den
>
> >
> > > > > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > > +     }
> > > > > +
> > > > >       schedule_delayed_work(&shepherd,
> > > > >               round_jiffies_relative(sysctl_stat_interval));
> > > > >  }
> > > > > --
> > > > > 2.43.5
> > > > >
> > >
> >
> > Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-08  2:31                   ` Huacai Chen
@ 2025-01-08  3:41                     ` Koichiro Den
  2025-01-08  3:57                       ` Huacai Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Koichiro Den @ 2025-01-08  3:41 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Lorenzo Stoakes, Huacai Chen, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

On Wed, Jan 08, 2025 at 10:31:40AM +0800, Huacai Chen wrote:
> Hi, Koichiro,
> 
> On Wed, Jan 8, 2025 at 10:22 AM Koichiro Den <koichiro.den@canonical.com> wrote:
> >
> > On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote:
> > > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote:
> > > > Hi, Lorenzo,
> > > >
> > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
> > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > >
> > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > > > > > version from Koichiro Den)?
> > > > >
> > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro
> > > > > agrees with me that the approach of his code that you've sent here (I don't
> > > > > know why) is fundamentally broken and suggest another.
> > > > >
> > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication
> > > > > somewhere? :)
> > > > >
> > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> > > > > be patches, a better form of this would be to say 'oh can we just do...'
> > > > > then to put this code in the mail underneath, without any '[PATCH]' prefix.
> > > > I wasn't in the CC list, and I also found the bug yesterday, so I can
> > > > only reply to this email with "git send-email --in-reply-to". This is
> > > > the reason why my reply looks so stranne.
> > > >
> > > > >
> > > > > But please do review the discussion - the below is insufficient as simple
> > > > > as it is (sadly) because the boot CPU's delayed work will never be
> > > > > executed.
> > > > Koichiro's simple fix causes the boot CPU's delayed work to never be
> > > > executed, this is obvious, and of course I have read the early
> > > > discussion. And so I improve it, with a "cpu_online()" checking, then
> > > > the boot CPU is unaffected.
> > >
> > > With respect Haucai, this is not how you engage in kernel discussion. You
> > > could simply have replied to the mail or given more information, you now
> > > have two people telling you this, please take it on board.
> > >
> > > And it caused me to miss your actually quite valuable suggestion so this is
> > > beneficial for all! :)
> > >
> > > ANYWAY, moving on to the technical side:
> > >
> > > >
> > > > Huacai
> > > >
> > > > >
> > > > > I will take a look at Koichiro's new approach as soon as I am able.
> > > > >
> > > > > Cheers!
> > > > >
> > > > > > ---
> > > > > >  mm/vmstat.c | 7 ++++++-
> > > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > > > index 0889b75cef14..1badc24a21ff 100644
> > > > > > --- a/mm/vmstat.c
> > > > > > +++ b/mm/vmstat.c
> > > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> > > > > >  {
> > > > > >       int cpu;
> > > > > >
> > > > > > -     for_each_possible_cpu(cpu)
> > > > > > +     for_each_possible_cpu(cpu) {
> > > > > >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > > > > >                       vmstat_update);
> > > > > >
> > > > > > +             /* Will be enabled on vmstat_cpu_online() */
> > > > > > +             if (!cpu_online(cpu))
> > >
> > > This might actually be workable as something simpler, on assumption there
> >
> > Sorry about late response. And thank you very much, Huacai. Your suggestion
> > looks great. Much simpler and intuitive.
> >
> > > can be no race here (I don't think so right?).
> >
> > IIUC, there is no race since smp_init() always runs before init_mm_internals().
> > At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary
> > CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks
> > simple and the best.
> >
> > >
> > > Koichiro - could you check this and see whether it resolves the issue and
> > > whether you feel this is sensible?
> >
> > I tested it and it seems to work well both on booting and cpuhp events.
> >
> > (By the way, in my previous email comment [1], I forgot to mention that I also
> > applied other unmerged patches, [2] and [3], just to be able to run such random
> > transisioning test. This time here, I just tested by switching CPUHP_ONLINE <->
> > CPUHP_OFFLINE only.)
> >
> > [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55
> > [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/
> > [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/
> >
> > >
> > > Might be worth expanding comment to say that we disable on offline, enable
> > > on online and we're just providing symmetry here.
> >
> > Sounds good. Like this?
> >
> >     -       for_each_possible_cpu(cpu)
> >     +       for_each_possible_cpu(cpu) {
> >                     INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> >                             vmstat_update);
> >
> >     +               /*
> >     +                * For secondary CPUs during CPU hotplug scenarios,
> >     +                * vmstat_cpu_online() will enable the work.
> >     +                * mm/vmstat:online enables and disables vmstat_work
> >     +                * symmetrically during CPU hotplug events.
> >     +                */
> >     +               if (!cpu_online(cpu))
> >     +                       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> >     +       }
> >
> >
> > Before submitting a revised patch, I'd like to confirm:
> > Huacai, would you be comfortable with me adding your Signed-off-by to the
> > commit, or would you prefer a Suggested-by tag instead?
> Since the original V2 patch has been reverted in mainline, so I think
> you will send a V3 which integrate my suggestion, right?

I was considering sending a separate patch with a title distinct from the
original, while referencing both the original (v2) patch and the revert
commit. However, I'm fine with either approach.

If there’s any documentation that mentions a recommended method in this
kind of situation, please let me know. Personally, I’m not a fan of
multiple commits with identical titles appearing in a branch.

> 
> Both Signed-off-by and Suggested-by is OK for me.

Alright, thanks!

-Koichiro Den

> 
> Huacai
> 
> >
> > Thanks.
> >
> > -Koichiro Den
> >
> > >
> > > > > > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > > > +     }
> > > > > > +
> > > > > >       schedule_delayed_work(&shepherd,
> > > > > >               round_jiffies_relative(sysctl_stat_interval));
> > > > > >  }
> > > > > > --
> > > > > > 2.43.5
> > > > > >
> > > >
> > >
> > > Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Simple fix
  2025-01-08  3:41                     ` Koichiro Den
@ 2025-01-08  3:57                       ` Huacai Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Huacai Chen @ 2025-01-08  3:57 UTC (permalink / raw)
  To: Koichiro Den
  Cc: Lorenzo Stoakes, Huacai Chen, Andrew Morton, linux-mm,
	Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas

On Wed, Jan 8, 2025 at 11:41 AM Koichiro Den <koichiro.den@canonical.com> wrote:
>
> On Wed, Jan 08, 2025 at 10:31:40AM +0800, Huacai Chen wrote:
> > Hi, Koichiro,
> >
> > On Wed, Jan 8, 2025 at 10:22 AM Koichiro Den <koichiro.den@canonical.com> wrote:
> > >
> > > On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote:
> > > > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote:
> > > > > Hi, Lorenzo,
> > > > >
> > > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes
> > > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote:
> > > > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early
> > > > > > > version from Koichiro Den)?
> > > > > >
> > > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro
> > > > > > agrees with me that the approach of his code that you've sent here (I don't
> > > > > > know why) is fundamentally broken and suggest another.
> > > > > >
> > > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication
> > > > > > somewhere? :)
> > > > > >
> > > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to
> > > > > > be patches, a better form of this would be to say 'oh can we just do...'
> > > > > > then to put this code in the mail underneath, without any '[PATCH]' prefix.
> > > > > I wasn't in the CC list, and I also found the bug yesterday, so I can
> > > > > only reply to this email with "git send-email --in-reply-to". This is
> > > > > the reason why my reply looks so stranne.
> > > > >
> > > > > >
> > > > > > But please do review the discussion - the below is insufficient as simple
> > > > > > as it is (sadly) because the boot CPU's delayed work will never be
> > > > > > executed.
> > > > > Koichiro's simple fix causes the boot CPU's delayed work to never be
> > > > > executed, this is obvious, and of course I have read the early
> > > > > discussion. And so I improve it, with a "cpu_online()" checking, then
> > > > > the boot CPU is unaffected.
> > > >
> > > > With respect Haucai, this is not how you engage in kernel discussion. You
> > > > could simply have replied to the mail or given more information, you now
> > > > have two people telling you this, please take it on board.
> > > >
> > > > And it caused me to miss your actually quite valuable suggestion so this is
> > > > beneficial for all! :)
> > > >
> > > > ANYWAY, moving on to the technical side:
> > > >
> > > > >
> > > > > Huacai
> > > > >
> > > > > >
> > > > > > I will take a look at Koichiro's new approach as soon as I am able.
> > > > > >
> > > > > > Cheers!
> > > > > >
> > > > > > > ---
> > > > > > >  mm/vmstat.c | 7 ++++++-
> > > > > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > > > > > > index 0889b75cef14..1badc24a21ff 100644
> > > > > > > --- a/mm/vmstat.c
> > > > > > > +++ b/mm/vmstat.c
> > > > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
> > > > > > >  {
> > > > > > >       int cpu;
> > > > > > >
> > > > > > > -     for_each_possible_cpu(cpu)
> > > > > > > +     for_each_possible_cpu(cpu) {
> > > > > > >               INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > > > > > >                       vmstat_update);
> > > > > > >
> > > > > > > +             /* Will be enabled on vmstat_cpu_online() */
> > > > > > > +             if (!cpu_online(cpu))
> > > >
> > > > This might actually be workable as something simpler, on assumption there
> > >
> > > Sorry about late response. And thank you very much, Huacai. Your suggestion
> > > looks great. Much simpler and intuitive.
> > >
> > > > can be no race here (I don't think so right?).
> > >
> > > IIUC, there is no race since smp_init() always runs before init_mm_internals().
> > > At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary
> > > CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks
> > > simple and the best.
> > >
> > > >
> > > > Koichiro - could you check this and see whether it resolves the issue and
> > > > whether you feel this is sensible?
> > >
> > > I tested it and it seems to work well both on booting and cpuhp events.
> > >
> > > (By the way, in my previous email comment [1], I forgot to mention that I also
> > > applied other unmerged patches, [2] and [3], just to be able to run such random
> > > transisioning test. This time here, I just tested by switching CPUHP_ONLINE <->
> > > CPUHP_OFFLINE only.)
> > >
> > > [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55
> > > [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/
> > > [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/
> > >
> > > >
> > > > Might be worth expanding comment to say that we disable on offline, enable
> > > > on online and we're just providing symmetry here.
> > >
> > > Sounds good. Like this?
> > >
> > >     -       for_each_possible_cpu(cpu)
> > >     +       for_each_possible_cpu(cpu) {
> > >                     INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
> > >                             vmstat_update);
> > >
> > >     +               /*
> > >     +                * For secondary CPUs during CPU hotplug scenarios,
> > >     +                * vmstat_cpu_online() will enable the work.
> > >     +                * mm/vmstat:online enables and disables vmstat_work
> > >     +                * symmetrically during CPU hotplug events.
> > >     +                */
> > >     +               if (!cpu_online(cpu))
> > >     +                       disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > >     +       }
> > >
> > >
> > > Before submitting a revised patch, I'd like to confirm:
> > > Huacai, would you be comfortable with me adding your Signed-off-by to the
> > > commit, or would you prefer a Suggested-by tag instead?
> > Since the original V2 patch has been reverted in mainline, so I think
> > you will send a V3 which integrate my suggestion, right?
>
> I was considering sending a separate patch with a title distinct from the
> original, while referencing both the original (v2) patch and the revert
> commit. However, I'm fine with either approach.
Don't separate, just sending a single V3 patch is OK for me.

Huacai

>
> If there’s any documentation that mentions a recommended method in this
> kind of situation, please let me know. Personally, I’m not a fan of
> multiple commits with identical titles appearing in a branch.
>
> >
> > Both Signed-off-by and Suggested-by is OK for me.
>
> Alright, thanks!
>
> -Koichiro Den
>
> >
> > Huacai
> >
> > >
> > > Thanks.
> > >
> > > -Koichiro Den
> > >
> > > >
> > > > > > > +                     disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> > > > > > > +     }
> > > > > > > +
> > > > > > >       schedule_delayed_work(&shepherd,
> > > > > > >               round_jiffies_relative(sysctl_stat_interval));
> > > > > > >  }
> > > > > > > --
> > > > > > > 2.43.5
> > > > > > >
> > > > >
> > > >
> > > > Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-01-08  3:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-21  3:33 [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() Koichiro Den
2025-01-03 23:33 ` Lorenzo Stoakes
2025-01-04  4:00   ` Koichiro Den
2025-01-06  2:18     ` Charalampos Mitrodimas
2025-01-06 10:04     ` Mark Rutland
2025-01-06 10:18     ` Geert Uytterhoeven
2025-01-06 10:52     ` Lorenzo Stoakes
2025-01-06 12:53       ` Charalampos Mitrodimas
2025-01-06 12:58         ` Lorenzo Stoakes
2025-01-06 13:03       ` Koichiro Den
2025-01-06 13:53         ` Koichiro Den
2025-01-07  1:18         ` [PATCH] Simple fix Huacai Chen
2025-01-07  3:35           ` Matthew Wilcox
2025-01-07  3:58             ` Huacai Chen
2025-01-07  8:47           ` Lorenzo Stoakes
2025-01-07 10:29             ` Huacai Chen
2025-01-07 11:00               ` Lorenzo Stoakes
2025-01-08  2:22                 ` Koichiro Den
2025-01-08  2:26                   ` Koichiro Den
2025-01-08  2:31                   ` Huacai Chen
2025-01-08  3:41                     ` Koichiro Den
2025-01-08  3:57                       ` Huacai Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox