* [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
@ 2024-12-21 3:33 Koichiro Den
2025-01-03 23:33 ` Lorenzo Stoakes
0 siblings, 1 reply; 22+ messages in thread
From: Koichiro Den @ 2024-12-21 3:33 UTC (permalink / raw)
To: linux-mm; +Cc: akpm, linux-kernel
Even after the mm/vmstat:online teardown has run, the shepherd may still
queue work for the dying CPU until that CPU is removed from the online
mask. Although rare, this means that once unbind_workers() has unbound
the per-cpu kworker, vmstat_update for the dying CPU can run on an
unrelated CPU before the dying CPU enters the atomic AP states.
With CONFIG_DEBUG_PREEMPT=y, this produces the following error and
backtrace.
BUG: using smp_processor_id() in preemptible [00000000] code: \
kworker/7:3/1702
caller is refresh_cpu_vm_stats+0x235/0x5f0
CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
Tainted: [N]=TEST
Workqueue: mm_percpu_wq vmstat_update
Call Trace:
<TASK>
dump_stack_lvl+0x8d/0xb0
check_preemption_disabled+0xce/0xe0
refresh_cpu_vm_stats+0x235/0x5f0
vmstat_update+0x17/0xa0
process_one_work+0x869/0x1aa0
worker_thread+0x5e5/0x1100
kthread+0x29e/0x380
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30
</TASK>
So, for mm/vmstat:online, reliably disable vmstat_work on teardown and
symmetrically enable it on startup.
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
---
v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
---
mm/vmstat.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4d016314a56c..0889b75cef14 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
if (!node_state(cpu_to_node(cpu), N_CPU)) {
node_set_state(cpu_to_node(cpu), N_CPU);
}
+ enable_delayed_work(&per_cpu(vmstat_work, cpu));
return 0;
}
static int vmstat_cpu_down_prep(unsigned int cpu)
{
- cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+ disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
return 0;
}
--
2.43.0
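
The fix relies on the workqueue disable/enable API
(disable_delayed_work_sync()/enable_delayed_work()). As a rough sketch of
those semantics, the hypothetical out-of-tree module below (not part of the
patch; all names are made up for illustration) shows that a disabled delayed
work rejects further queueing attempts until a matching enable, which is what
stops the shepherd from re-arming vmstat_work on a dying CPU.

/*
 * Hypothetical demo module, for illustration only: it mirrors the
 * disable/enable pairing used by vmstat_cpu_down_prep() and
 * vmstat_cpu_online(), but is not derived from mm/vmstat.c.
 */
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/smp.h>
#include <linux/printk.h>

static void demo_fn(struct work_struct *w)
{
	pr_info("demo work ran on CPU %d\n", raw_smp_processor_id());
}

static DECLARE_DELAYED_WORK(demo_work, demo_fn);

static int __init demo_init(void)
{
	bool queued;

	/* Like vmstat_cpu_down_prep(): flush the work and block re-queueing. */
	disable_delayed_work_sync(&demo_work);

	/* A racing "shepherd" would now fail to queue it: queued == false. */
	queued = queue_delayed_work(system_wq, &demo_work, HZ);
	pr_info("queue while disabled: %d\n", queued);

	/* Like vmstat_cpu_online(): drop the disable count, queueing works again. */
	enable_delayed_work(&demo_work);
	queued = queue_delayed_work(system_wq, &demo_work, HZ);
	pr_info("queue after enable: %d\n", queued);

	return 0;
}

static void __exit demo_exit(void)
{
	disable_delayed_work_sync(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");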
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2024-12-21 3:33 [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() Koichiro Den @ 2025-01-03 23:33 ` Lorenzo Stoakes 2025-01-04 4:00 ` Koichiro Den 0 siblings, 1 reply; 22+ messages in thread From: Lorenzo Stoakes @ 2025-01-03 23:33 UTC (permalink / raw) To: Koichiro Den; +Cc: linux-mm, akpm, linux-kernel On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: > Even after mm/vmstat:online teardown, shepherd may still queue work for > the dying cpu until the cpu is removed from online mask. While it's > quite rare, this means that after unbind_workers() unbinds a per-cpu > kworker, it potentially runs vmstat_update for the dying CPU on an > irrelevant cpu before entering atomic AP states. > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the > backtrace. > > BUG: using smp_processor_id() in preemptible [00000000] code: \ > kworker/7:3/1702 > caller is refresh_cpu_vm_stats+0x235/0x5f0 > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G > Tainted: [N]=TEST > Workqueue: mm_percpu_wq vmstat_update > Call Trace: > <TASK> > dump_stack_lvl+0x8d/0xb0 > check_preemption_disabled+0xce/0xe0 > refresh_cpu_vm_stats+0x235/0x5f0 > vmstat_update+0x17/0xa0 > process_one_work+0x869/0x1aa0 > worker_thread+0x5e5/0x1100 > kthread+0x29e/0x380 > ret_from_fork+0x2d/0x70 > ret_from_fork_asm+0x1a/0x30 > </TASK> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and > symmetrically enable it on startup. > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> Hi, I observed a warning in my qemu and real hardware, which I bisected to this commit: [ 0.087733] ------------[ cut here ]------------ [ 0.087733] workqueue: work disable count underflowed [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 This is: static void work_offqd_enable(struct work_offq_data *offqd) { if (likely(offqd->disable > 0)) offqd->disable--; else WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line } So (based on this code) presumably an enable is only required if previously disabled, and this code is being called on startup unconditionally without the work having been disabled previously? I'm not hugely familiar with delayed workqueue implementation details. 
[ 0.087733] Modules linked in: [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 [ 0.087733] PKRU: 55555554 [ 0.087733] Call Trace: [ 0.087733] <TASK> [ 0.087733] ? enable_work+0xb5/0xc0 [ 0.087733] ? __warn.cold+0x93/0xf2 [ 0.087733] ? enable_work+0xb5/0xc0 [ 0.087733] ? report_bug+0xff/0x140 [ 0.087733] ? handle_bug+0x54/0x90 [ 0.087733] ? exc_invalid_op+0x17/0x70 [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 [ 0.087733] ? enable_work+0xb5/0xc0 [ 0.087733] vmstat_cpu_online+0x5c/0x70 [ 0.087733] cpuhp_invoke_callback+0x133/0x440 [ 0.087733] cpuhp_thread_fun+0x95/0x150 [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 0.087735] kthread+0xc8/0xf0 [ 0.087737] ? __pfx_kthread+0x10/0x10 [ 0.087738] ret_from_fork+0x2c/0x50 [ 0.087739] ? __pfx_kthread+0x10/0x10 [ 0.087740] ret_from_fork_asm+0x1a/0x30 [ 0.087742] </TASK> [ 0.087742] ---[ end trace 0000000000000000 ]--- > --- > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ > --- > mm/vmstat.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 4d016314a56c..0889b75cef14 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) > if (!node_state(cpu_to_node(cpu), N_CPU)) { > node_set_state(cpu_to_node(cpu), N_CPU); > } > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); Probably needs to be 'if disabled' here, as this is invoked on normal startup when the work won't have been disabled? Had a brief look at code and couldn't see how that could be done however... and one would need to be careful about races... Tricky! > > return 0; > } > > static int vmstat_cpu_down_prep(unsigned int cpu) > { > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > return 0; > } > > -- > 2.43.0 > > Let me know if you need any more details, .config etc. I noticed this warning on a real box too (in both cases running akpm's mm-unstable branch), FWIW. ^ permalink raw reply [flat|nested] 22+ messages in thread
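
As a rough sketch of the bookkeeping behind that WARN_ONCE (illustration
only, not proposed code): each disable_delayed_work_sync() increments the
per-work disable count and each enable_delayed_work() decrements it, so an
enable that is not paired with an earlier disable underflows the count, which
is exactly what an unconditional enable in vmstat_cpu_online() does for a CPU
whose work was never disabled.

/*
 * Hypothetical fragment, for illustration only: the comments track the
 * per-work disable count that work_offqd_enable() checks.
 */
#include <linux/workqueue.h>

static void demo_disable_count(struct delayed_work *dwork)
{
	disable_delayed_work_sync(dwork);  /* count 0 -> 1, queueing blocked   */
	disable_delayed_work_sync(dwork);  /* count 1 -> 2, disables nest      */
	enable_delayed_work(dwork);        /* count 2 -> 1, still blocked      */
	enable_delayed_work(dwork);        /* count 1 -> 0, queueing allowed   */
	enable_delayed_work(dwork);        /* count already 0: WARN_ONCE fires */
}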
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-03 23:33 ` Lorenzo Stoakes @ 2025-01-04 4:00 ` Koichiro Den 2025-01-06 2:18 ` Charalampos Mitrodimas ` (3 more replies) 0 siblings, 4 replies; 22+ messages in thread From: Koichiro Den @ 2025-01-04 4:00 UTC (permalink / raw) To: Lorenzo Stoakes; +Cc: linux-mm, akpm, linux-kernel On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: > > Even after mm/vmstat:online teardown, shepherd may still queue work for > > the dying cpu until the cpu is removed from online mask. While it's > > quite rare, this means that after unbind_workers() unbinds a per-cpu > > kworker, it potentially runs vmstat_update for the dying CPU on an > > irrelevant cpu before entering atomic AP states. > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the > > backtrace. > > > > BUG: using smp_processor_id() in preemptible [00000000] code: \ > > kworker/7:3/1702 > > caller is refresh_cpu_vm_stats+0x235/0x5f0 > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G > > Tainted: [N]=TEST > > Workqueue: mm_percpu_wq vmstat_update > > Call Trace: > > <TASK> > > dump_stack_lvl+0x8d/0xb0 > > check_preemption_disabled+0xce/0xe0 > > refresh_cpu_vm_stats+0x235/0x5f0 > > vmstat_update+0x17/0xa0 > > process_one_work+0x869/0x1aa0 > > worker_thread+0x5e5/0x1100 > > kthread+0x29e/0x380 > > ret_from_fork+0x2d/0x70 > > ret_from_fork_asm+0x1a/0x30 > > </TASK> > > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and > > symmetrically enable it on startup. > > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> > > Hi, > > I observed a warning in my qemu and real hardware, which I bisected to this commit: > > [ 0.087733] ------------[ cut here ]------------ > [ 0.087733] workqueue: work disable count underflowed > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 > > This is: > > static void work_offqd_enable(struct work_offq_data *offqd) > { > if (likely(offqd->disable > 0)) > offqd->disable--; > else > WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line > } > > So (based on this code) presumably an enable is only required if previously > disabled, and this code is being called on startup unconditionally without > the work having been disabled previously? I'm not hugely familiar with > delayed workqueue implementation details. 
> > [ 0.087733] Modules linked in: > [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 > [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 > [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 > [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 > [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 > [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 > [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 > [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 > [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 > [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 > [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 > [ 0.087733] PKRU: 55555554 > [ 0.087733] Call Trace: > [ 0.087733] <TASK> > [ 0.087733] ? enable_work+0xb5/0xc0 > [ 0.087733] ? __warn.cold+0x93/0xf2 > [ 0.087733] ? enable_work+0xb5/0xc0 > [ 0.087733] ? report_bug+0xff/0x140 > [ 0.087733] ? handle_bug+0x54/0x90 > [ 0.087733] ? exc_invalid_op+0x17/0x70 > [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 > [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 > [ 0.087733] ? enable_work+0xb5/0xc0 > [ 0.087733] vmstat_cpu_online+0x5c/0x70 > [ 0.087733] cpuhp_invoke_callback+0x133/0x440 > [ 0.087733] cpuhp_thread_fun+0x95/0x150 > [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 > [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 > [ 0.087735] kthread+0xc8/0xf0 > [ 0.087737] ? __pfx_kthread+0x10/0x10 > [ 0.087738] ret_from_fork+0x2c/0x50 > [ 0.087739] ? __pfx_kthread+0x10/0x10 > [ 0.087740] ret_from_fork_asm+0x1a/0x30 > [ 0.087742] </TASK> > [ 0.087742] ---[ end trace 0000000000000000 ]--- > > > > --- > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ > > --- > > mm/vmstat.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index 4d016314a56c..0889b75cef14 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) > > if (!node_state(cpu_to_node(cpu), N_CPU)) { > > node_set_state(cpu_to_node(cpu), N_CPU); > > } > > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); > > Probably needs to be 'if disabled' here, as this is invoked on normal > startup when the work won't have been disabled? > > Had a brief look at code and couldn't see how that could be done > however... and one would need to be careful about races... Tricky! > > > > > return 0; > > } > > > > static int vmstat_cpu_down_prep(unsigned int cpu) > > { > > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > return 0; > > } > > > > -- > > 2.43.0 > > > > > > Let me know if you need any more details, .config etc. > > I noticed this warning on a real box too (in both cases running akpm's > mm-unstable branch), FWIW. Thank you for the report. I was able to reproduce the warning and now wonder how I missed it.. My oversight, apologies. 
In my current view, the simplest solution would be to make sure a local vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even during boot-up. The following patch suppresses the warning: diff --git a/mm/vmstat.c b/mm/vmstat.c index 0889b75cef14..19ceed5d34bf 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) { int cpu; - for_each_possible_cpu(cpu) + for_each_possible_cpu(cpu) { INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), vmstat_update); + /* will be enabled on vmstat_cpu_online */ + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); + } + schedule_delayed_work(&shepherd, round_jiffies_relative(sysctl_stat_interval)); } If you think of a better solution later, please let me know. Otherwise, I'll submit a follow-up fix patch with the above diff. Thanks. -Koichiro ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-04 4:00 ` Koichiro Den @ 2025-01-06 2:18 ` Charalampos Mitrodimas 2025-01-06 10:04 ` Mark Rutland ` (2 subsequent siblings) 3 siblings, 0 replies; 22+ messages in thread From: Charalampos Mitrodimas @ 2025-01-06 2:18 UTC (permalink / raw) To: Koichiro Den; +Cc: Lorenzo Stoakes, linux-mm, akpm, linux-kernel Koichiro Den <koichiro.den@canonical.com> writes: > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: >> On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: >> > Even after mm/vmstat:online teardown, shepherd may still queue work for >> > the dying cpu until the cpu is removed from online mask. While it's >> > quite rare, this means that after unbind_workers() unbinds a per-cpu >> > kworker, it potentially runs vmstat_update for the dying CPU on an >> > irrelevant cpu before entering atomic AP states. >> > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the >> > backtrace. >> > >> > BUG: using smp_processor_id() in preemptible [00000000] code: \ >> > kworker/7:3/1702 >> > caller is refresh_cpu_vm_stats+0x235/0x5f0 >> > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G >> > Tainted: [N]=TEST >> > Workqueue: mm_percpu_wq vmstat_update >> > Call Trace: >> > <TASK> >> > dump_stack_lvl+0x8d/0xb0 >> > check_preemption_disabled+0xce/0xe0 >> > refresh_cpu_vm_stats+0x235/0x5f0 >> > vmstat_update+0x17/0xa0 >> > process_one_work+0x869/0x1aa0 >> > worker_thread+0x5e5/0x1100 >> > kthread+0x29e/0x380 >> > ret_from_fork+0x2d/0x70 >> > ret_from_fork_asm+0x1a/0x30 >> > </TASK> >> > >> > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and >> > symmetrically enable it on startup. >> > >> > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> >> >> Hi, >> >> I observed a warning in my qemu and real hardware, which I bisected to this commit: >> >> [ 0.087733] ------------[ cut here ]------------ >> [ 0.087733] workqueue: work disable count underflowed >> [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 I also encountered this in my QEMU Debian installation. >> >> This is: >> >> static void work_offqd_enable(struct work_offq_data *offqd) >> { >> if (likely(offqd->disable > 0)) >> offqd->disable--; >> else >> WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line >> } >> >> So (based on this code) presumably an enable is only required if previously >> disabled, and this code is being called on startup unconditionally without >> the work having been disabled previously? I'm not hugely familiar with >> delayed workqueue implementation details. 
>> >> [ 0.087733] Modules linked in: >> [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 >> [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 >> [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 >> [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 >> [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 >> [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 >> [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 >> [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 >> [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 >> [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 >> [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 >> [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 >> [ 0.087733] PKRU: 55555554 >> [ 0.087733] Call Trace: >> [ 0.087733] <TASK> >> [ 0.087733] ? enable_work+0xb5/0xc0 >> [ 0.087733] ? __warn.cold+0x93/0xf2 >> [ 0.087733] ? enable_work+0xb5/0xc0 >> [ 0.087733] ? report_bug+0xff/0x140 >> [ 0.087733] ? handle_bug+0x54/0x90 >> [ 0.087733] ? exc_invalid_op+0x17/0x70 >> [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 >> [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 >> [ 0.087733] ? enable_work+0xb5/0xc0 >> [ 0.087733] vmstat_cpu_online+0x5c/0x70 >> [ 0.087733] cpuhp_invoke_callback+0x133/0x440 >> [ 0.087733] cpuhp_thread_fun+0x95/0x150 >> [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 >> [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 >> [ 0.087735] kthread+0xc8/0xf0 >> [ 0.087737] ? __pfx_kthread+0x10/0x10 >> [ 0.087738] ret_from_fork+0x2c/0x50 >> [ 0.087739] ? __pfx_kthread+0x10/0x10 >> [ 0.087740] ret_from_fork_asm+0x1a/0x30 >> [ 0.087742] </TASK> >> [ 0.087742] ---[ end trace 0000000000000000 ]--- >> >> >> > --- >> > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ >> > --- >> > mm/vmstat.c | 3 ++- >> > 1 file changed, 2 insertions(+), 1 deletion(-) >> > >> > diff --git a/mm/vmstat.c b/mm/vmstat.c >> > index 4d016314a56c..0889b75cef14 100644 >> > --- a/mm/vmstat.c >> > +++ b/mm/vmstat.c >> > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) >> > if (!node_state(cpu_to_node(cpu), N_CPU)) { >> > node_set_state(cpu_to_node(cpu), N_CPU); >> > } >> > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); >> >> Probably needs to be 'if disabled' here, as this is invoked on normal >> startup when the work won't have been disabled? >> >> Had a brief look at code and couldn't see how that could be done >> however... and one would need to be careful about races... Tricky! >> >> > >> > return 0; >> > } >> > >> > static int vmstat_cpu_down_prep(unsigned int cpu) >> > { >> > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); >> > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); >> > return 0; >> > } >> > >> > -- >> > 2.43.0 >> > >> > >> >> Let me know if you need any more details, .config etc. >> >> I noticed this warning on a real box too (in both cases running akpm's >> mm-unstable branch), FWIW. > > Thank you for the report. I was able to reproduce the warning and now > wonder how I missed it.. My oversight, apologies. 
> > In my current view, the simplest solution would be to make sure a local > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even > during boot-up. The following patch suppresses the warning: > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..19ceed5d34bf 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) > { > int cpu; > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* will be enabled on vmstat_cpu_online */ > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > + > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > } > > If you think of a better solution later, please let me know. Otherwise, > I'll submit a follow-up fix patch with the above diff. Can't think of a better solution myself but this fixes the issue. Thanks! > > Thanks. > > -Koichiro C. Mitrodimas ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-04 4:00 ` Koichiro Den 2025-01-06 2:18 ` Charalampos Mitrodimas @ 2025-01-06 10:04 ` Mark Rutland 2025-01-06 10:18 ` Geert Uytterhoeven 2025-01-06 10:52 ` Lorenzo Stoakes 3 siblings, 0 replies; 22+ messages in thread From: Mark Rutland @ 2025-01-06 10:04 UTC (permalink / raw) To: Koichiro Den; +Cc: Lorenzo Stoakes, linux-mm, akpm, linux-kernel On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote: > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: > > Hi, > > > > I observed a warning in my qemu and real hardware, which I bisected to this commit: > > > > [ 0.087733] ------------[ cut here ]------------ > > [ 0.087733] workqueue: work disable count underflowed > > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 FWIW, I hit similar when testing v6.13-rc6 defconfig on arm64, when booting secondaries I always get a splat (trimmed): | ------------[ cut here ]------------ | workqueue: work disable count underflowed | WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4317 enable_work+0xfc/0x108 | Modules linked in: | CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc6 #1 | Hardware name: linux,dummy-virt (DT) | [...] | Call trace: | enable_work+0xfc/0x108 (P) | enable_delayed_work+0x10/0x1c | vmstat_cpu_online+0x88/0xbc | cpuhp_invoke_callback+0x10c/0x208 | cpuhp_thread_fun+0xb0/0x1a0 | smpboot_thread_fn+0x20c/0x234 | kthread+0x110/0x114 | ret_from_fork+0x10/0x20 | ---[ end trace 0000000000000000 ]--- [...] > In my current view, the simplest solution would be to make sure a local > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even > during boot-up. The following patch suppresses the warning: > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..19ceed5d34bf 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) > { > int cpu; > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* will be enabled on vmstat_cpu_online */ > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > + > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > } FWIW, the above solves the warning for me. Mark. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-04 4:00 ` Koichiro Den 2025-01-06 2:18 ` Charalampos Mitrodimas 2025-01-06 10:04 ` Mark Rutland @ 2025-01-06 10:18 ` Geert Uytterhoeven 2025-01-06 10:52 ` Lorenzo Stoakes 3 siblings, 0 replies; 22+ messages in thread From: Geert Uytterhoeven @ 2025-01-06 10:18 UTC (permalink / raw) To: Koichiro Den; +Cc: Lorenzo Stoakes, linux-mm, akpm, linux-kernel, Linux-Renesas Hi Koichiro, On Sat, Jan 4, 2025 at 5:00 AM Koichiro Den <koichiro.den@canonical.com> wrote: > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: > > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: > > > Even after mm/vmstat:online teardown, shepherd may still queue work for > > > the dying cpu until the cpu is removed from online mask. While it's > > > quite rare, this means that after unbind_workers() unbinds a per-cpu > > > kworker, it potentially runs vmstat_update for the dying CPU on an > > > irrelevant cpu before entering atomic AP states. > > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the > > > backtrace. > > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: \ > > > kworker/7:3/1702 > > > caller is refresh_cpu_vm_stats+0x235/0x5f0 > > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G > > > Tainted: [N]=TEST > > > Workqueue: mm_percpu_wq vmstat_update > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x8d/0xb0 > > > check_preemption_disabled+0xce/0xe0 > > > refresh_cpu_vm_stats+0x235/0x5f0 > > > vmstat_update+0x17/0xa0 > > > process_one_work+0x869/0x1aa0 > > > worker_thread+0x5e5/0x1100 > > > kthread+0x29e/0x380 > > > ret_from_fork+0x2d/0x70 > > > ret_from_fork_asm+0x1a/0x30 > > > </TASK> > > > > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and > > > symmetrically enable it on startup. > > > > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> > > > > I observed a warning in my qemu and real hardware, which I bisected to this commit: > > > > [ 0.087733] ------------[ cut here ]------------ > > [ 0.087733] workqueue: work disable count underflowed > > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 I am seeing the same on arm32 (R-Car M2-W) and arm64 (R-Car H3 ES2.0). > Thank you for the report. I was able to reproduce the warning and now > wonder how I missed it.. My oversight, apologies. > > In my current view, the simplest solution would be to make sure a local > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even > during boot-up. The following patch suppresses the warning: > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..19ceed5d34bf 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) > { > int cpu; > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* will be enabled on vmstat_cpu_online */ > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > + > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > } > > If you think of a better solution later, please let me know. Otherwise, > I'll submit a follow-up fix patch with the above diff. Thank you, that fixes the warnings for me! 
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-04 4:00 ` Koichiro Den ` (2 preceding siblings ...) 2025-01-06 10:18 ` Geert Uytterhoeven @ 2025-01-06 10:52 ` Lorenzo Stoakes 2025-01-06 12:53 ` Charalampos Mitrodimas 2025-01-06 13:03 ` Koichiro Den 3 siblings, 2 replies; 22+ messages in thread From: Lorenzo Stoakes @ 2025-01-06 10:52 UTC (permalink / raw) To: Koichiro Den Cc: linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra +cc tglx, peterz for insight on CPU hot plug On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote: > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: > > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: > > > Even after mm/vmstat:online teardown, shepherd may still queue work for > > > the dying cpu until the cpu is removed from online mask. While it's > > > quite rare, this means that after unbind_workers() unbinds a per-cpu > > > kworker, it potentially runs vmstat_update for the dying CPU on an > > > irrelevant cpu before entering atomic AP states. > > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the > > > backtrace. > > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: \ > > > kworker/7:3/1702 > > > caller is refresh_cpu_vm_stats+0x235/0x5f0 > > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G > > > Tainted: [N]=TEST > > > Workqueue: mm_percpu_wq vmstat_update > > > Call Trace: > > > <TASK> > > > dump_stack_lvl+0x8d/0xb0 > > > check_preemption_disabled+0xce/0xe0 > > > refresh_cpu_vm_stats+0x235/0x5f0 > > > vmstat_update+0x17/0xa0 > > > process_one_work+0x869/0x1aa0 > > > worker_thread+0x5e5/0x1100 > > > kthread+0x29e/0x380 > > > ret_from_fork+0x2d/0x70 > > > ret_from_fork_asm+0x1a/0x30 > > > </TASK> > > > > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and > > > symmetrically enable it on startup. > > > > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> > > > > Hi, > > > > I observed a warning in my qemu and real hardware, which I bisected to this commit: > > > > [ 0.087733] ------------[ cut here ]------------ > > [ 0.087733] workqueue: work disable count underflowed > > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 > > > > This is: > > > > static void work_offqd_enable(struct work_offq_data *offqd) > > { > > if (likely(offqd->disable > 0)) > > offqd->disable--; > > else > > WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line > > } > > > > So (based on this code) presumably an enable is only required if previously > > disabled, and this code is being called on startup unconditionally without > > the work having been disabled previously? I'm not hugely familiar with > > delayed workqueue implementation details. 
> > > > [ 0.087733] Modules linked in: > > [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 > > [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > > [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 > > [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 > > [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 > > [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 > > [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 > > [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 > > [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 > > [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 > > [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 > > [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 > > [ 0.087733] PKRU: 55555554 > > [ 0.087733] Call Trace: > > [ 0.087733] <TASK> > > [ 0.087733] ? enable_work+0xb5/0xc0 > > [ 0.087733] ? __warn.cold+0x93/0xf2 > > [ 0.087733] ? enable_work+0xb5/0xc0 > > [ 0.087733] ? report_bug+0xff/0x140 > > [ 0.087733] ? handle_bug+0x54/0x90 > > [ 0.087733] ? exc_invalid_op+0x17/0x70 > > [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 > > [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 > > [ 0.087733] ? enable_work+0xb5/0xc0 > > [ 0.087733] vmstat_cpu_online+0x5c/0x70 > > [ 0.087733] cpuhp_invoke_callback+0x133/0x440 > > [ 0.087733] cpuhp_thread_fun+0x95/0x150 > > [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 > > [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 > > [ 0.087735] kthread+0xc8/0xf0 > > [ 0.087737] ? __pfx_kthread+0x10/0x10 > > [ 0.087738] ret_from_fork+0x2c/0x50 > > [ 0.087739] ? __pfx_kthread+0x10/0x10 > > [ 0.087740] ret_from_fork_asm+0x1a/0x30 > > [ 0.087742] </TASK> > > [ 0.087742] ---[ end trace 0000000000000000 ]--- > > > > > > > --- > > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ > > > --- > > > mm/vmstat.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > index 4d016314a56c..0889b75cef14 100644 > > > --- a/mm/vmstat.c > > > +++ b/mm/vmstat.c > > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) > > > if (!node_state(cpu_to_node(cpu), N_CPU)) { > > > node_set_state(cpu_to_node(cpu), N_CPU); > > > } > > > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); > > > > Probably needs to be 'if disabled' here, as this is invoked on normal > > startup when the work won't have been disabled? > > > > Had a brief look at code and couldn't see how that could be done > > however... and one would need to be careful about races... Tricky! > > > > > > > > return 0; > > > } > > > > > > static int vmstat_cpu_down_prep(unsigned int cpu) > > > { > > > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > return 0; > > > } > > > > > > -- > > > 2.43.0 > > > > > > > > > > Let me know if you need any more details, .config etc. > > > > I noticed this warning on a real box too (in both cases running akpm's > > mm-unstable branch), FWIW. > > Thank you for the report. 
I was able to reproduce the warning and now > wonder how I missed it.. My oversight, apologies. > > In my current view, the simplest solution would be to make sure a local > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even > during boot-up. The following patch suppresses the warning: > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..19ceed5d34bf 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) > { > int cpu; > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* will be enabled on vmstat_cpu_online */ > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > + > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > } > > If you think of a better solution later, please let me know. Otherwise, > I'll submit a follow-up fix patch with the above diff. Thanks, this resolves the problem, but are we sure that _all_ CPUs will definitely call vmstat_cpu_online()? I did a bit of printk output and it seems like this _didn't_ online CPU 0, presumably the boot CPU which calls this function in the first instance? I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls() explicitly which does _not_ call the callback, though even if it did this would be too early as it calls start_shepherd_timer() _after_ this anyway. So yeah, unless I'm missing something, I think this patch is broken. I have added Thomas and Peter to give some insight on the CPU hotplug side. It feels like the patch really needs an 'enable if not already enabled' call in vmstat_cpu_online(). > > Thanks. > > -Koichiro ^ permalink raw reply [flat|nested] 22+ messages in thread
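
For reference, the distinction being pointed at lives in the cpuhp
registration API: cpuhp_setup_state() invokes the startup callback for every
CPU that is already online (the boot CPU included), whereas the
cpuhp_setup_state_nocalls() variant used by init_mm_internals() only
registers the callbacks. The hypothetical module below (illustration only,
not a proposed change to mm/vmstat.c; all names are made up) sketches the
difference.

/*
 * Hypothetical module, for illustration only; not a proposed change to
 * mm/vmstat.c.
 */
#include <linux/module.h>
#include <linux/cpuhotplug.h>
#include <linux/printk.h>

static int demo_online(unsigned int cpu)
{
	pr_info("demo: startup callback for CPU %u\n", cpu);
	return 0;
}

static int demo_down_prep(unsigned int cpu)
{
	pr_info("demo: teardown callback for CPU %u\n", cpu);
	return 0;
}

static int demo_state;

static int __init demo_init(void)
{
	/*
	 * cpuhp_setup_state() invokes demo_online() right here for every CPU
	 * that is already online, the boot CPU included.  The _nocalls()
	 * variant used by init_mm_internals() only records the callbacks, so
	 * CPUs online at registration time never see the startup callback.
	 */
	demo_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "demo:online",
				       demo_online, demo_down_prep);
	return demo_state < 0 ? demo_state : 0;
}

static void __exit demo_exit(void)
{
	/* Invokes demo_down_prep() on all online CPUs before unregistering. */
	cpuhp_remove_state(demo_state);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");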
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-06 10:52 ` Lorenzo Stoakes @ 2025-01-06 12:53 ` Charalampos Mitrodimas 2025-01-06 12:58 ` Lorenzo Stoakes 2025-01-06 13:03 ` Koichiro Den 1 sibling, 1 reply; 22+ messages in thread From: Charalampos Mitrodimas @ 2025-01-06 12:53 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Koichiro Den, linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra Lorenzo Stoakes <lorenzo.stoakes@oracle.com> writes: > +cc tglx, peterz for insight on CPU hot plug > > On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote: >> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: >> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: >> > > Even after mm/vmstat:online teardown, shepherd may still queue work for >> > > the dying cpu until the cpu is removed from online mask. While it's >> > > quite rare, this means that after unbind_workers() unbinds a per-cpu >> > > kworker, it potentially runs vmstat_update for the dying CPU on an >> > > irrelevant cpu before entering atomic AP states. >> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the >> > > backtrace. >> > > >> > > BUG: using smp_processor_id() in preemptible [00000000] code: \ >> > > kworker/7:3/1702 >> > > caller is refresh_cpu_vm_stats+0x235/0x5f0 >> > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G >> > > Tainted: [N]=TEST >> > > Workqueue: mm_percpu_wq vmstat_update >> > > Call Trace: >> > > <TASK> >> > > dump_stack_lvl+0x8d/0xb0 >> > > check_preemption_disabled+0xce/0xe0 >> > > refresh_cpu_vm_stats+0x235/0x5f0 >> > > vmstat_update+0x17/0xa0 >> > > process_one_work+0x869/0x1aa0 >> > > worker_thread+0x5e5/0x1100 >> > > kthread+0x29e/0x380 >> > > ret_from_fork+0x2d/0x70 >> > > ret_from_fork_asm+0x1a/0x30 >> > > </TASK> >> > > >> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and >> > > symmetrically enable it on startup. >> > > >> > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> >> > >> > Hi, >> > >> > I observed a warning in my qemu and real hardware, which I bisected to this commit: >> > >> > [ 0.087733] ------------[ cut here ]------------ >> > [ 0.087733] workqueue: work disable count underflowed >> > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 >> > >> > This is: >> > >> > static void work_offqd_enable(struct work_offq_data *offqd) >> > { >> > if (likely(offqd->disable > 0)) >> > offqd->disable--; >> > else >> > WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line >> > } >> > >> > So (based on this code) presumably an enable is only required if previously >> > disabled, and this code is being called on startup unconditionally without >> > the work having been disabled previously? I'm not hugely familiar with >> > delayed workqueue implementation details. 
>> > >> > [ 0.087733] Modules linked in: >> > [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 >> > [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 >> > [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 >> > [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 >> > [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 >> > [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 >> > [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 >> > [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 >> > [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 >> > [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 >> > [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 >> > [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 >> > [ 0.087733] PKRU: 55555554 >> > [ 0.087733] Call Trace: >> > [ 0.087733] <TASK> >> > [ 0.087733] ? enable_work+0xb5/0xc0 >> > [ 0.087733] ? __warn.cold+0x93/0xf2 >> > [ 0.087733] ? enable_work+0xb5/0xc0 >> > [ 0.087733] ? report_bug+0xff/0x140 >> > [ 0.087733] ? handle_bug+0x54/0x90 >> > [ 0.087733] ? exc_invalid_op+0x17/0x70 >> > [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 >> > [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 >> > [ 0.087733] ? enable_work+0xb5/0xc0 >> > [ 0.087733] vmstat_cpu_online+0x5c/0x70 >> > [ 0.087733] cpuhp_invoke_callback+0x133/0x440 >> > [ 0.087733] cpuhp_thread_fun+0x95/0x150 >> > [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 >> > [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 >> > [ 0.087735] kthread+0xc8/0xf0 >> > [ 0.087737] ? __pfx_kthread+0x10/0x10 >> > [ 0.087738] ret_from_fork+0x2c/0x50 >> > [ 0.087739] ? __pfx_kthread+0x10/0x10 >> > [ 0.087740] ret_from_fork_asm+0x1a/0x30 >> > [ 0.087742] </TASK> >> > [ 0.087742] ---[ end trace 0000000000000000 ]--- >> > >> > >> > > --- >> > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ >> > > --- >> > > mm/vmstat.c | 3 ++- >> > > 1 file changed, 2 insertions(+), 1 deletion(-) >> > > >> > > diff --git a/mm/vmstat.c b/mm/vmstat.c >> > > index 4d016314a56c..0889b75cef14 100644 >> > > --- a/mm/vmstat.c >> > > +++ b/mm/vmstat.c >> > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) >> > > if (!node_state(cpu_to_node(cpu), N_CPU)) { >> > > node_set_state(cpu_to_node(cpu), N_CPU); >> > > } >> > > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); >> > >> > Probably needs to be 'if disabled' here, as this is invoked on normal >> > startup when the work won't have been disabled? >> > >> > Had a brief look at code and couldn't see how that could be done >> > however... and one would need to be careful about races... Tricky! >> > >> > > >> > > return 0; >> > > } >> > > >> > > static int vmstat_cpu_down_prep(unsigned int cpu) >> > > { >> > > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); >> > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); >> > > return 0; >> > > } >> > > >> > > -- >> > > 2.43.0 >> > > >> > > >> > >> > Let me know if you need any more details, .config etc. 
>> > >> > I noticed this warning on a real box too (in both cases running akpm's >> > mm-unstable branch), FWIW. >> >> Thank you for the report. I was able to reproduce the warning and now >> wonder how I missed it.. My oversight, apologies. >> >> In my current view, the simplest solution would be to make sure a local >> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even >> during boot-up. The following patch suppresses the warning: >> >> diff --git a/mm/vmstat.c b/mm/vmstat.c >> index 0889b75cef14..19ceed5d34bf 100644 >> --- a/mm/vmstat.c >> +++ b/mm/vmstat.c >> @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) >> { >> int cpu; >> >> - for_each_possible_cpu(cpu) >> + for_each_possible_cpu(cpu) { >> INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), >> vmstat_update); >> >> + /* will be enabled on vmstat_cpu_online */ >> + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); >> + } >> + >> schedule_delayed_work(&shepherd, >> round_jiffies_relative(sysctl_stat_interval)); >> } >> >> If you think of a better solution later, please let me know. Otherwise, >> I'll submit a follow-up fix patch with the above diff. > > Thanks, this resolves the problem, but are we sure that _all_ CPUs will > definitely call vmstat_cpu_online()? > > I did a bit of printk output and it seems like this _didn't_ online CPU 0, > presumably the boot CPU which calls this function in the first instance? FWIW with the proposed fix I can see that all CPUs are online, grep -H . /sys/devices/system/cpu/cpu*/online /sys/devices/system/cpu/cpu0/online:1 /sys/devices/system/cpu/cpu1/online:1 /sys/devices/system/cpu/cpu2/online:1 /sys/devices/system/cpu/cpu3/online:1 /sys/devices/system/cpu/cpu4/online:1 /sys/devices/system/cpu/cpu5/online:1 /sys/devices/system/cpu/cpu6/online:1 /sys/devices/system/cpu/cpu7/online:1 > > I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls() > explicitly which does _not_ call the callback, though even if it did this > would be too early as it calls start_shepherd_timer() _after_ this anyway. > > So yeah, unless I'm missing something, I think this patch is broken. > > I have added Thomas and Peter to give some insight on the CPU hotplug side. > > It feels like the patch really needs an 'enable if not already enabled' > call in vmstat_cpu_online(). > >> >> Thanks. >> >> -Koichiro ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-06 12:53 ` Charalampos Mitrodimas @ 2025-01-06 12:58 ` Lorenzo Stoakes 0 siblings, 0 replies; 22+ messages in thread From: Lorenzo Stoakes @ 2025-01-06 12:58 UTC (permalink / raw) To: Charalampos Mitrodimas Cc: Koichiro Den, linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra On Mon, Jan 06, 2025 at 12:53:36PM +0000, Charalampos Mitrodimas wrote: > > I did a bit of printk output and it seems like this _didn't_ online CPU 0, > > presumably the boot CPU which calls this function in the first instance? > > FWIW with the proposed fix I can see that all CPUs are online, > grep -H . /sys/devices/system/cpu/cpu*/online > /sys/devices/system/cpu/cpu0/online:1 > /sys/devices/system/cpu/cpu1/online:1 > /sys/devices/system/cpu/cpu2/online:1 > /sys/devices/system/cpu/cpu3/online:1 > /sys/devices/system/cpu/cpu4/online:1 > /sys/devices/system/cpu/cpu5/online:1 > /sys/devices/system/cpu/cpu6/online:1 > /sys/devices/system/cpu/cpu7/online:1 > Sorry maybe I phrased this badly, I'm not suggesting CPUs aren't coming online, I'm saying it doesn't look like vmstat_cpu_online() will be called for the boot CPU, which breaks the proposed fix (the delayed work for this CPU will simply never be enabled in this case). Naturally, the boot CPU is _already_ online at this point, which is I imagine why this is the case. I had wondered if the function would be invoked on the boot CPU _anyway_ but it appears not. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-06 10:52 ` Lorenzo Stoakes 2025-01-06 12:53 ` Charalampos Mitrodimas @ 2025-01-06 13:03 ` Koichiro Den 2025-01-06 13:53 ` Koichiro Den 2025-01-07 1:18 ` [PATCH] Simple fix Huacai Chen 1 sibling, 2 replies; 22+ messages in thread From: Koichiro Den @ 2025-01-06 13:03 UTC (permalink / raw) To: Lorenzo Stoakes Cc: linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra On Mon, Jan 06, 2025 at 10:52:37AM +0000, Lorenzo Stoakes wrote: > +cc tglx, peterz for insight on CPU hot plug > > On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote: > > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: > > > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: > > > > Even after mm/vmstat:online teardown, shepherd may still queue work for > > > > the dying cpu until the cpu is removed from online mask. While it's > > > > quite rare, this means that after unbind_workers() unbinds a per-cpu > > > > kworker, it potentially runs vmstat_update for the dying CPU on an > > > > irrelevant cpu before entering atomic AP states. > > > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the > > > > backtrace. > > > > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: \ > > > > kworker/7:3/1702 > > > > caller is refresh_cpu_vm_stats+0x235/0x5f0 > > > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G > > > > Tainted: [N]=TEST > > > > Workqueue: mm_percpu_wq vmstat_update > > > > Call Trace: > > > > <TASK> > > > > dump_stack_lvl+0x8d/0xb0 > > > > check_preemption_disabled+0xce/0xe0 > > > > refresh_cpu_vm_stats+0x235/0x5f0 > > > > vmstat_update+0x17/0xa0 > > > > process_one_work+0x869/0x1aa0 > > > > worker_thread+0x5e5/0x1100 > > > > kthread+0x29e/0x380 > > > > ret_from_fork+0x2d/0x70 > > > > ret_from_fork_asm+0x1a/0x30 > > > > </TASK> > > > > > > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and > > > > symmetrically enable it on startup. > > > > > > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> > > > > > > Hi, > > > > > > I observed a warning in my qemu and real hardware, which I bisected to this commit: > > > > > > [ 0.087733] ------------[ cut here ]------------ > > > [ 0.087733] workqueue: work disable count underflowed > > > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 > > > > > > This is: > > > > > > static void work_offqd_enable(struct work_offq_data *offqd) > > > { > > > if (likely(offqd->disable > 0)) > > > offqd->disable--; > > > else > > > WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line > > > } > > > > > > So (based on this code) presumably an enable is only required if previously > > > disabled, and this code is being called on startup unconditionally without > > > the work having been disabled previously? I'm not hugely familiar with > > > delayed workqueue implementation details. 
> > > > > > [ 0.087733] Modules linked in: > > > [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 > > > [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > > > [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 > > > [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 > > > [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 > > > [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 > > > [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 > > > [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 > > > [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 > > > [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 > > > [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 > > > [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 > > > [ 0.087733] PKRU: 55555554 > > > [ 0.087733] Call Trace: > > > [ 0.087733] <TASK> > > > [ 0.087733] ? enable_work+0xb5/0xc0 > > > [ 0.087733] ? __warn.cold+0x93/0xf2 > > > [ 0.087733] ? enable_work+0xb5/0xc0 > > > [ 0.087733] ? report_bug+0xff/0x140 > > > [ 0.087733] ? handle_bug+0x54/0x90 > > > [ 0.087733] ? exc_invalid_op+0x17/0x70 > > > [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 > > > [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 > > > [ 0.087733] ? enable_work+0xb5/0xc0 > > > [ 0.087733] vmstat_cpu_online+0x5c/0x70 > > > [ 0.087733] cpuhp_invoke_callback+0x133/0x440 > > > [ 0.087733] cpuhp_thread_fun+0x95/0x150 > > > [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 > > > [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 > > > [ 0.087735] kthread+0xc8/0xf0 > > > [ 0.087737] ? __pfx_kthread+0x10/0x10 > > > [ 0.087738] ret_from_fork+0x2c/0x50 > > > [ 0.087739] ? __pfx_kthread+0x10/0x10 > > > [ 0.087740] ret_from_fork_asm+0x1a/0x30 > > > [ 0.087742] </TASK> > > > [ 0.087742] ---[ end trace 0000000000000000 ]--- > > > > > > > > > > --- > > > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ > > > > --- > > > > mm/vmstat.c | 3 ++- > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > index 4d016314a56c..0889b75cef14 100644 > > > > --- a/mm/vmstat.c > > > > +++ b/mm/vmstat.c > > > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) > > > > if (!node_state(cpu_to_node(cpu), N_CPU)) { > > > > node_set_state(cpu_to_node(cpu), N_CPU); > > > > } > > > > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); > > > > > > Probably needs to be 'if disabled' here, as this is invoked on normal > > > startup when the work won't have been disabled? > > > > > > Had a brief look at code and couldn't see how that could be done > > > however... and one would need to be careful about races... Tricky! > > > > > > > > > > > return 0; > > > > } > > > > > > > > static int vmstat_cpu_down_prep(unsigned int cpu) > > > > { > > > > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > return 0; > > > > } > > > > > > > > -- > > > > 2.43.0 > > > > > > > > > > > > > > Let me know if you need any more details, .config etc. 
> > > > > > I noticed this warning on a real box too (in both cases running akpm's > > > mm-unstable branch), FWIW. > > > > Thank you for the report. I was able to reproduce the warning and now > > wonder how I missed it.. My oversight, apologies. > > > > In my current view, the simplest solution would be to make sure a local > > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even > > during boot-up. The following patch suppresses the warning: > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index 0889b75cef14..19ceed5d34bf 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) > > { > > int cpu; > > > > - for_each_possible_cpu(cpu) > > + for_each_possible_cpu(cpu) { > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > vmstat_update); > > > > + /* will be enabled on vmstat_cpu_online */ > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > + } > > + > > schedule_delayed_work(&shepherd, > > round_jiffies_relative(sysctl_stat_interval)); > > } > > > > If you think of a better solution later, please let me know. Otherwise, > > I'll submit a follow-up fix patch with the above diff. > > Thanks, this resolves the problem, but are we sure that _all_ CPUs will > definitely call vmstat_cpu_online()? > > I did a bit of printk output and it seems like this _didn't_ online CPU 0, > presumably the boot CPU which calls this function in the first instance? > > I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls() > explicitly which does _not_ call the callback, though even if it did this > would be too early as it calls start_shepherd_timer() _after_ this anyway. > > So yeah, unless I'm missing something, I think this patch is broken. You're absolutely right, thanks a lot. I also appreciate you testing it, thanks everyone. > > I have added Thomas and Peter to give some insight on the CPU hotplug side. > > It feels like the patch really needs an 'enable if not already enabled' > call in vmstat_cpu_online(). Right. While not fully polished yet, I've tested the following diff: diff --git a/mm/vmstat.c b/mm/vmstat.c index 0889b75cef14..f967aa22392f 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1937,7 +1937,11 @@ static const struct seq_operations vmstat_op = { #endif /* CONFIG_PROC_FS */ #ifdef CONFIG_SMP -static DEFINE_PER_CPU(struct delayed_work, vmstat_work); +struct vmstat_work { + struct delayed_work dwork; + bool enabled; +}; +static DEFINE_PER_CPU(struct vmstat_work, vmstat_work); int sysctl_stat_interval __read_mostly = HZ; static int vmstat_late_init_done; @@ -2015,7 +2019,7 @@ static void vmstat_update(struct work_struct *w) * update worker thread. 
*/ queue_delayed_work_on(smp_processor_id(), mm_percpu_wq, - this_cpu_ptr(&vmstat_work), + &this_cpu_ptr(&vmstat_work)->dwork, round_jiffies_relative(sysctl_stat_interval)); } } @@ -2059,7 +2063,7 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - if (!delayed_work_pending(this_cpu_ptr(&vmstat_work))) + if (!delayed_work_pending(&this_cpu_ptr(&vmstat_work)->dwork)) return; if (!need_update(smp_processor_id())) @@ -2091,7 +2095,7 @@ static void vmstat_shepherd(struct work_struct *w) cpus_read_lock(); /* Check processors whose vmstat worker threads have been disabled */ for_each_online_cpu(cpu) { - struct delayed_work *dw = &per_cpu(vmstat_work, cpu); + struct delayed_work *dw = &per_cpu(vmstat_work, cpu).dwork; /* * In kernel users of vmstat counters either require the precise value and @@ -2120,11 +2124,14 @@ static void vmstat_shepherd(struct work_struct *w) static void __init start_shepherd_timer(void) { + struct vmstat_work *vmstat; int cpu; - for_each_possible_cpu(cpu) - INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), - vmstat_update); + for_each_possible_cpu(cpu) { + vmstat = per_cpu_ptr(&vmstat_work, cpu); + INIT_DEFERRABLE_WORK(&vmstat->dwork, vmstat_update); + vmstat->enabled = true; + } schedule_delayed_work(&shepherd, round_jiffies_relative(sysctl_stat_interval)); @@ -2142,20 +2149,28 @@ static void __init init_cpu_node_state(void) static int vmstat_cpu_online(unsigned int cpu) { + struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu); + if (vmstat_late_init_done) refresh_zone_stat_thresholds(); if (!node_state(cpu_to_node(cpu), N_CPU)) { node_set_state(cpu_to_node(cpu), N_CPU); } - enable_delayed_work(&per_cpu(vmstat_work, cpu)); + + if (!vmstat->enabled) { + enable_delayed_work(&vmstat->dwork); + vmstat->enabled = true; + } return 0; } static int vmstat_cpu_down_prep(unsigned int cpu) { - disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); + struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu); + + disable_delayed_work_sync(&vmstat->dwork); return 0; } Lorenzo, and Thomas and Peter if you're available, I'd greatly appreciate any thoughts or feedback on this. Thanks. -Koichiro Den > > > > > Thanks. > > > > -Koichiro ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() 2025-01-06 13:03 ` Koichiro Den @ 2025-01-06 13:53 ` Koichiro Den 2025-01-07 1:18 ` [PATCH] Simple fix Huacai Chen 1 sibling, 0 replies; 22+ messages in thread From: Koichiro Den @ 2025-01-06 13:53 UTC (permalink / raw) To: Lorenzo Stoakes Cc: linux-mm, akpm, linux-kernel, Thomas Gleixner, Peter Zijlstra On Mon, Jan 06, 2025 at 10:03:57PM +0900, Koichiro Den wrote: > On Mon, Jan 06, 2025 at 10:52:37AM +0000, Lorenzo Stoakes wrote: > > +cc tglx, peterz for insight on CPU hot plug > > > > On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote: > > > On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote: > > > > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote: > > > > > Even after mm/vmstat:online teardown, shepherd may still queue work for > > > > > the dying cpu until the cpu is removed from online mask. While it's > > > > > quite rare, this means that after unbind_workers() unbinds a per-cpu > > > > > kworker, it potentially runs vmstat_update for the dying CPU on an > > > > > irrelevant cpu before entering atomic AP states. > > > > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the > > > > > backtrace. > > > > > > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: \ > > > > > kworker/7:3/1702 > > > > > caller is refresh_cpu_vm_stats+0x235/0x5f0 > > > > > CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G > > > > > Tainted: [N]=TEST > > > > > Workqueue: mm_percpu_wq vmstat_update > > > > > Call Trace: > > > > > <TASK> > > > > > dump_stack_lvl+0x8d/0xb0 > > > > > check_preemption_disabled+0xce/0xe0 > > > > > refresh_cpu_vm_stats+0x235/0x5f0 > > > > > vmstat_update+0x17/0xa0 > > > > > process_one_work+0x869/0x1aa0 > > > > > worker_thread+0x5e5/0x1100 > > > > > kthread+0x29e/0x380 > > > > > ret_from_fork+0x2d/0x70 > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > </TASK> > > > > > > > > > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and > > > > > symmetrically enable it on startup. > > > > > > > > > > Signed-off-by: Koichiro Den <koichiro.den@canonical.com> > > > > > > > > Hi, > > > > > > > > I observed a warning in my qemu and real hardware, which I bisected to this commit: > > > > > > > > [ 0.087733] ------------[ cut here ]------------ > > > > [ 0.087733] workqueue: work disable count underflowed > > > > [ 0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0 > > > > > > > > This is: > > > > > > > > static void work_offqd_enable(struct work_offq_data *offqd) > > > > { > > > > if (likely(offqd->disable > 0)) > > > > offqd->disable--; > > > > else > > > > WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line > > > > } > > > > > > > > So (based on this code) presumably an enable is only required if previously > > > > disabled, and this code is being called on startup unconditionally without > > > > the work having been disabled previously? I'm not hugely familiar with > > > > delayed workqueue implementation details. 
> > > > > > > > [ 0.087733] Modules linked in: > > > > [ 0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58 > > > > [ 0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > > > > [ 0.087733] RIP: 0010:enable_work+0xb5/0xc0 > > > > [ 0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 > > > > [ 0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092 > > > > [ 0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000 > > > > [ 0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001 > > > > [ 0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08 > > > > [ 0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4 > > > > [ 0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648 > > > > [ 0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000 > > > > [ 0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0 > > > > [ 0.087733] PKRU: 55555554 > > > > [ 0.087733] Call Trace: > > > > [ 0.087733] <TASK> > > > > [ 0.087733] ? enable_work+0xb5/0xc0 > > > > [ 0.087733] ? __warn.cold+0x93/0xf2 > > > > [ 0.087733] ? enable_work+0xb5/0xc0 > > > > [ 0.087733] ? report_bug+0xff/0x140 > > > > [ 0.087733] ? handle_bug+0x54/0x90 > > > > [ 0.087733] ? exc_invalid_op+0x17/0x70 > > > > [ 0.087733] ? asm_exc_invalid_op+0x1a/0x20 > > > > [ 0.087733] ? __pfx_vmstat_cpu_online+0x10/0x10 > > > > [ 0.087733] ? enable_work+0xb5/0xc0 > > > > [ 0.087733] vmstat_cpu_online+0x5c/0x70 > > > > [ 0.087733] cpuhp_invoke_callback+0x133/0x440 > > > > [ 0.087733] cpuhp_thread_fun+0x95/0x150 > > > > [ 0.087733] smpboot_thread_fn+0xd5/0x1d0 > > > > [ 0.087734] ? __pfx_smpboot_thread_fn+0x10/0x10 > > > > [ 0.087735] kthread+0xc8/0xf0 > > > > [ 0.087737] ? __pfx_kthread+0x10/0x10 > > > > [ 0.087738] ret_from_fork+0x2c/0x50 > > > > [ 0.087739] ? __pfx_kthread+0x10/0x10 > > > > [ 0.087740] ret_from_fork_asm+0x1a/0x30 > > > > [ 0.087742] </TASK> > > > > [ 0.087742] ---[ end trace 0000000000000000 ]--- > > > > > > > > > > > > > --- > > > > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/ > > > > > --- > > > > > mm/vmstat.c | 3 ++- > > > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > > index 4d016314a56c..0889b75cef14 100644 > > > > > --- a/mm/vmstat.c > > > > > +++ b/mm/vmstat.c > > > > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu) > > > > > if (!node_state(cpu_to_node(cpu), N_CPU)) { > > > > > node_set_state(cpu_to_node(cpu), N_CPU); > > > > > } > > > > > + enable_delayed_work(&per_cpu(vmstat_work, cpu)); > > > > > > > > Probably needs to be 'if disabled' here, as this is invoked on normal > > > > startup when the work won't have been disabled? > > > > > > > > Had a brief look at code and couldn't see how that could be done > > > > however... and one would need to be careful about races... Tricky! 
> > > > > > > > > > > > > > return 0; > > > > > } > > > > > > > > > > static int vmstat_cpu_down_prep(unsigned int cpu) > > > > > { > > > > > - cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > > return 0; > > > > > } > > > > > > > > > > -- > > > > > 2.43.0 > > > > > > > > > > > > > > > > > > Let me know if you need any more details, .config etc. > > > > > > > > I noticed this warning on a real box too (in both cases running akpm's > > > > mm-unstable branch), FWIW. > > > > > > Thank you for the report. I was able to reproduce the warning and now > > > wonder how I missed it.. My oversight, apologies. > > > > > > In my current view, the simplest solution would be to make sure a local > > > vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even > > > during boot-up. The following patch suppresses the warning: > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > index 0889b75cef14..19ceed5d34bf 100644 > > > --- a/mm/vmstat.c > > > +++ b/mm/vmstat.c > > > @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void) > > > { > > > int cpu; > > > > > > - for_each_possible_cpu(cpu) > > > + for_each_possible_cpu(cpu) { > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > vmstat_update); > > > > > > + /* will be enabled on vmstat_cpu_online */ > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > + } > > > + > > > schedule_delayed_work(&shepherd, > > > round_jiffies_relative(sysctl_stat_interval)); > > > } > > > > > > If you think of a better solution later, please let me know. Otherwise, > > > I'll submit a follow-up fix patch with the above diff. > > > > Thanks, this resolves the problem, but are we sure that _all_ CPUs will > > definitely call vmstat_cpu_online()? > > > > I did a bit of printk output and it seems like this _didn't_ online CPU 0, > > presumably the boot CPU which calls this function in the first instance? > > > > I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls() > > explicitly which does _not_ call the callback, though even if it did this > > would be too early as it calls start_shepherd_timer() _after_ this anyway. > > > > So yeah, unless I'm missing something, I think this patch is broken. > > You're absolutely right, thanks a lot. > I also appreciate you testing it, thanks everyone. > > > > > I have added Thomas and Peter to give some insight on the CPU hotplug side. > > > > It feels like the patch really needs an 'enable if not already enabled' > > call in vmstat_cpu_online(). > > Right. While not fully polished yet, I've tested the following diff: > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..f967aa22392f 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -1937,7 +1937,11 @@ static const struct seq_operations vmstat_op = { > #endif /* CONFIG_PROC_FS */ > > #ifdef CONFIG_SMP > -static DEFINE_PER_CPU(struct delayed_work, vmstat_work); > +struct vmstat_work { > + struct delayed_work dwork; > + bool enabled; > +}; > +static DEFINE_PER_CPU(struct vmstat_work, vmstat_work); > int sysctl_stat_interval __read_mostly = HZ; > static int vmstat_late_init_done; > > @@ -2015,7 +2019,7 @@ static void vmstat_update(struct work_struct *w) > * update worker thread. 
> */ > queue_delayed_work_on(smp_processor_id(), mm_percpu_wq, > - this_cpu_ptr(&vmstat_work), > + &this_cpu_ptr(&vmstat_work)->dwork, > round_jiffies_relative(sysctl_stat_interval)); > } > } > @@ -2059,7 +2063,7 @@ void quiet_vmstat(void) > if (system_state != SYSTEM_RUNNING) > return; > > - if (!delayed_work_pending(this_cpu_ptr(&vmstat_work))) > + if (!delayed_work_pending(&this_cpu_ptr(&vmstat_work)->dwork)) > return; > > if (!need_update(smp_processor_id())) > @@ -2091,7 +2095,7 @@ static void vmstat_shepherd(struct work_struct *w) > cpus_read_lock(); > /* Check processors whose vmstat worker threads have been disabled */ > for_each_online_cpu(cpu) { > - struct delayed_work *dw = &per_cpu(vmstat_work, cpu); > + struct delayed_work *dw = &per_cpu(vmstat_work, cpu).dwork; > > /* > * In kernel users of vmstat counters either require the precise value and > @@ -2120,11 +2124,14 @@ static void vmstat_shepherd(struct work_struct *w) > > static void __init start_shepherd_timer(void) > { > + struct vmstat_work *vmstat; > int cpu; > > - for_each_possible_cpu(cpu) > - INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > - vmstat_update); > + for_each_possible_cpu(cpu) { > + vmstat = per_cpu_ptr(&vmstat_work, cpu); > + INIT_DEFERRABLE_WORK(&vmstat->dwork, vmstat_update); > + vmstat->enabled = true; > + } > > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > @@ -2142,20 +2149,28 @@ static void __init init_cpu_node_state(void) > > static int vmstat_cpu_online(unsigned int cpu) > { > + struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu); > + > if (vmstat_late_init_done) > refresh_zone_stat_thresholds(); > > if (!node_state(cpu_to_node(cpu), N_CPU)) { > node_set_state(cpu_to_node(cpu), N_CPU); > } > - enable_delayed_work(&per_cpu(vmstat_work, cpu)); > + > + if (!vmstat->enabled) { > + enable_delayed_work(&vmstat->dwork); > + vmstat->enabled = true; > + } > > return 0; > } > > static int vmstat_cpu_down_prep(unsigned int cpu) > { > - disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu); > + > + disable_delayed_work_sync(&vmstat->dwork); The line `vmstat->enabled = false;` was missing here. With that fixed, I've conducted the following tests: * Boot-up check: confirmed that vmstat_update() runs on all cpus after boot-up. * Cpuhp transitions: verified resilience during random CPU hotplug transitions using the steps below: 1. enabled dynamic ftrace on vmstat_update() 2. ran the following simple silly script for a while #!/bin/bash while :; do now=$(cat /sys/devices/system/cpu/cpu7/hotplug/state) next=$(shuf --random-source=/dev/urandom -i 0-238 -n 1) echo "${now} -> ${next}" echo $next > /sys/devices/system/cpu/cpu7/hotplug/target 2>/dev/null sleep 0.001 done 3. after stopping the script, ensured that the cpu was fully online again by writing the largest value to the 'state' file. 4. verified that vmstat_update() was working on the cpu. Let me know if you have any suggestions or concerns. Thanks. -Koichiro Den > return 0; > } > > Lorenzo, and Thomas and Peter if you're available, I'd greatly appreciate > any thoughts or feedback on this. > > Thanks. > > -Koichiro Den > > > > > > > > > > Thanks. > > > > > > -Koichiro ^ permalink raw reply [flat|nested] 22+ messages in thread
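For clarity, the one-line correction described above, applied to the (unmerged) enabled-flag variant, would make the teardown callback read roughly as below. This is a sketch reconstructed from the diff quoted in this message plus the assignment noted as missing; it is not a posted patch.

static int vmstat_cpu_down_prep(unsigned int cpu)
{
	struct vmstat_work *vmstat = per_cpu_ptr(&vmstat_work, cpu);

	disable_delayed_work_sync(&vmstat->dwork);
	vmstat->enabled = false;	/* the assignment noted as missing above */
	return 0;
}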
* [PATCH] Simple fix
2025-01-06 13:03 ` Koichiro Den
2025-01-06 13:53 ` Koichiro Den
@ 2025-01-07 1:18 ` Huacai Chen
2025-01-07 3:35 ` Matthew Wilcox
2025-01-07 8:47 ` Lorenzo Stoakes
1 sibling, 2 replies; 22+ messages in thread
From: Huacai Chen @ 2025-01-07 1:18 UTC (permalink / raw)
To: Huacai Chen
Cc: Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior,
Lorenzo Stoakes, Mark Rutland, Charalampos Mitrodimas, Huacai Chen
Hi, all, I like simple fixes, so is this acceptable (based on an early
version from Koichiro Den)?
---
mm/vmstat.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 0889b75cef14..1badc24a21ff 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void)
{
int cpu;
- for_each_possible_cpu(cpu)
+ for_each_possible_cpu(cpu) {
INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
vmstat_update);
+ /* Will be enabled on vmstat_cpu_online() */
+ if (!cpu_online(cpu))
+ disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
+ }
+
schedule_delayed_work(&shepherd,
round_jiffies_relative(sysctl_stat_interval));
}
--
2.43.5
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-07 1:18 ` [PATCH] Simple fix Huacai Chen @ 2025-01-07 3:35 ` Matthew Wilcox 2025-01-07 3:58 ` Huacai Chen 2025-01-07 8:47 ` Lorenzo Stoakes 1 sibling, 1 reply; 22+ messages in thread From: Matthew Wilcox @ 2025-01-07 3:35 UTC (permalink / raw) To: Huacai Chen Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior, Lorenzo Stoakes, Mark Rutland, Charalampos Mitrodimas On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > Hi, all, I like simple fixes, so is this acceptable (based on an early > version from Koichiro Den)? This is an unacceptable commit message. > mm/vmstat.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..1badc24a21ff 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > { > int cpu; > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* Will be enabled on vmstat_cpu_online() */ > + if (!cpu_online(cpu)) > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > + > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > } > -- > 2.43.5 > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-07 3:35 ` Matthew Wilcox @ 2025-01-07 3:58 ` Huacai Chen 0 siblings, 0 replies; 22+ messages in thread From: Huacai Chen @ 2025-01-07 3:58 UTC (permalink / raw) To: Matthew Wilcox Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior, Lorenzo Stoakes, Mark Rutland, Charalampos Mitrodimas On Tue, Jan 7, 2025 at 11:35 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > version from Koichiro Den)? > > This is an unacceptable commit message. Of course, this is just for discussion, not a proper patch. Huacai > > > mm/vmstat.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index 0889b75cef14..1badc24a21ff 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > { > > int cpu; > > > > - for_each_possible_cpu(cpu) > > + for_each_possible_cpu(cpu) { > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > vmstat_update); > > > > + /* Will be enabled on vmstat_cpu_online() */ > > + if (!cpu_online(cpu)) > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > + } > > + > > schedule_delayed_work(&shepherd, > > round_jiffies_relative(sysctl_stat_interval)); > > } > > -- > > 2.43.5 > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-07 1:18 ` [PATCH] Simple fix Huacai Chen 2025-01-07 3:35 ` Matthew Wilcox @ 2025-01-07 8:47 ` Lorenzo Stoakes 2025-01-07 10:29 ` Huacai Chen 1 sibling, 1 reply; 22+ messages in thread From: Lorenzo Stoakes @ 2025-01-07 8:47 UTC (permalink / raw) To: Huacai Chen Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > Hi, all, I like simple fixes, so is this acceptable (based on an early > version from Koichiro Den)? No not at all. This is bizarre - in the mail you are replying to Koichiro agrees with me that the approach of his code that you've sent here (I don't know why) is fundamentally broken and suggest another. I am at a loss as to why you've sent this? Perhaps a miscommunication somewhere? :) In any case, please don't send '[PATCH] xxx' mails that aren't intended to be patches, a better form of this would be to say 'oh can we just do...' then to put this code in the mail underneath, without any '[PATCH]' prefix. But please do review the discussion - the below is insufficient as simple as it is (sadly) because the boot CPU's delayed work will never be executed. I will take a look at Koichiro's new approach as soon as I am able. Cheers! > --- > mm/vmstat.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/mm/vmstat.c b/mm/vmstat.c > index 0889b75cef14..1badc24a21ff 100644 > --- a/mm/vmstat.c > +++ b/mm/vmstat.c > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > { > int cpu; > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* Will be enabled on vmstat_cpu_online() */ > + if (!cpu_online(cpu)) > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > + > schedule_delayed_work(&shepherd, > round_jiffies_relative(sysctl_stat_interval)); > } > -- > 2.43.5 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-07 8:47 ` Lorenzo Stoakes @ 2025-01-07 10:29 ` Huacai Chen 2025-01-07 11:00 ` Lorenzo Stoakes 0 siblings, 1 reply; 22+ messages in thread From: Huacai Chen @ 2025-01-07 10:29 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas Hi, Lorenzo, On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > version from Koichiro Den)? > > No not at all. This is bizarre - in the mail you are replying to Koichiro > agrees with me that the approach of his code that you've sent here (I don't > know why) is fundamentally broken and suggest another. > > I am at a loss as to why you've sent this? Perhaps a miscommunication > somewhere? :) > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > be patches, a better form of this would be to say 'oh can we just do...' > then to put this code in the mail underneath, without any '[PATCH]' prefix. I wasn't in the CC list, and I also found the bug yesterday, so I can only reply to this email with "git send-email --in-reply-to". This is the reason why my reply looks so stranne. > > But please do review the discussion - the below is insufficient as simple > as it is (sadly) because the boot CPU's delayed work will never be > executed. Koichiro's simple fix causes the boot CPU's delayed work to never be executed, this is obvious, and of course I have read the early discussion. And so I improve it, with a "cpu_online()" checking, then the boot CPU is unaffected. Huacai > > I will take a look at Koichiro's new approach as soon as I am able. > > Cheers! > > > --- > > mm/vmstat.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > index 0889b75cef14..1badc24a21ff 100644 > > --- a/mm/vmstat.c > > +++ b/mm/vmstat.c > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > { > > int cpu; > > > > - for_each_possible_cpu(cpu) > > + for_each_possible_cpu(cpu) { > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > vmstat_update); > > > > + /* Will be enabled on vmstat_cpu_online() */ > > + if (!cpu_online(cpu)) > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > + } > > + > > schedule_delayed_work(&shepherd, > > round_jiffies_relative(sysctl_stat_interval)); > > } > > -- > > 2.43.5 > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-07 10:29 ` Huacai Chen @ 2025-01-07 11:00 ` Lorenzo Stoakes 2025-01-08 2:22 ` Koichiro Den 0 siblings, 1 reply; 22+ messages in thread From: Lorenzo Stoakes @ 2025-01-07 11:00 UTC (permalink / raw) To: Huacai Chen Cc: Huacai Chen, Andrew Morton, linux-mm, Koichiro Den, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote: > Hi, Lorenzo, > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > > version from Koichiro Den)? > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro > > agrees with me that the approach of his code that you've sent here (I don't > > know why) is fundamentally broken and suggest another. > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication > > somewhere? :) > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > > be patches, a better form of this would be to say 'oh can we just do...' > > then to put this code in the mail underneath, without any '[PATCH]' prefix. > I wasn't in the CC list, and I also found the bug yesterday, so I can > only reply to this email with "git send-email --in-reply-to". This is > the reason why my reply looks so stranne. > > > > > But please do review the discussion - the below is insufficient as simple > > as it is (sadly) because the boot CPU's delayed work will never be > > executed. > Koichiro's simple fix causes the boot CPU's delayed work to never be > executed, this is obvious, and of course I have read the early > discussion. And so I improve it, with a "cpu_online()" checking, then > the boot CPU is unaffected. With respect Haucai, this is not how you engage in kernel discussion. You could simply have replied to the mail or given more information, you now have two people telling you this, please take it on board. And it caused me to miss your actually quite valuable suggestion so this is beneficial for all! :) ANYWAY, moving on to the technical side: > > Huacai > > > > > I will take a look at Koichiro's new approach as soon as I am able. > > > > Cheers! > > > > > --- > > > mm/vmstat.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > index 0889b75cef14..1badc24a21ff 100644 > > > --- a/mm/vmstat.c > > > +++ b/mm/vmstat.c > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > > { > > > int cpu; > > > > > > - for_each_possible_cpu(cpu) > > > + for_each_possible_cpu(cpu) { > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > vmstat_update); > > > > > > + /* Will be enabled on vmstat_cpu_online() */ > > > + if (!cpu_online(cpu)) This might actually be workable as something simpler, on assumption there can be no race here (I don't think so right?). Koichiro - could you check this and see whether it resolves the issue and whether you feel this is sensible? Might be worth expanding comment to say that we disable on offline, enable on online and we're just providing symmetry here. > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > + } > > > + > > > schedule_delayed_work(&shepherd, > > > round_jiffies_relative(sysctl_stat_interval)); > > > } > > > -- > > > 2.43.5 > > > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-07 11:00 ` Lorenzo Stoakes @ 2025-01-08 2:22 ` Koichiro Den 2025-01-08 2:26 ` Koichiro Den 2025-01-08 2:31 ` Huacai Chen 0 siblings, 2 replies; 22+ messages in thread From: Koichiro Den @ 2025-01-08 2:22 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Huacai Chen, Huacai Chen, Andrew Morton, linux-mm, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote: > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote: > > Hi, Lorenzo, > > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > > > version from Koichiro Den)? > > > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro > > > agrees with me that the approach of his code that you've sent here (I don't > > > know why) is fundamentally broken and suggest another. > > > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication > > > somewhere? :) > > > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > > > be patches, a better form of this would be to say 'oh can we just do...' > > > then to put this code in the mail underneath, without any '[PATCH]' prefix. > > I wasn't in the CC list, and I also found the bug yesterday, so I can > > only reply to this email with "git send-email --in-reply-to". This is > > the reason why my reply looks so stranne. > > > > > > > > But please do review the discussion - the below is insufficient as simple > > > as it is (sadly) because the boot CPU's delayed work will never be > > > executed. > > Koichiro's simple fix causes the boot CPU's delayed work to never be > > executed, this is obvious, and of course I have read the early > > discussion. And so I improve it, with a "cpu_online()" checking, then > > the boot CPU is unaffected. > > With respect Haucai, this is not how you engage in kernel discussion. You > could simply have replied to the mail or given more information, you now > have two people telling you this, please take it on board. > > And it caused me to miss your actually quite valuable suggestion so this is > beneficial for all! :) > > ANYWAY, moving on to the technical side: > > > > > Huacai > > > > > > > > I will take a look at Koichiro's new approach as soon as I am able. > > > > > > Cheers! > > > > > > > --- > > > > mm/vmstat.c | 7 ++++++- > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > index 0889b75cef14..1badc24a21ff 100644 > > > > --- a/mm/vmstat.c > > > > +++ b/mm/vmstat.c > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > > > { > > > > int cpu; > > > > > > > > - for_each_possible_cpu(cpu) > > > > + for_each_possible_cpu(cpu) { > > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > > vmstat_update); > > > > > > > > + /* Will be enabled on vmstat_cpu_online() */ > > > > + if (!cpu_online(cpu)) > > This might actually be workable as something simpler, on assumption there Sorry about late response. And thank you very much, Huacai. Your suggestion looks great. Much simpler and intuitive. > can be no race here (I don't think so right?). IIUC, there is no race since smp_init() always runs before init_mm_internals(). At this stage (i.e. 
start_shepherd_timer()), for cpuhp scenario, the secondary CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks simple and the best. > > Koichiro - could you check this and see whether it resolves the issue and > whether you feel this is sensible? I tested it and it seems to work well both on booting and cpuhp events. (By the way, in my previous email comment [1], I forgot to mention that I also applied other unmerged patches, [2] and [3], just to be able to run such random transisioning test. This time here, I just tested by switching CPUHP_ONLINE <-> CPUHP_OFFLINE only.) [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55 [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/ [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/ > > Might be worth expanding comment to say that we disable on offline, enable > on online and we're just providing symmetry here. Sounds good. Like this? - for_each_possible_cpu(cpu) + for_each_possible_cpu(cpu) { INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), vmstat_update); + /* + * For secondary CPUs during CPU hotplug scenarios, + * vmstat_cpu_online() will enable the work. + * mm/vmstat:online enables and disables vmstat_work + * symmetrically during CPU hotplug events. + */ + if (!cpu_online(cpu)) + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); + } Before submitting a revised patch, I'd like to confirm: Huacai, would you be comfortable with me adding your Signed-off-by to the commit, or would you prefer a Suggested-by tag instead? Thanks. -Koichiro Den > > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > + } > > > > + > > > > schedule_delayed_work(&shepherd, > > > > round_jiffies_relative(sysctl_stat_interval)); > > > > } > > > > -- > > > > 2.43.5 > > > > > > > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 22+ messages in thread
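Combining Huacai's cpu_online() hunk with the expanded comment proposed above, the initialisation loop that a follow-up v3 would presumably carry reads roughly as follows. This is reconstructed from the snippets quoted in this thread (the surrounding context lines come from the hunk in Huacai's message); the actual v3 may differ in details.

static void __init start_shepherd_timer(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
			vmstat_update);

		/*
		 * For secondary CPUs during CPU hotplug scenarios,
		 * vmstat_cpu_online() will enable the work.
		 * mm/vmstat:online enables and disables vmstat_work
		 * symmetrically during CPU hotplug events.
		 */
		if (!cpu_online(cpu))
			disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
	}

	schedule_delayed_work(&shepherd,
			round_jiffies_relative(sysctl_stat_interval));
}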
* Re: [PATCH] Simple fix 2025-01-08 2:22 ` Koichiro Den @ 2025-01-08 2:26 ` Koichiro Den 2025-01-08 2:31 ` Huacai Chen 1 sibling, 0 replies; 22+ messages in thread From: Koichiro Den @ 2025-01-08 2:26 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Huacai Chen, Huacai Chen, Andrew Morton, linux-mm, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas On Wed, Jan 08, 2025 at 11:22:42AM +0900, Koichiro Den wrote: > On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote: > > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote: > > > Hi, Lorenzo, > > > > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > > > > version from Koichiro Den)? > > > > > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro > > > > agrees with me that the approach of his code that you've sent here (I don't > > > > know why) is fundamentally broken and suggest another. > > > > > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication > > > > somewhere? :) > > > > > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > > > > be patches, a better form of this would be to say 'oh can we just do...' > > > > then to put this code in the mail underneath, without any '[PATCH]' prefix. > > > I wasn't in the CC list, and I also found the bug yesterday, so I can > > > only reply to this email with "git send-email --in-reply-to". This is > > > the reason why my reply looks so stranne. > > > > > > > > > > > But please do review the discussion - the below is insufficient as simple > > > > as it is (sadly) because the boot CPU's delayed work will never be > > > > executed. > > > Koichiro's simple fix causes the boot CPU's delayed work to never be > > > executed, this is obvious, and of course I have read the early > > > discussion. And so I improve it, with a "cpu_online()" checking, then > > > the boot CPU is unaffected. > > > > With respect Haucai, this is not how you engage in kernel discussion. You > > could simply have replied to the mail or given more information, you now > > have two people telling you this, please take it on board. > > > > And it caused me to miss your actually quite valuable suggestion so this is > > beneficial for all! :) > > > > ANYWAY, moving on to the technical side: > > > > > > > > Huacai > > > > > > > > > > > I will take a look at Koichiro's new approach as soon as I am able. > > > > > > > > Cheers! > > > > > > > > > --- > > > > > mm/vmstat.c | 7 ++++++- > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > > index 0889b75cef14..1badc24a21ff 100644 > > > > > --- a/mm/vmstat.c > > > > > +++ b/mm/vmstat.c > > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > > > > { > > > > > int cpu; > > > > > > > > > > - for_each_possible_cpu(cpu) > > > > > + for_each_possible_cpu(cpu) { > > > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > > > vmstat_update); > > > > > > > > > > + /* Will be enabled on vmstat_cpu_online() */ > > > > > + if (!cpu_online(cpu)) > > > > This might actually be workable as something simpler, on assumption there > > Sorry about late response. And thank you very much, Huacai. Your suggestion > looks great. Much simpler and intuitive. 
> > > can be no race here (I don't think so right?). > > IIUC, there is no race since smp_init() always runs before init_mm_internals(). Oops, critical typo. s/before/after/. > At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary > CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks > simple and the best. > > > > > Koichiro - could you check this and see whether it resolves the issue and > > whether you feel this is sensible? > > I tested it and it seems to work well both on booting and cpuhp events. > > (By the way, in my previous email comment [1], I forgot to mention that I also > applied other unmerged patches, [2] and [3], just to be able to run such random > transisioning test. This time here, I just tested by switching CPUHP_ONLINE <-> > CPUHP_OFFLINE only.) > > [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55 > [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/ > [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/ > > > > > Might be worth expanding comment to say that we disable on offline, enable > > on online and we're just providing symmetry here. > > Sounds good. Like this? > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* > + * For secondary CPUs during CPU hotplug scenarios, > + * vmstat_cpu_online() will enable the work. > + * mm/vmstat:online enables and disables vmstat_work > + * symmetrically during CPU hotplug events. > + */ > + if (!cpu_online(cpu)) > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > > > Before submitting a revised patch, I'd like to confirm: > Huacai, would you be comfortable with me adding your Signed-off-by to the > commit, or would you prefer a Suggested-by tag instead? > > Thanks. > > -Koichiro Den > > > > > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > > + } > > > > > + > > > > > schedule_delayed_work(&shepherd, > > > > > round_jiffies_relative(sysctl_stat_interval)); > > > > > } > > > > > -- > > > > > 2.43.5 > > > > > > > > > > > > Cheers, Lorenzo > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-08 2:22 ` Koichiro Den 2025-01-08 2:26 ` Koichiro Den @ 2025-01-08 2:31 ` Huacai Chen 2025-01-08 3:41 ` Koichiro Den 1 sibling, 1 reply; 22+ messages in thread From: Huacai Chen @ 2025-01-08 2:31 UTC (permalink / raw) To: Koichiro Den Cc: Lorenzo Stoakes, Huacai Chen, Andrew Morton, linux-mm, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas Hi, Koichiro, On Wed, Jan 8, 2025 at 10:22 AM Koichiro Den <koichiro.den@canonical.com> wrote: > > On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote: > > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote: > > > Hi, Lorenzo, > > > > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > > > > version from Koichiro Den)? > > > > > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro > > > > agrees with me that the approach of his code that you've sent here (I don't > > > > know why) is fundamentally broken and suggest another. > > > > > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication > > > > somewhere? :) > > > > > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > > > > be patches, a better form of this would be to say 'oh can we just do...' > > > > then to put this code in the mail underneath, without any '[PATCH]' prefix. > > > I wasn't in the CC list, and I also found the bug yesterday, so I can > > > only reply to this email with "git send-email --in-reply-to". This is > > > the reason why my reply looks so stranne. > > > > > > > > > > > But please do review the discussion - the below is insufficient as simple > > > > as it is (sadly) because the boot CPU's delayed work will never be > > > > executed. > > > Koichiro's simple fix causes the boot CPU's delayed work to never be > > > executed, this is obvious, and of course I have read the early > > > discussion. And so I improve it, with a "cpu_online()" checking, then > > > the boot CPU is unaffected. > > > > With respect Haucai, this is not how you engage in kernel discussion. You > > could simply have replied to the mail or given more information, you now > > have two people telling you this, please take it on board. > > > > And it caused me to miss your actually quite valuable suggestion so this is > > beneficial for all! :) > > > > ANYWAY, moving on to the technical side: > > > > > > > > Huacai > > > > > > > > > > > I will take a look at Koichiro's new approach as soon as I am able. > > > > > > > > Cheers! > > > > > > > > > --- > > > > > mm/vmstat.c | 7 ++++++- > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > > index 0889b75cef14..1badc24a21ff 100644 > > > > > --- a/mm/vmstat.c > > > > > +++ b/mm/vmstat.c > > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > > > > { > > > > > int cpu; > > > > > > > > > > - for_each_possible_cpu(cpu) > > > > > + for_each_possible_cpu(cpu) { > > > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > > > vmstat_update); > > > > > > > > > > + /* Will be enabled on vmstat_cpu_online() */ > > > > > + if (!cpu_online(cpu)) > > > > This might actually be workable as something simpler, on assumption there > > Sorry about late response. And thank you very much, Huacai. Your suggestion > looks great. 
Much simpler and intuitive. > > > can be no race here (I don't think so right?). > > IIUC, there is no race since smp_init() always runs before init_mm_internals(). > At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary > CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks > simple and the best. > > > > > Koichiro - could you check this and see whether it resolves the issue and > > whether you feel this is sensible? > > I tested it and it seems to work well both on booting and cpuhp events. > > (By the way, in my previous email comment [1], I forgot to mention that I also > applied other unmerged patches, [2] and [3], just to be able to run such random > transisioning test. This time here, I just tested by switching CPUHP_ONLINE <-> > CPUHP_OFFLINE only.) > > [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55 > [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/ > [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/ > > > > > Might be worth expanding comment to say that we disable on offline, enable > > on online and we're just providing symmetry here. > > Sounds good. Like this? > > - for_each_possible_cpu(cpu) > + for_each_possible_cpu(cpu) { > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > vmstat_update); > > + /* > + * For secondary CPUs during CPU hotplug scenarios, > + * vmstat_cpu_online() will enable the work. > + * mm/vmstat:online enables and disables vmstat_work > + * symmetrically during CPU hotplug events. > + */ > + if (!cpu_online(cpu)) > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > + } > > > Before submitting a revised patch, I'd like to confirm: > Huacai, would you be comfortable with me adding your Signed-off-by to the > commit, or would you prefer a Suggested-by tag instead? Since the original V2 patch has been reverted in mainline, so I think you will send a V3 which integrate my suggestion, right? Both Signed-off-by and Suggested-by is OK for me. Huacai > > Thanks. > > -Koichiro Den > > > > > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > > + } > > > > > + > > > > > schedule_delayed_work(&shepherd, > > > > > round_jiffies_relative(sysctl_stat_interval)); > > > > > } > > > > > -- > > > > > 2.43.5 > > > > > > > > > > > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-08 2:31 ` Huacai Chen @ 2025-01-08 3:41 ` Koichiro Den 2025-01-08 3:57 ` Huacai Chen 0 siblings, 1 reply; 22+ messages in thread From: Koichiro Den @ 2025-01-08 3:41 UTC (permalink / raw) To: Huacai Chen Cc: Lorenzo Stoakes, Huacai Chen, Andrew Morton, linux-mm, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas On Wed, Jan 08, 2025 at 10:31:40AM +0800, Huacai Chen wrote: > Hi, Koichiro, > > On Wed, Jan 8, 2025 at 10:22 AM Koichiro Den <koichiro.den@canonical.com> wrote: > > > > On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote: > > > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote: > > > > Hi, Lorenzo, > > > > > > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes > > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > > > > > version from Koichiro Den)? > > > > > > > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro > > > > > agrees with me that the approach of his code that you've sent here (I don't > > > > > know why) is fundamentally broken and suggest another. > > > > > > > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication > > > > > somewhere? :) > > > > > > > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > > > > > be patches, a better form of this would be to say 'oh can we just do...' > > > > > then to put this code in the mail underneath, without any '[PATCH]' prefix. > > > > I wasn't in the CC list, and I also found the bug yesterday, so I can > > > > only reply to this email with "git send-email --in-reply-to". This is > > > > the reason why my reply looks so stranne. > > > > > > > > > > > > > > But please do review the discussion - the below is insufficient as simple > > > > > as it is (sadly) because the boot CPU's delayed work will never be > > > > > executed. > > > > Koichiro's simple fix causes the boot CPU's delayed work to never be > > > > executed, this is obvious, and of course I have read the early > > > > discussion. And so I improve it, with a "cpu_online()" checking, then > > > > the boot CPU is unaffected. > > > > > > With respect Haucai, this is not how you engage in kernel discussion. You > > > could simply have replied to the mail or given more information, you now > > > have two people telling you this, please take it on board. > > > > > > And it caused me to miss your actually quite valuable suggestion so this is > > > beneficial for all! :) > > > > > > ANYWAY, moving on to the technical side: > > > > > > > > > > > Huacai > > > > > > > > > > > > > > I will take a look at Koichiro's new approach as soon as I am able. > > > > > > > > > > Cheers! 
> > > > > > > > > > > --- > > > > > > mm/vmstat.c | 7 ++++++- > > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > > > index 0889b75cef14..1badc24a21ff 100644 > > > > > > --- a/mm/vmstat.c > > > > > > +++ b/mm/vmstat.c > > > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > > > > > { > > > > > > int cpu; > > > > > > > > > > > > - for_each_possible_cpu(cpu) > > > > > > + for_each_possible_cpu(cpu) { > > > > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > > > > vmstat_update); > > > > > > > > > > > > + /* Will be enabled on vmstat_cpu_online() */ > > > > > > + if (!cpu_online(cpu)) > > > > > > This might actually be workable as something simpler, on assumption there > > > > Sorry about late response. And thank you very much, Huacai. Your suggestion > > looks great. Much simpler and intuitive. > > > > > can be no race here (I don't think so right?). > > > > IIUC, there is no race since smp_init() always runs before init_mm_internals(). > > At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary > > CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks > > simple and the best. > > > > > > > > Koichiro - could you check this and see whether it resolves the issue and > > > whether you feel this is sensible? > > > > I tested it and it seems to work well both on booting and cpuhp events. > > > > (By the way, in my previous email comment [1], I forgot to mention that I also > > applied other unmerged patches, [2] and [3], just to be able to run such random > > transisioning test. This time here, I just tested by switching CPUHP_ONLINE <-> > > CPUHP_OFFLINE only.) > > > > [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55 > > [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/ > > [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/ > > > > > > > > Might be worth expanding comment to say that we disable on offline, enable > > > on online and we're just providing symmetry here. > > > > Sounds good. Like this? > > > > - for_each_possible_cpu(cpu) > > + for_each_possible_cpu(cpu) { > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > vmstat_update); > > > > + /* > > + * For secondary CPUs during CPU hotplug scenarios, > > + * vmstat_cpu_online() will enable the work. > > + * mm/vmstat:online enables and disables vmstat_work > > + * symmetrically during CPU hotplug events. > > + */ > > + if (!cpu_online(cpu)) > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > + } > > > > > > Before submitting a revised patch, I'd like to confirm: > > Huacai, would you be comfortable with me adding your Signed-off-by to the > > commit, or would you prefer a Suggested-by tag instead? > Since the original V2 patch has been reverted in mainline, so I think > you will send a V3 which integrate my suggestion, right? I was considering sending a separate patch with a title distinct from the original, while referencing both the original (v2) patch and the revert commit. However, I'm fine with either approach. If there’s any documentation that mentions a recommended method in this kind of situation, please let me know. Personally, I’m not a fan of multiple commits with identical titles appearing in a branch. > > Both Signed-off-by and Suggested-by is OK for me. Alright, thanks! 
-Koichiro Den > > Huacai > > > > > Thanks. > > > > -Koichiro Den > > > > > > > > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > > > + } > > > > > > + > > > > > > schedule_delayed_work(&shepherd, > > > > > > round_jiffies_relative(sysctl_stat_interval)); > > > > > > } > > > > > > -- > > > > > > 2.43.5 > > > > > > > > > > > > > > > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] Simple fix 2025-01-08 3:41 ` Koichiro Den @ 2025-01-08 3:57 ` Huacai Chen 0 siblings, 0 replies; 22+ messages in thread From: Huacai Chen @ 2025-01-08 3:57 UTC (permalink / raw) To: Koichiro Den Cc: Lorenzo Stoakes, Huacai Chen, Andrew Morton, linux-mm, Sebastian Andrzej Siewior, Mark Rutland, Charalampos Mitrodimas On Wed, Jan 8, 2025 at 11:41 AM Koichiro Den <koichiro.den@canonical.com> wrote: > > On Wed, Jan 08, 2025 at 10:31:40AM +0800, Huacai Chen wrote: > > Hi, Koichiro, > > > > On Wed, Jan 8, 2025 at 10:22 AM Koichiro Den <koichiro.den@canonical.com> wrote: > > > > > > On Tue, Jan 07, 2025 at 11:00:10AM +0000, Lorenzo Stoakes wrote: > > > > On Tue, Jan 07, 2025 at 06:29:03PM +0800, Huacai Chen wrote: > > > > > Hi, Lorenzo, > > > > > > > > > > On Tue, Jan 7, 2025 at 4:48 PM Lorenzo Stoakes > > > > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > > > > > > > On Tue, Jan 07, 2025 at 09:18:48AM +0800, Huacai Chen wrote: > > > > > > > Hi, all, I like simple fixes, so is this acceptable (based on an early > > > > > > > version from Koichiro Den)? > > > > > > > > > > > > No not at all. This is bizarre - in the mail you are replying to Koichiro > > > > > > agrees with me that the approach of his code that you've sent here (I don't > > > > > > know why) is fundamentally broken and suggest another. > > > > > > > > > > > > I am at a loss as to why you've sent this? Perhaps a miscommunication > > > > > > somewhere? :) > > > > > > > > > > > > In any case, please don't send '[PATCH] xxx' mails that aren't intended to > > > > > > be patches, a better form of this would be to say 'oh can we just do...' > > > > > > then to put this code in the mail underneath, without any '[PATCH]' prefix. > > > > > I wasn't in the CC list, and I also found the bug yesterday, so I can > > > > > only reply to this email with "git send-email --in-reply-to". This is > > > > > the reason why my reply looks so stranne. > > > > > > > > > > > > > > > > > But please do review the discussion - the below is insufficient as simple > > > > > > as it is (sadly) because the boot CPU's delayed work will never be > > > > > > executed. > > > > > Koichiro's simple fix causes the boot CPU's delayed work to never be > > > > > executed, this is obvious, and of course I have read the early > > > > > discussion. And so I improve it, with a "cpu_online()" checking, then > > > > > the boot CPU is unaffected. > > > > > > > > With respect Haucai, this is not how you engage in kernel discussion. You > > > > could simply have replied to the mail or given more information, you now > > > > have two people telling you this, please take it on board. > > > > > > > > And it caused me to miss your actually quite valuable suggestion so this is > > > > beneficial for all! :) > > > > > > > > ANYWAY, moving on to the technical side: > > > > > > > > > > > > > > Huacai > > > > > > > > > > > > > > > > > I will take a look at Koichiro's new approach as soon as I am able. > > > > > > > > > > > > Cheers! 
> > > > > > > > > > > > > --- > > > > > > > mm/vmstat.c | 7 ++++++- > > > > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/mm/vmstat.c b/mm/vmstat.c > > > > > > > index 0889b75cef14..1badc24a21ff 100644 > > > > > > > --- a/mm/vmstat.c > > > > > > > +++ b/mm/vmstat.c > > > > > > > @@ -2122,10 +2122,15 @@ static void __init start_shepherd_timer(void) > > > > > > > { > > > > > > > int cpu; > > > > > > > > > > > > > > - for_each_possible_cpu(cpu) > > > > > > > + for_each_possible_cpu(cpu) { > > > > > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > > > > > vmstat_update); > > > > > > > > > > > > > > + /* Will be enabled on vmstat_cpu_online() */ > > > > > > > + if (!cpu_online(cpu)) > > > > > > > > This might actually be workable as something simpler, on assumption there > > > > > > Sorry about late response. And thank you very much, Huacai. Your suggestion > > > looks great. Much simpler and intuitive. > > > > > > > can be no race here (I don't think so right?). > > > > > > IIUC, there is no race since smp_init() always runs before init_mm_internals(). > > > At this stage (i.e. start_shepherd_timer()), for cpuhp scenario, the secondary > > > CPUs haven’t even been brought up yet. So the simple 'cpu_online' check looks > > > simple and the best. > > > > > > > > > > > Koichiro - could you check this and see whether it resolves the issue and > > > > whether you feel this is sensible? > > > > > > I tested it and it seems to work well both on booting and cpuhp events. > > > > > > (By the way, in my previous email comment [1], I forgot to mention that I also > > > applied other unmerged patches, [2] and [3], just to be able to run such random > > > transisioning test. This time here, I just tested by switching CPUHP_ONLINE <-> > > > CPUHP_OFFLINE only.) > > > > > > [1]: https://lore.kernel.org/linux-mm/7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local/T/#m11d983715699d3cea295b8618aba7b6ccec4db55 > > > [2]: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com/ > > > [3]: https://lore.kernel.org/all/20241220141538.4018232-1-koichiro.den@canonical.com/ > > > > > > > > > > > Might be worth expanding comment to say that we disable on offline, enable > > > > on online and we're just providing symmetry here. > > > > > > Sounds good. Like this? > > > > > > - for_each_possible_cpu(cpu) > > > + for_each_possible_cpu(cpu) { > > > INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu), > > > vmstat_update); > > > > > > + /* > > > + * For secondary CPUs during CPU hotplug scenarios, > > > + * vmstat_cpu_online() will enable the work. > > > + * mm/vmstat:online enables and disables vmstat_work > > > + * symmetrically during CPU hotplug events. > > > + */ > > > + if (!cpu_online(cpu)) > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > + } > > > > > > > > > Before submitting a revised patch, I'd like to confirm: > > > Huacai, would you be comfortable with me adding your Signed-off-by to the > > > commit, or would you prefer a Suggested-by tag instead? > > Since the original V2 patch has been reverted in mainline, so I think > > you will send a V3 which integrate my suggestion, right? > > I was considering sending a separate patch with a title distinct from the > original, while referencing both the original (v2) patch and the revert > commit. However, I'm fine with either approach. Don't separate, just sending a single V3 patch is OK for me. 
Huacai > > If there’s any documentation that mentions a recommended method in this > kind of situation, please let me know. Personally, I’m not a fan of > multiple commits with identical titles appearing in a branch. > > > > > Both Signed-off-by and Suggested-by is OK for me. > > Alright, thanks! > > -Koichiro Den > > > > > Huacai > > > > > > > > Thanks. > > > > > > -Koichiro Den > > > > > > > > > > > > > > + disable_delayed_work_sync(&per_cpu(vmstat_work, cpu)); > > > > > > > + } > > > > > > > + > > > > > > > schedule_delayed_work(&shepherd, > > > > > > > round_jiffies_relative(sysctl_stat_interval)); > > > > > > > } > > > > > > > -- > > > > > > > 2.43.5 > > > > > > > > > > > > > > > > > > > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 22+ messages in thread
Thread overview: 22+ messages
2024-12-21 3:33 [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep() Koichiro Den
2025-01-03 23:33 ` Lorenzo Stoakes
2025-01-04 4:00 ` Koichiro Den
2025-01-06 2:18 ` Charalampos Mitrodimas
2025-01-06 10:04 ` Mark Rutland
2025-01-06 10:18 ` Geert Uytterhoeven
2025-01-06 10:52 ` Lorenzo Stoakes
2025-01-06 12:53 ` Charalampos Mitrodimas
2025-01-06 12:58 ` Lorenzo Stoakes
2025-01-06 13:03 ` Koichiro Den
2025-01-06 13:53 ` Koichiro Den
2025-01-07 1:18 ` [PATCH] Simple fix Huacai Chen
2025-01-07 3:35 ` Matthew Wilcox
2025-01-07 3:58 ` Huacai Chen
2025-01-07 8:47 ` Lorenzo Stoakes
2025-01-07 10:29 ` Huacai Chen
2025-01-07 11:00 ` Lorenzo Stoakes
2025-01-08 2:22 ` Koichiro Den
2025-01-08 2:26 ` Koichiro Den
2025-01-08 2:31 ` Huacai Chen
2025-01-08 3:41 ` Koichiro Den
2025-01-08 3:57 ` Huacai Chen