From: Charalampos Mitrodimas
To: Lorenzo Stoakes
Cc: Koichiro Den, linux-mm@kvack.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Thomas Gleixner, Peter Zijlstra
Subject: Re: [PATCH v2] vmstat: disable vmstat_work on vmstat_cpu_down_prep()
In-Reply-To: <7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local> (Lorenzo Stoakes's message of "Mon, 6 Jan 2025 10:52:37 +0000")
References: <20241221033321.4154409-1-koichiro.den@canonical.com> <2q7ge6cgzeowqffyn6w6ed4trhaaumv5ubdgud2tsoolen7wpw@4akuomhbacyh> <7ed97096-859e-46d0-8f27-16a2298a8914@lucifer.local>
Date: Mon, 06 Jan 2025 12:53:36 +0000
MIME-Version: 1.0
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org

Lorenzo Stoakes writes:

> +cc tglx, peterz for insight on CPU hot plug
>
> On Sat, Jan 04, 2025 at 01:00:17PM +0900, Koichiro Den wrote:
>> On Fri, Jan 03, 2025 at 11:33:19PM +0000, Lorenzo Stoakes wrote:
>> > On Sat, Dec 21, 2024 at 12:33:20PM +0900, Koichiro Den wrote:
>> > > Even after mm/vmstat:online teardown, shepherd may still queue work for
>> > > the dying cpu until the cpu is removed from online mask. While it's
>> > > quite rare, this means that after unbind_workers() unbinds a per-cpu
>> > > kworker, it potentially runs vmstat_update for the dying CPU on an
>> > > irrelevant cpu before entering atomic AP states.
>> > > When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
>> > > backtrace.
>> > >
>> > >   BUG: using smp_processor_id() in preemptible [00000000] code: \
>> > >                                                kworker/7:3/1702
>> > >   caller is refresh_cpu_vm_stats+0x235/0x5f0
>> > >   CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
>> > >   Tainted: [N]=TEST
>> > >   Workqueue: mm_percpu_wq vmstat_update
>> > >   Call Trace:
>> > >
>> > >    dump_stack_lvl+0x8d/0xb0
>> > >    check_preemption_disabled+0xce/0xe0
>> > >    refresh_cpu_vm_stats+0x235/0x5f0
>> > >    vmstat_update+0x17/0xa0
>> > >    process_one_work+0x869/0x1aa0
>> > >    worker_thread+0x5e5/0x1100
>> > >    kthread+0x29e/0x380
>> > >    ret_from_fork+0x2d/0x70
>> > >    ret_from_fork_asm+0x1a/0x30
>> > >
>> > >
>> > > So, for mm/vmstat:online, disable vmstat_work reliably on teardown and
>> > > symmetrically enable it on startup.
>> > >
>> > > Signed-off-by: Koichiro Den
>> >
>> > Hi,
>> >
>> > I observed a warning in my qemu and real hardware, which I bisected to this commit:
>> >
>> > [    0.087733] ------------[ cut here ]------------
>> > [    0.087733] workqueue: work disable count underflowed
>> > [    0.087733] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:4313 enable_work+0xb5/0xc0
>> >
>> > This is:
>> >
>> > static void work_offqd_enable(struct work_offq_data *offqd)
>> > {
>> > 	if (likely(offqd->disable > 0))
>> > 		offqd->disable--;
>> > 	else
>> > 		WARN_ONCE(true, "workqueue: work disable count underflowed\n"); <-- this line
>> > }
>> >
>> > So (based on this code) presumably an enable is only required if previously
>> > disabled, and this code is being called on startup unconditionally without
>> > the work having been disabled previously? I'm not hugely familiar with
>> > delayed workqueue implementation details.
>> >
>> > [    0.087733] Modules linked in:
>> > [    0.087733] CPU: 1 UID: 0 PID: 21 Comm: cpuhp/1 Not tainted 6.13.0-rc4+ #58
>> > [    0.087733] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> > [    0.087733] RIP: 0010:enable_work+0xb5/0xc0
>> > [    0.087733] Code: 6f b8 01 00 74 0f 31 d2 be 01 00 00 00 eb b5 90 0f 0b 90 eb ca c6 05 60 6f b8 01 01 90 48 c7 c7 b0 a9 6e 82 e8 4c a4 fd ff 90 <0f> 0b 90 90 eb d6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90
>> > [    0.087733] RSP: 0018:ffffc900000cbe30 EFLAGS: 00010092
>> > [    0.087733] RAX: 0000000000000029 RBX: ffff888263ca9d60 RCX: 0000000000000000
>> > [    0.087733] RDX: 0000000000000001 RSI: ffffc900000cbce8 RDI: 0000000000000001
>> > [    0.087733] RBP: ffffc900000cbe30 R08: 00000000ffffdfff R09: ffffffff82b12f08
>> > [    0.087733] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000000c4
>> > [    0.087733] R13: ffffffff81278d90 R14: 0000000000000000 R15: ffff888263c9c648
>> > [    0.087733] FS: 0000000000000000(0000) GS:ffff888263c80000(0000) knlGS:0000000000000000
>> > [    0.087733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [    0.087733] CR2: 0000000000000000 CR3: 0000000002a2e000 CR4: 0000000000750ef0
>> > [    0.087733] PKRU: 55555554
>> > [    0.087733] Call Trace:
>> > [    0.087733]
>> > [    0.087733]  ? enable_work+0xb5/0xc0
>> > [    0.087733]  ? __warn.cold+0x93/0xf2
>> > [    0.087733]  ? enable_work+0xb5/0xc0
>> > [    0.087733]  ? report_bug+0xff/0x140
>> > [    0.087733]  ? handle_bug+0x54/0x90
>> > [    0.087733]  ? exc_invalid_op+0x17/0x70
>> > [    0.087733]  ? asm_exc_invalid_op+0x1a/0x20
>> > [    0.087733]  ? __pfx_vmstat_cpu_online+0x10/0x10
>> > [    0.087733]  ? enable_work+0xb5/0xc0
>> > [    0.087733]  vmstat_cpu_online+0x5c/0x70
>> > [    0.087733]  cpuhp_invoke_callback+0x133/0x440
>> > [    0.087733]  cpuhp_thread_fun+0x95/0x150
>> > [    0.087733]  smpboot_thread_fn+0xd5/0x1d0
>> > [    0.087734]  ? __pfx_smpboot_thread_fn+0x10/0x10
>> > [    0.087735]  kthread+0xc8/0xf0
>> > [    0.087737]  ? __pfx_kthread+0x10/0x10
>> > [    0.087738]  ret_from_fork+0x2c/0x50
>> > [    0.087739]  ? __pfx_kthread+0x10/0x10
>> > [    0.087740]  ret_from_fork_asm+0x1a/0x30
>> > [    0.087742]
>> > [    0.087742] ---[ end trace 0000000000000000 ]---
>> >
>> >
>> > > ---
>> > > v1: https://lore.kernel.org/all/20241220134234.3809621-1-koichiro.den@canonical.com/
>> > > ---
>> > >  mm/vmstat.c | 3 ++-
>> > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/mm/vmstat.c b/mm/vmstat.c
>> > > index 4d016314a56c..0889b75cef14 100644
>> > > --- a/mm/vmstat.c
>> > > +++ b/mm/vmstat.c
>> > > @@ -2148,13 +2148,14 @@ static int vmstat_cpu_online(unsigned int cpu)
>> > >  	if (!node_state(cpu_to_node(cpu), N_CPU)) {
>> > >  		node_set_state(cpu_to_node(cpu), N_CPU);
>> > >  	}
>> > > +	enable_delayed_work(&per_cpu(vmstat_work, cpu));
>> >
>> > Probably needs to be 'if disabled' here, as this is invoked on normal
>> > startup when the work won't have been disabled?
>> >
>> > Had a brief look at code and couldn't see how that could be done
>> > however... and one would need to be careful about races... Tricky!
>> >
>> > >
>> > >  	return 0;
>> > >  }
>> > >
>> > >  static int vmstat_cpu_down_prep(unsigned int cpu)
>> > >  {
>> > > -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> > > +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> > >  	return 0;
>> > >  }
>> > >
>> > > --
>> > > 2.43.0
>> > >
>> > >
>> >
>> > Let me know if you need any more details, .config etc.
>> >
>> > I noticed this warning on a real box too (in both cases running akpm's
>> > mm-unstable branch), FWIW.
>>
>> Thank you for the report. I was able to reproduce the warning and now
>> wonder how I missed it.. My oversight, apologies.
>>
>> In my current view, the simplest solution would be to make sure a local
>> vmstat_work is disabled until vmstat_cpu_online() runs for the cpu, even
>> during boot-up. The following patch suppresses the warning:
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 0889b75cef14..19ceed5d34bf 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -2122,10 +2122,14 @@ static void __init start_shepherd_timer(void)
>>  {
>>  	int cpu;
>>
>> -	for_each_possible_cpu(cpu)
>> +	for_each_possible_cpu(cpu) {
>>  		INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
>>  			vmstat_update);
>>
>> +		/* will be enabled on vmstat_cpu_online */
>> +		disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>> +	}
>> +
>>  	schedule_delayed_work(&shepherd,
>>  		round_jiffies_relative(sysctl_stat_interval));
>>  }
>>
>> If you think of a better solution later, please let me know. Otherwise,
>> I'll submit a follow-up fix patch with the above diff.
>
> Thanks, this resolves the problem, but are we sure that _all_ CPUs will
> definitely call vmstat_cpu_online()?
>
> I did a bit of printk output and it seems like this _didn't_ online CPU 0,
> presumably the boot CPU which calls this function in the first instance?

FWIW with the proposed fix I can see that all CPUs are online,

    grep -H . /sys/devices/system/cpu/cpu*/online
    /sys/devices/system/cpu/cpu0/online:1
    /sys/devices/system/cpu/cpu1/online:1
    /sys/devices/system/cpu/cpu2/online:1
    /sys/devices/system/cpu/cpu3/online:1
    /sys/devices/system/cpu/cpu4/online:1
    /sys/devices/system/cpu/cpu5/online:1
    /sys/devices/system/cpu/cpu6/online:1
    /sys/devices/system/cpu/cpu7/online:1

>
> I also see that init_mm_internals() invokes cpuhp_setup_state_nocalls()
> explicitly which does _not_ call the callback, though even if it did this
> would be too early as it calls start_shepherd_timer() _after_ this anyway.
>
> So yeah, unless I'm missing something, I think this patch is broken.
>
> I have added Thomas and Peter to give some insight on the CPU hotplug side.
>
> It feels like the patch really needs an 'enable if not already enabled'
> call in vmstat_cpu_online().
>
>>
>> Thanks.
>>
>> -Koichiro
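
The disable/enable accounting discussed in this thread can be sketched as a
small userspace model. This is only an illustration under stated assumptions,
not kernel code: struct model_work, model_disable(), model_enable(),
model_boot() and model_still_disabled() are hypothetical names, the counter
only mimics the quoted work_offqd_enable() logic, and the set of CPUs that
receive the online callback is a parameter rather than something the model
can determine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MODEL_NR_CPUS 8

/* Hypothetical stand-in for a per-CPU work item: only the disable depth. */
struct model_work {
	unsigned int disable;	/* 0 means the work may be queued */
};

/* Number of unbalanced enable calls seen during the current model_boot(). */
static unsigned int model_underflows;

static void model_disable(struct model_work *w)
{
	w->disable++;
}

/* Same shape as the quoted work_offqd_enable(): decrement, or warn at 0. */
static void model_enable(struct model_work *w)
{
	if (w->disable > 0) {
		w->disable--;
	} else {
		model_underflows++;
		fprintf(stderr, "model: work disable count underflowed\n");
	}
}

/*
 * Simulate boot. With disable_at_init, every work item starts disabled
 * (the follow-up diff). The online callback, which enables the work, then
 * runs for CPUs first_cb_cpu..MODEL_NR_CPUS-1. Returns the underflow count.
 */
static unsigned int model_boot(struct model_work *works, bool disable_at_init,
			       int first_cb_cpu)
{
	int cpu;

	memset(works, 0, sizeof(*works) * MODEL_NR_CPUS);
	model_underflows = 0;

	if (disable_at_init)
		for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			model_disable(&works[cpu]);

	for (cpu = first_cb_cpu; cpu < MODEL_NR_CPUS; cpu++)
		model_enable(&works[cpu]);

	return model_underflows;
}

/* Count work items that are still disabled after the simulated boot. */
static int model_still_disabled(const struct model_work *works)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (works[cpu].disable > 0)
			n++;
	return n;
}
```

In this model, model_boot(works, false, 1) stands for the v2 patch alone and
reports one underflow per CPU whose callback runs. model_boot(works, true, 1)
stands for the follow-up diff under the assumption that the boot CPU never
gets the callback: no underflow, but model_still_disabled() then reports one
work item left disabled. Whether CPU 0 really misses the callback is the open
question above; the model only shows what would follow if it does.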