From: Frederic Weisbecker <frederic@kernel.org>
To: Waiman Long <llong@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Gabriele Monaco" <gmonaco@redhat.com>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"Ingo Molnar" <mingo@redhat.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Jens Axboe" <axboe@kernel.dk>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
"Marco Crivellari" <marco.crivellari@suse.com>,
"Michal Hocko" <mhocko@suse.com>,
"Muchun Song" <muchun.song@linux.dev>,
"Paolo Abeni" <pabeni@redhat.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Phil Auld" <pauld@redhat.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Shakeel Butt" <shakeel.butt@linux.dev>,
"Simon Horman" <horms@kernel.org>, "Tejun Heo" <tj@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Will Deacon" <will@kernel.org>,
cgroups@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-block@vger.kernel.org, linux-mm@kvack.org,
linux-pci@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 22/33] kthread: Include unbound kthreads in the managed affinity list
Date: Wed, 5 Nov 2025 17:57:00 +0100
Message-ID: <aQuB3Oy7i6Z0SJlA@localhost.localdomain>
In-Reply-To: <ba437129-062a-4a2f-a753-64945e9a13ff@redhat.com>
On Tue, Oct 21, 2025 at 06:42:59PM -0400, Waiman Long wrote:
>
> On 10/13/25 4:31 PM, Frederic Weisbecker wrote:
> > The managed affinity list currently contains only unbound kthreads that
> > have affinity preferences. Unbound kthreads that are globally affine by
> > default are left out of the list because their affinity is automatically
> > managed by the scheduler (through the fallback housekeeping mask) and by
> > cpuset.
> >
> > However, in order to preserve the preferred affinity of kthreads, cpuset
> > will delegate the propagation of isolated partition updates to the
> > housekeeping and kthread code.
> >
> > Prepare for that by including all unbound kthreads in the managed
> > affinity list.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> > kernel/kthread.c | 59 ++++++++++++++++++++++++------------------------
> > 1 file changed, 30 insertions(+), 29 deletions(-)
> >
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index c4dd967e9e9c..cba3d297f267 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -365,9 +365,10 @@ static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpum
> >  	if (kthread->preferred_affinity) {
> >  		pref = kthread->preferred_affinity;
> >  	} else {
> > -		if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
> > -			return;
> > -		pref = cpumask_of_node(kthread->node);
> > +		if (kthread->node == NUMA_NO_NODE)
> > +			pref = housekeeping_cpumask(HK_TYPE_KTHREAD);
> > +		else
> > +			pref = cpumask_of_node(kthread->node);
> >  	}
> >  
> >  	cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
> > @@ -380,32 +381,29 @@ static void kthread_affine_node(void)
> >  	struct kthread *kthread = to_kthread(current);
> >  	cpumask_var_t affinity;
> >  
> > -	WARN_ON_ONCE(kthread_is_per_cpu(current));
> > +	if (WARN_ON_ONCE(kthread_is_per_cpu(current)))
> > +		return;
> >  
> > -	if (kthread->node == NUMA_NO_NODE) {
> > -		housekeeping_affine(current, HK_TYPE_KTHREAD);
> > -	} else {
> > -		if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
> > -			WARN_ON_ONCE(1);
> > -			return;
> > -		}
> > -
> > -		mutex_lock(&kthread_affinity_lock);
> > -		WARN_ON_ONCE(!list_empty(&kthread->affinity_node));
> > -		list_add_tail(&kthread->affinity_node, &kthread_affinity_list);
> > -		/*
> > -		 * The node cpumask is racy when read from kthread() but:
> > -		 * - a racing CPU going down will either fail on the subsequent
> > -		 * call to set_cpus_allowed_ptr() or be migrated to housekeepers
> > -		 * afterwards by the scheduler.
> > -		 * - a racing CPU going up will be handled by kthreads_online_cpu()
> > -		 */
> > -		kthread_fetch_affinity(kthread, affinity);
> > -		set_cpus_allowed_ptr(current, affinity);
> > -		mutex_unlock(&kthread_affinity_lock);
> > -
> > -		free_cpumask_var(affinity);
> > +	if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
> > +		WARN_ON_ONCE(1);
> > +		return;
> >  	}
> > +
> > +	mutex_lock(&kthread_affinity_lock);
> > +	WARN_ON_ONCE(!list_empty(&kthread->affinity_node));
> > +	list_add_tail(&kthread->affinity_node, &kthread_affinity_list);
> > +	/*
> > +	 * The node cpumask is racy when read from kthread() but:
> > +	 * - a racing CPU going down will either fail on the subsequent
> > +	 * call to set_cpus_allowed_ptr() or be migrated to housekeepers
> > +	 * afterwards by the scheduler.
> > +	 * - a racing CPU going up will be handled by kthreads_online_cpu()
> > +	 */
> > +	kthread_fetch_affinity(kthread, affinity);
> > +	set_cpus_allowed_ptr(current, affinity);
> > +	mutex_unlock(&kthread_affinity_lock);
> > +
> > +	free_cpumask_var(affinity);
> >  }
> >  
> >  static int kthread(void *_create)
> > @@ -924,8 +922,11 @@ static int kthreads_online_cpu(unsigned int cpu)
> >  			ret = -EINVAL;
> >  			continue;
> >  		}
> > -		kthread_fetch_affinity(k, affinity);
> > -		set_cpus_allowed_ptr(k->task, affinity);
> > +
> > +		if (k->preferred_affinity || k->node != NUMA_NO_NODE) {
> > +			kthread_fetch_affinity(k, affinity);
> > +			set_cpus_allowed_ptr(k->task, affinity);
> > +		}
> >  	}
>
> My understanding of kthreads_online_cpu() is that hotplug won't affect the
> affinity returned from kthread_fetch_affinity().
It should. The onlining CPU is already considered online at this point and
may be part of the mask returned by kthread_fetch_affinity().
> However, set_cpus_allowed_ptr() will mask out all the offline CPUs. So if the given
> "cpu" to be brought online is in the returned affinity, we should call
> set_cpus_allowed_ptr() to add this CPU to its affinity mask, though the
> current code will call it even if it is not strictly necessary.
I'm not sure I understand what you mean.
> This change will not do this update for NUMA_NO_NODE kthreads with no
> preferred_affinity. Is this a problem?
Ah, so unbound kthreads without a preferred affinity are already affine to all
possible CPUs (or to all housekeeping CPUs), whether those CPUs are online or
not. So we don't need to add newly onlined CPUs to them.

kthreads with a preferred affinity or node are different: if none of their
preferred CPUs are online, they must be affine to housekeeping. But as soon as
one of their preferred CPUs becomes online, they must be affine to the online
preferred CPUs. Hence the different treatment. I'm adding a big comment to
explain that.
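To make the distinction concrete, here is a toy user-space model of the rule
described above. It is only a sketch under stated assumptions: CPU masks are
plain 64-bit integers, the `online` argument stands in for the set of online
CPUs, and the names model_kthread, model_fetch_affinity() and
model_needs_update_on_online() are hypothetical, not kernel APIs. It assumes
the effective mask is the preferred (or node) CPUs intersected with the
housekeeping and online CPUs, falling back to the whole housekeeping set when
that intersection is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not kernel code.
 * Each uint64_t is a CPU bitmask: bit N set means CPU N is included.
 */
struct model_kthread {
	bool has_preferred;	/* models kthread->preferred_affinity != NULL */
	uint64_t preferred;	/* preferred CPUs, may include offline ones */
	bool has_node;		/* models kthread->node != NUMA_NO_NODE */
	uint64_t node_cpus;	/* CPUs belonging to that node */
};

/*
 * Effective affinity under the modelled rule: the preferred (or node) CPUs
 * that are both housekeeping and online, or the whole housekeeping set if
 * none of them qualify. Unbound kthreads always get the housekeeping set,
 * independently of which CPUs are online.
 */
static uint64_t model_fetch_affinity(const struct model_kthread *k,
				     uint64_t housekeeping, uint64_t online)
{
	uint64_t pref, mask;

	/* Assumption: at least one housekeeping CPU always remains. */
	assert(housekeeping != 0);

	if (k->has_preferred)
		pref = k->preferred;
	else if (k->has_node)
		pref = k->node_cpus;
	else
		return housekeeping;

	mask = pref & housekeeping & online;
	return mask ? mask : housekeeping;
}

/*
 * Mirrors the condition added to kthreads_online_cpu(): only kthreads with
 * a preferred affinity or a node can see their mask change on CPU online.
 */
static bool model_needs_update_on_online(const struct model_kthread *k)
{
	return k->has_preferred || k->has_node;
}
```

With housekeeping CPUs 0-7 and a kthread bound to a node made of CPUs 4-7,
the model yields the full housekeeping set while CPUs 4-7 are all offline,
and narrows to CPU 5 once CPU 5 comes online. An unbound kthread gets the
same housekeeping mask in both cases, which is why the hotplug callback can
skip it.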
Thanks!
--
Frederic Weisbecker
SUSE Labs