* [PATCH 11/19] kthread: Make sure kthread hasn't started while binding it
From: Frederic Weisbecker @ 2024-12-11 15:40 UTC
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
Zqiang, rcu, Uladzislau Rezki
Make sure the kthread is sleeping in the schedule_preempt_disabled()
call before calling its handler when kthread_bind[_mask]() is called
on it. This provides a sanity check verifying that the task is not
randomly blocked later at some point within its function handler, in
which case it could just be concurrently awakened, leaving the call to
do_set_cpus_allowed() without any effect until the next voluntary sleep.
Rely on the wake-up ordering to ensure that the newly introduced "started"
field returns the expected value:
TASK A                                   TASK B
------                                   ------
READ kthread->started
wake_up_process(B)
   rq_lock()
   ...
   rq_unlock() // RELEASE
                                         schedule()
                                            rq_lock() // ACQUIRE
                                            // schedule task B
                                            rq_unlock()
                                            WRITE kthread->started
Similarly, the write to kthread->started, which happens before any
subsequent voluntary sleep, is guaranteed to be visible after the call
to wait_task_inactive() in __kthread_bind_mask(), so that a late
kthread_bind[_mask]() call reports the potential misuse of the API.
Upcoming patches will make further use of this facility.
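For illustration only (not part of this patch), a minimal sketch of the
ordering that the new check enforces, with a hypothetical thread
function and caller:

static int my_threadfn(void *data)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}

static struct task_struct *my_start_bound_kthread(unsigned int cpu)
{
	struct task_struct *t = kthread_create(my_threadfn, NULL, "my_kthread/%u", cpu);

	if (IS_ERR(t))
		return t;
	/* OK: the kthread is still sleeping in schedule_preempt_disabled() */
	kthread_bind(t, cpu);
	wake_up_process(t);
	/* Calling kthread_bind() after this point is the misuse the new check catches */
	return t;
}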
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/kthread.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index a5ac612b1609..b6f9ce475a4f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -53,6 +53,7 @@ struct kthread_create_info
struct kthread {
unsigned long flags;
unsigned int cpu;
+ int started;
int result;
int (*threadfn)(void *);
void *data;
@@ -382,6 +383,8 @@ static int kthread(void *_create)
schedule_preempt_disabled();
preempt_enable();
+ self->started = 1;
+
ret = -EINTR;
if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
cgroup_kthread_ready();
@@ -540,7 +543,9 @@ static void __kthread_bind(struct task_struct *p, unsigned int cpu, unsigned int
void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
{
+ struct kthread *kthread = to_kthread(p);
__kthread_bind_mask(p, mask, TASK_UNINTERRUPTIBLE);
+ WARN_ON_ONCE(kthread->started);
}
/**
@@ -554,7 +559,9 @@ void kthread_bind_mask(struct task_struct *p, const struct cpumask *mask)
*/
void kthread_bind(struct task_struct *p, unsigned int cpu)
{
+ struct kthread *kthread = to_kthread(p);
__kthread_bind(p, cpu, TASK_UNINTERRUPTIBLE);
+ WARN_ON_ONCE(kthread->started);
}
EXPORT_SYMBOL(kthread_bind);
--
2.46.0
* [PATCH 12/19] kthread: Default affine kthread to its preferred NUMA node
From: Frederic Weisbecker @ 2024-12-11 15:40 UTC
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
Uladzislau Rezki, Zqiang, rcu
Kthreads attached to a preferred NUMA node for their task structure
allocation can also be assumed to run preferably within that same node.
A more precise affinity is usually applied by calling
kthread_create_on_cpu() or kthread_bind[_mask]() before the first wake-up.
For the others, a default affinity to the node is desired and sometimes
implemented with more or less success when it comes to dealing with
hotplug events and nohz_full / CPU Isolation interactions:
- kcompactd is affined to its node and handles hotplug but not CPU Isolation
- kswapd is affined to its node and ignores hotplug and CPU Isolation
- A bunch of drivers create their kthreads on a specific node and
don't handle their affinity any further.
Handle that default node affinity preference at the generic level
instead, provided the kthread is created on an actual node and no
specific affinity, such as a given CPU or a custom cpumask to bind to,
is applied before its first wake-up.
This generic handling is aware of CPU hotplug events and CPU isolation
such that:
* When a housekeeping CPU goes up that is part of the node of a given
kthread, the related task is re-affined to its node if it was
previously running on the default last resort online housekeeping set
from other nodes.
* When a housekeeping CPU goes down while it was part of the node of a
kthread, the running task is migrated (or the sleeping task is woken
up) automatically by the scheduler to other housekeepers within the
same node or, as a last resort, to all housekeepers from other nodes.
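As an illustration only (hypothetical driver code, not part of this
patch), a kthread created on a node, neither per-CPU nor bound before
its first wake-up, now gets the node affinity by default:

static struct task_struct *my_start_node_kthread(int nid)
{
	struct task_struct *t;

	t = kthread_create_on_node(my_threadfn, NULL, nid, "my_kthread/%d", nid);
	if (IS_ERR(t))
		return t;
	/*
	 * No explicit set_cpus_allowed_ptr() needed anymore:
	 * kthread_affine_node() applies the housekeeping CPUs of nid
	 * (or all housekeepers as a fallback) and CPU hotplug is
	 * handled by kthreads_online_cpu().
	 */
	wake_up_process(t);
	return t;
}

where my_threadfn() stands for any regular (non per-CPU) kthread
function.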
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/cpuhotplug.h | 1 +
kernel/kthread.c | 106 ++++++++++++++++++++++++++++++++++++-
2 files changed, 106 insertions(+), 1 deletion(-)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index a04b73c40173..6cc5e484547c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -240,6 +240,7 @@ enum cpuhp_state {
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RANDOM_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
+ CPUHP_AP_KTHREADS_ONLINE,
CPUHP_AP_BASE_CACHEINFO_ONLINE,
CPUHP_AP_ONLINE_DYN,
CPUHP_AP_ONLINE_DYN_END = CPUHP_AP_ONLINE_DYN + 40,
diff --git a/kernel/kthread.c b/kernel/kthread.c
index b6f9ce475a4f..3394ff024a5a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -35,6 +35,9 @@ static DEFINE_SPINLOCK(kthread_create_lock);
static LIST_HEAD(kthread_create_list);
struct task_struct *kthreadd_task;
+static LIST_HEAD(kthreads_hotplug);
+static DEFINE_MUTEX(kthreads_hotplug_lock);
+
struct kthread_create_info
{
/* Information passed to kthread() from kthreadd. */
@@ -53,6 +56,7 @@ struct kthread_create_info
struct kthread {
unsigned long flags;
unsigned int cpu;
+ unsigned int node;
int started;
int result;
int (*threadfn)(void *);
@@ -64,6 +68,8 @@ struct kthread {
#endif
/* To store the full name if task comm is truncated. */
char *full_name;
+ struct task_struct *task;
+ struct list_head hotplug_node;
};
enum KTHREAD_BITS {
@@ -122,8 +128,11 @@ bool set_kthread_struct(struct task_struct *p)
init_completion(&kthread->exited);
init_completion(&kthread->parked);
+ INIT_LIST_HEAD(&kthread->hotplug_node);
p->vfork_done = &kthread->exited;
+ kthread->task = p;
+ kthread->node = tsk_fork_get_node(current);
p->worker_private = kthread;
return true;
}
@@ -314,6 +323,11 @@ void __noreturn kthread_exit(long result)
{
struct kthread *kthread = to_kthread(current);
kthread->result = result;
+ if (!list_empty(&kthread->hotplug_node)) {
+ mutex_lock(&kthreads_hotplug_lock);
+ list_del(&kthread->hotplug_node);
+ mutex_unlock(&kthreads_hotplug_lock);
+ }
do_exit(0);
}
EXPORT_SYMBOL(kthread_exit);
@@ -339,6 +353,48 @@ void __noreturn kthread_complete_and_exit(struct completion *comp, long code)
}
EXPORT_SYMBOL(kthread_complete_and_exit);
+static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
+{
+ cpumask_and(cpumask, cpumask_of_node(kthread->node),
+ housekeeping_cpumask(HK_TYPE_KTHREAD));
+
+ if (cpumask_empty(cpumask))
+ cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+}
+
+static void kthread_affine_node(void)
+{
+ struct kthread *kthread = to_kthread(current);
+ cpumask_var_t affinity;
+
+ WARN_ON_ONCE(kthread_is_per_cpu(current));
+
+ if (kthread->node == NUMA_NO_NODE) {
+ housekeeping_affine(current, HK_TYPE_KTHREAD);
+ } else {
+ if (!zalloc_cpumask_var(&affinity, GFP_KERNEL)) {
+ WARN_ON_ONCE(1);
+ return;
+ }
+
+ mutex_lock(&kthreads_hotplug_lock);
+ WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+ list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+ /*
+ * The node cpumask is racy when read from kthread() but:
+ * - a racing CPU going down will either fail on the subsequent
+ * call to set_cpus_allowed_ptr() or be migrated to housekeepers
+ * afterwards by the scheduler.
+ * - a racing CPU going up will be handled by kthreads_online_cpu()
+ */
+ kthread_fetch_affinity(kthread, affinity);
+ set_cpus_allowed_ptr(current, affinity);
+ mutex_unlock(&kthreads_hotplug_lock);
+
+ free_cpumask_var(affinity);
+ }
+}
+
static int kthread(void *_create)
{
static const struct sched_param param = { .sched_priority = 0 };
@@ -369,7 +425,6 @@ static int kthread(void *_create)
* back to default in case they have been changed.
*/
sched_setscheduler_nocheck(current, SCHED_NORMAL, &param);
- set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
/* OK, tell user we're spawned, wait for stop or wakeup */
__set_current_state(TASK_UNINTERRUPTIBLE);
@@ -385,6 +440,9 @@ static int kthread(void *_create)
self->started = 1;
+ if (!(current->flags & PF_NO_SETAFFINITY))
+ kthread_affine_node();
+
ret = -EINTR;
if (!test_bit(KTHREAD_SHOULD_STOP, &self->flags)) {
cgroup_kthread_ready();
@@ -781,6 +839,52 @@ int kthreadd(void *unused)
return 0;
}
+/*
+ * Re-affine kthreads according to their preferences
+ * and the newly online CPU. The CPU down part is handled
+ * by select_fallback_rq() which default re-affines to
+ * housekeepers in case the preferred affinity doesn't
+ * apply anymore.
+ */
+static int kthreads_online_cpu(unsigned int cpu)
+{
+ cpumask_var_t affinity;
+ struct kthread *k;
+ int ret;
+
+ guard(mutex)(&kthreads_hotplug_lock);
+
+ if (list_empty(&kthreads_hotplug))
+ return 0;
+
+ if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+ return -ENOMEM;
+
+ ret = 0;
+
+ list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
+ if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
+ kthread_is_per_cpu(k->task) ||
+ k->node == NUMA_NO_NODE)) {
+ ret = -EINVAL;
+ continue;
+ }
+ kthread_fetch_affinity(k, affinity);
+ set_cpus_allowed_ptr(k->task, affinity);
+ }
+
+ free_cpumask_var(affinity);
+
+ return ret;
+}
+
+static int kthreads_init(void)
+{
+ return cpuhp_setup_state(CPUHP_AP_KTHREADS_ONLINE, "kthreads:online",
+ kthreads_online_cpu, NULL);
+}
+early_initcall(kthreads_init);
+
void __kthread_init_worker(struct kthread_worker *worker,
const char *name,
struct lock_class_key *key)
--
2.46.0
* [PATCH 13/19] mm: Create/affine kcompactd to its preferred node
From: Frederic Weisbecker @ 2024-12-11 15:40 UTC
To: LKML
Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka,
Andrew Morton, linux-mm, Peter Zijlstra, Thomas Gleixner,
Michal Hocko
Kcompactd is dedicated to a specific node. As such it wants to be
preferably affined to it, memory- and CPU-wise.
Use the proper kthread API to achieve that. As a bonus, the kthread
core now takes care of CPU-hotplug events and CPU isolation on
kcompactd's behalf.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
mm/compaction.c | 43 +++----------------------------------------
1 file changed, 3 insertions(+), 40 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index a2b16b08cbbf..a31c0f5758cf 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -3154,15 +3154,9 @@ void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx)
static int kcompactd(void *p)
{
pg_data_t *pgdat = (pg_data_t *)p;
- struct task_struct *tsk = current;
long default_timeout = msecs_to_jiffies(HPAGE_FRAG_CHECK_INTERVAL_MSEC);
long timeout = default_timeout;
- const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
- if (!cpumask_empty(cpumask))
- set_cpus_allowed_ptr(tsk, cpumask);
-
set_freezable();
pgdat->kcompactd_max_order = 0;
@@ -3233,10 +3227,12 @@ void __meminit kcompactd_run(int nid)
if (pgdat->kcompactd)
return;
- pgdat->kcompactd = kthread_run(kcompactd, pgdat, "kcompactd%d", nid);
+ pgdat->kcompactd = kthread_create_on_node(kcompactd, pgdat, nid, "kcompactd%d", nid);
if (IS_ERR(pgdat->kcompactd)) {
pr_err("Failed to start kcompactd on node %d\n", nid);
pgdat->kcompactd = NULL;
+ } else {
+ wake_up_process(pgdat->kcompactd);
}
}
@@ -3254,30 +3250,6 @@ void __meminit kcompactd_stop(int nid)
}
}
-/*
- * It's optimal to keep kcompactd on the same CPUs as their memory, but
- * not required for correctness. So if the last cpu in a node goes
- * away, we get changed to run anywhere: as the first one comes back,
- * restore their cpu bindings.
- */
-static int kcompactd_cpu_online(unsigned int cpu)
-{
- int nid;
-
- for_each_node_state(nid, N_MEMORY) {
- pg_data_t *pgdat = NODE_DATA(nid);
- const struct cpumask *mask;
-
- mask = cpumask_of_node(pgdat->node_id);
-
- if (cpumask_any_and(cpu_online_mask, mask) < nr_cpu_ids)
- /* One of our CPUs online: restore mask */
- if (pgdat->kcompactd)
- set_cpus_allowed_ptr(pgdat->kcompactd, mask);
- }
- return 0;
-}
-
static int proc_dointvec_minmax_warn_RT_change(const struct ctl_table *table,
int write, void *buffer, size_t *lenp, loff_t *ppos)
{
@@ -3337,15 +3309,6 @@ static struct ctl_table vm_compaction[] = {
static int __init kcompactd_init(void)
{
int nid;
- int ret;
-
- ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
- "mm/compaction:online",
- kcompactd_cpu_online, NULL);
- if (ret < 0) {
- pr_err("kcompactd: failed to register hotplug callbacks.\n");
- return ret;
- }
for_each_node_state(nid, N_MEMORY)
kcompactd_run(nid);
--
2.46.0
* [PATCH 14/19] mm: Create/affine kswapd to its preferred node
From: Frederic Weisbecker @ 2024-12-11 15:40 UTC
To: LKML
Cc: Frederic Weisbecker, Michal Hocko, Vlastimil Babka, linux-mm,
Andrew Morton, Peter Zijlstra, Thomas Gleixner, Michal Hocko
kswapd is dedicated to a specific node. As such it wants to be
preferably affined to it, memory- and CPU-wise.
Use the proper kthread API to achieve that. As a bonus, the kthread
core now takes care of CPU-hotplug events and CPU isolation on
kswapd's behalf.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
mm/vmscan.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 76378bc257e3..ec4eab23fb19 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7182,10 +7182,6 @@ static int kswapd(void *p)
unsigned int highest_zoneidx = MAX_NR_ZONES - 1;
pg_data_t *pgdat = (pg_data_t *)p;
struct task_struct *tsk = current;
- const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
-
- if (!cpumask_empty(cpumask))
- set_cpus_allowed_ptr(tsk, cpumask);
/*
* Tell the memory management that we're a "memory allocator",
@@ -7354,13 +7350,15 @@ void __meminit kswapd_run(int nid)
pgdat_kswapd_lock(pgdat);
if (!pgdat->kswapd) {
- pgdat->kswapd = kthread_run(kswapd, pgdat, "kswapd%d", nid);
+ pgdat->kswapd = kthread_create_on_node(kswapd, pgdat, nid, "kswapd%d", nid);
if (IS_ERR(pgdat->kswapd)) {
/* failure at boot is fatal */
pr_err("Failed to start kswapd on node %d,ret=%ld\n",
nid, PTR_ERR(pgdat->kswapd));
BUG_ON(system_state < SYSTEM_RUNNING);
pgdat->kswapd = NULL;
+ } else {
+ wake_up_process(pgdat->kswapd);
}
}
pgdat_kswapd_unlock(pgdat);
--
2.46.0
* [PATCH 15/19] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-12-11 15:40 UTC
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
Uladzislau Rezki, Zqiang, rcu
Affining kthreads follows one of four existing patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never execute
relevant code on any other CPU. This is currently handled by smpboot
code which takes care of CPU-hotplug operations.
2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
run anywhere else. The affinity is set through kthread_bind_mask()
and the subsystem handles CPU-hotplug operations by itself.
3) Kthreads that prefer to be affine to a specific NUMA node. That
preferred affinity is applied by default when an actual node ID is
passed on kthread creation, provided the kthread is not per-CPU and
no call to kthread_bind_mask() has been issued before the first
wake-up.
4) Similar to the previous point but kthreads have a preferred affinity
different from a node. The affinity is set manually, like for any
other task, and CPU-hotplug is supposed to be handled by the relevant
subsystem so that the task is properly reaffined whenever a given CPU
from the preferred affinity comes up. Also care must be taken so that
the preferred affinity doesn't cross housekeeping cpumask boundaries.
Provide a function to handle the last use case, mostly reusing the
current node default affinity infrastructure. kthread_affine_preferred()
is introduced, to be used just like kthread_bind_mask(), right after
kthread creation and before the first wake-up. The kthread is then
affined right away to the cpumask passed through the API if it has
online housekeeping CPUs. Otherwise it will be affined to all online
housekeeping CPUs as a last resort.
As with node affinity, it is aware of CPU hotplug events such that:
* When a housekeeping CPU goes up that is part of the preferred affinity
of a given kthread, the related task is re-affined to that preferred
affinity if it was previously running on the default last resort
online housekeeping set.
* When a housekeeping CPU goes down while it was part of the preferred
affinity of a kthread, the running task is migrated (or the sleeping
task is woken up) automatically by the scheduler to other housekeepers
within the preferred affinity or, as a last resort, to all
housekeepers from other nodes.
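For illustration only (not part of this patch), a sketch of the
expected usage pattern of kthread_affine_preferred(), with a
hypothetical thread function and caller:

static struct task_struct *my_start_preferred_kthread(const struct cpumask *pref)
{
	struct task_struct *t = kthread_create(my_threadfn, NULL, "my_kthread");

	if (IS_ERR(t))
		return t;
	/*
	 * Must be called after creation and before the first wake-up.
	 * The mask is copied internally, so pref may be a temporary one.
	 */
	kthread_affine_preferred(t, pref);
	wake_up_process(t);
	return t;
}

The kthread core then intersects the preferred mask with the
housekeeping cpumask, falls back to all online housekeeping CPUs if
the result is empty, and re-applies the affinity on CPU hotplug.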
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/kthread.h | 1 +
kernel/kthread.c | 68 ++++++++++++++++++++++++++++++++++++-----
2 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
void free_kthread_struct(struct task_struct *k);
void kthread_bind(struct task_struct *k, unsigned int cpu);
void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
int kthread_stop(struct task_struct *k);
int kthread_stop_put(struct task_struct *k);
bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 3394ff024a5a..6bb958a75a0b 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -70,6 +70,7 @@ struct kthread {
char *full_name;
struct task_struct *task;
struct list_head hotplug_node;
+ struct cpumask *preferred_affinity;
};
enum KTHREAD_BITS {
@@ -327,6 +328,11 @@ void __noreturn kthread_exit(long result)
mutex_lock(&kthreads_hotplug_lock);
list_del(&kthread->hotplug_node);
mutex_unlock(&kthreads_hotplug_lock);
+
+ if (kthread->preferred_affinity) {
+ kfree(kthread->preferred_affinity);
+ kthread->preferred_affinity = NULL;
+ }
}
do_exit(0);
}
@@ -355,9 +361,17 @@ EXPORT_SYMBOL(kthread_complete_and_exit);
static void kthread_fetch_affinity(struct kthread *kthread, struct cpumask *cpumask)
{
- cpumask_and(cpumask, cpumask_of_node(kthread->node),
- housekeeping_cpumask(HK_TYPE_KTHREAD));
+ const struct cpumask *pref;
+ if (kthread->preferred_affinity) {
+ pref = kthread->preferred_affinity;
+ } else {
+ if (WARN_ON_ONCE(kthread->node == NUMA_NO_NODE))
+ return;
+ pref = cpumask_of_node(kthread->node);
+ }
+
+ cpumask_and(cpumask, pref, housekeeping_cpumask(HK_TYPE_KTHREAD));
if (cpumask_empty(cpumask))
cpumask_copy(cpumask, housekeeping_cpumask(HK_TYPE_KTHREAD));
}
@@ -440,7 +454,7 @@ static int kthread(void *_create)
self->started = 1;
- if (!(current->flags & PF_NO_SETAFFINITY))
+ if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity)
kthread_affine_node();
ret = -EINTR;
@@ -839,12 +853,53 @@ int kthreadd(void *unused)
return 0;
}
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+ struct kthread *kthread = to_kthread(p);
+ cpumask_var_t affinity;
+ unsigned long flags;
+ int ret = 0;
+
+ if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+ WARN_ON(1);
+ return -EINVAL;
+ }
+
+ WARN_ON_ONCE(kthread->preferred_affinity);
+
+ if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+ return -ENOMEM;
+
+ kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+ if (!kthread->preferred_affinity) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ mutex_lock(&kthreads_hotplug_lock);
+ cpumask_copy(kthread->preferred_affinity, mask);
+ WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+ list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+ kthread_fetch_affinity(kthread, affinity);
+
+ /* It's safe because the task is inactive. */
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
+ do_set_cpus_allowed(p, affinity);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+ mutex_unlock(&kthreads_hotplug_lock);
+out:
+ free_cpumask_var(affinity);
+
+ return ret;
+}
+
/*
* Re-affine kthreads according to their preferences
* and the newly online CPU. The CPU down part is handled
* by select_fallback_rq() which default re-affines to
- * housekeepers in case the preferred affinity doesn't
- * apply anymore.
+ * housekeepers from other nodes in case the preferred
+ * affinity doesn't apply anymore.
*/
static int kthreads_online_cpu(unsigned int cpu)
{
@@ -864,8 +919,7 @@ static int kthreads_online_cpu(unsigned int cpu)
list_for_each_entry(k, &kthreads_hotplug, hotplug_node) {
if (WARN_ON_ONCE((k->task->flags & PF_NO_SETAFFINITY) ||
- kthread_is_per_cpu(k->task) ||
- k->node == NUMA_NO_NODE)) {
+ kthread_is_per_cpu(k->task))) {
ret = -EINVAL;
continue;
}
--
2.46.0
* [PATCH 15/19] kthread: Implement preferred affinity
From: Frederic Weisbecker @ 2024-09-16 22:49 UTC
To: LKML
Cc: Frederic Weisbecker, Andrew Morton, Kees Cook, Peter Zijlstra,
Thomas Gleixner, Michal Hocko, Vlastimil Babka, linux-mm,
Paul E. McKenney, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
Zqiang, rcu
Affining kthreads follows one of four existing patterns:
1) Per-CPU kthreads must stay affine to a single CPU and never execute
relevant code on any other CPU. This is currently handled by smpboot
code which takes care of CPU-hotplug operations.
2) Kthreads that _have_ to be affine to a specific set of CPUs and can't
run anywhere else. The affinity is set through kthread_bind_mask()
and the subsystem handles CPU-hotplug operations by itself.
3) Kthreads that prefer to be affine to a specific NUMA node. That
preferred affinity is applied by default when an actual node ID is
passed on kthread creation, provided the kthread is not per-CPU and
no call to kthread_bind_mask() has been issued before the first
wake-up.
4) Similar to the previous point but kthreads have a preferred affinity
different from a node. The affinity is set manually, like for any
other task, and CPU-hotplug is supposed to be handled by the relevant
subsystem so that the task is properly reaffined whenever a given CPU
from the preferred affinity comes up or down. Also care must be taken
so that the preferred affinity doesn't cross housekeeping cpumask
boundaries.
Provide a function to handle the last use case, mostly reusing the
current node default affinity infrastructure. kthread_affine_preferred()
is introduced, to be used just like kthread_bind_mask(), right after
kthread creation and before the first wake-up. The kthread is then
affined right away to the cpumask passed through the API if it has
online housekeeping CPUs. Otherwise it will be affined to all online
housekeeping CPUs as a last resort.
As with node affinity, it is aware of CPU hotplug events such that:
* When a housekeeping CPU goes up and is part of the preferred affinity
of a given kthread, it is added to its applied affinity set (and
possibly the default last resort online housekeeping set is removed
from the set).
* When a housekeeping CPU goes down while it was part of the preferred
affinity of a kthread, it is removed from the kthread's applied
affinity. The last resort is to affine the kthread to all online
housekeeping CPUs.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
include/linux/kthread.h | 1 +
kernel/kthread.c | 69 ++++++++++++++++++++++++++++++++++++-----
2 files changed, 62 insertions(+), 8 deletions(-)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h
index b11f53c1ba2e..30209bdf83a2 100644
--- a/include/linux/kthread.h
+++ b/include/linux/kthread.h
@@ -85,6 +85,7 @@ kthread_run_on_cpu(int (*threadfn)(void *data), void *data,
void free_kthread_struct(struct task_struct *k);
void kthread_bind(struct task_struct *k, unsigned int cpu);
void kthread_bind_mask(struct task_struct *k, const struct cpumask *mask);
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask);
int kthread_stop(struct task_struct *k);
int kthread_stop_put(struct task_struct *k);
bool kthread_should_stop(void);
diff --git a/kernel/kthread.c b/kernel/kthread.c
index eee5925e7725..e4ffc776928a 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -71,6 +71,7 @@ struct kthread {
char *full_name;
struct task_struct *task;
struct list_head hotplug_node;
+ struct cpumask *preferred_affinity;
};
enum KTHREAD_BITS {
@@ -330,6 +331,11 @@ void __noreturn kthread_exit(long result)
/* Make sure the kthread never gets re-affined globally */
set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_KTHREAD));
mutex_unlock(&kthreads_hotplug_lock);
+
+ if (kthread->preferred_affinity) {
+ kfree(kthread->preferred_affinity);
+ kthread->preferred_affinity = NULL;
+ }
}
do_exit(0);
}
@@ -358,19 +364,25 @@ EXPORT_SYMBOL(kthread_complete_and_exit);
static void kthread_fetch_affinity(struct kthread *k, struct cpumask *mask)
{
- if (k->node == NUMA_NO_NODE) {
- cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
- } else {
+ const struct cpumask *pref;
+
+ if (k->preferred_affinity) {
+ pref = k->preferred_affinity;
+ } else if (k->node != NUMA_NO_NODE) {
/*
* The node cpumask is racy when read from kthread() but:
* - a racing CPU going down won't be present in kthread_online_mask
* - a racing CPU going up will be handled by kthreads_online_cpu()
*/
- cpumask_and(mask, cpumask_of_node(k->node), &kthread_online_mask);
- cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
- if (cpumask_empty(mask))
- cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+ pref = cpumask_of_node(k->node);
+ } else {
+ pref = housekeeping_cpumask(HK_TYPE_KTHREAD);
}
+
+ cpumask_and(mask, pref, &kthread_online_mask);
+ cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
+ if (cpumask_empty(mask))
+ cpumask_copy(mask, housekeeping_cpumask(HK_TYPE_KTHREAD));
}
static int kthread_affine_node(void)
@@ -440,7 +452,7 @@ static int kthread(void *_create)
self->started = 1;
- if (!(current->flags & PF_NO_SETAFFINITY))
+ if (!(current->flags & PF_NO_SETAFFINITY) && !self->preferred_affinity)
kthread_affine_node();
ret = -EINTR;
@@ -837,6 +849,47 @@ int kthreadd(void *unused)
return 0;
}
+int kthread_affine_preferred(struct task_struct *p, const struct cpumask *mask)
+{
+ struct kthread *kthread = to_kthread(p);
+ cpumask_var_t affinity;
+ unsigned long flags;
+ int ret = 0;
+
+ if (!wait_task_inactive(p, TASK_UNINTERRUPTIBLE) || kthread->started) {
+ WARN_ON(1);
+ return -EINVAL;
+ }
+
+ WARN_ON_ONCE(kthread->preferred_affinity);
+
+ if (!zalloc_cpumask_var(&affinity, GFP_KERNEL))
+ return -ENOMEM;
+
+ kthread->preferred_affinity = kzalloc(sizeof(struct cpumask), GFP_KERNEL);
+ if (!kthread->preferred_affinity) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ mutex_lock(&kthreads_hotplug_lock);
+ cpumask_copy(kthread->preferred_affinity, mask);
+ WARN_ON_ONCE(!list_empty(&kthread->hotplug_node));
+ list_add_tail(&kthread->hotplug_node, &kthreads_hotplug);
+ kthread_fetch_affinity(kthread, affinity);
+
+ /* It's safe because the task is inactive. */
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
+ do_set_cpus_allowed(p, affinity);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+
+ mutex_unlock(&kthreads_hotplug_lock);
+out:
+ free_cpumask_var(affinity);
+
+ return ret;
+}
+
static int kthreads_hotplug_update(void)
{
cpumask_var_t affinity;
--
2.46.0