* [PATCH 0/3] mm: replace wq users and add WQ_PERCPU to alloc_workqueue() users
@ 2025-09-05 9:03 Marco Crivellari
2025-09-05 9:03 ` [PATCH 1/3] mm: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Marco Crivellari @ 2025-09-05 9:03 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Andrew Morton
Hi!
Below is a summary of a discussion about the Workqueue API and cpu isolation
considerations. Details and more information are available here:
"workqueue: Always use wq_select_unbound_cpu() for WORK_CPU_UNBOUND."
https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
=== Current situation: problems ===
Let's consider a nohz_full system with isolated CPUs: wq_unbound_cpumask is
set to the housekeeping CPUs, while for !WQ_UNBOUND workqueues the local CPU
is selected.
This leads to different scenarios if a work item is scheduled on an isolated
CPU with a "delay" value of 0 or greater than 0:
schedule_delayed_work(, 0);
This will be handled by __queue_work(), which will queue the work item on the
current local (isolated) CPU, while:
schedule_delayed_work(, 1);
will move the timer to a housekeeping CPU and schedule the work there.
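For illustration, a minimal sketch of the two cases above (the work item and
its handler are hypothetical; only the delay argument differs):

    #include <linux/workqueue.h>

    static void my_work_fn(struct work_struct *work) { /* ... */ }
    static DECLARE_DELAYED_WORK(my_dwork, my_work_fn);

    /* delay == 0: __queue_work() queues on the current (isolated) CPU */
    schedule_delayed_work(&my_dwork, 0);

    /* delay > 0: the timer moves to a housekeeping CPU, the work runs there */
    schedule_delayed_work(&my_dwork, 1);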
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and to queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
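To make the mismatch concrete, this is roughly how the helpers relate in
include/linux/workqueue.h (simplified sketch, not the full definitions):

    /* schedule_work() hard-codes the per-CPU system_wq... */
    static inline bool schedule_work(struct work_struct *work)
    {
            return queue_work(system_wq, work);
    }

    /* ...while queue_work() itself only says "no particular CPU". */
    static inline bool queue_work(struct workqueue_struct *wq,
                                  struct work_struct *work)
    {
            return queue_work_on(WORK_CPU_UNBOUND, wq, work);
    }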
=== Plan and future steps ===
This patchset is the first step of a refactoring needed in order to address
the aforementioned points; in the long term it will also have a positive
impact on CPU isolation, moving away from per-CPU workqueues in favor of an
unbound model.
These are the main steps:
1) API refactoring (introduced by this patchset)
   - Make the system wq names clearer and more uniform, both per-CPU and
     unbound, to avoid any possible confusion about what should be used.
   - Introduce WQ_PERCPU: this flag is the complement of WQ_UNBOUND; it is
     introduced in this patchset and used by all the callers that are not
     currently using WQ_UNBOUND.
     WQ_UNBOUND will be removed in a future release cycle.
     Most users don't need to be per-CPU, because they don't have locality
     requirements; because of that, a future step will be to make "unbound"
     the default behavior.
2) Check which users really need to be per-CPU
   - Remove the WQ_PERCPU flag when it is not strictly required.
3) Add a new API (prefer local CPU)
   - There are users that don't require local execution, as mentioned above;
     despite that, local execution can yield a performance gain.
     This new API will prefer local execution, without requiring it.
=== Introduced Changes by this series ===
1) [P 1-2] Replace uses of system_wq and system_unbound_wq
   system_wq is a per-CPU workqueue, but its name does not make that clear.
   system_unbound_wq is to be used when locality is not required.
   Because of that, system_wq has been renamed to system_percpu_wq, and
   system_unbound_wq has been renamed to system_dfl_wq.
2) [P 3] Add WQ_PERCPU to remaining alloc_workqueue() users
   Every alloc_workqueue() caller should use exactly one of WQ_PERCPU or
   WQ_UNBOUND. This is enforced by a warning if both or neither of them are
   present at the same time.
   WQ_UNBOUND will be removed in a future release cycle.
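As a hypothetical example, this is what callers look like once exactly one of
the two flags is required (workqueue names are made up):

    /* Per-CPU behavior explicitly requested: locality really matters here. */
    wq = alloc_workqueue("my_percpu_wq", WQ_MEM_RECLAIM | WQ_PERCPU, 0);

    /* No locality requirement: unbound (WQ_UNBOUND, until it is removed). */
    wq = alloc_workqueue("my_unbound_wq", WQ_UNBOUND, 0);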
=== For Maintainers ===
There are prerequisites for this series, already merged in the master branch.
The commits are:
128ea9f6ccfb6960293ae4212f4f97165e42222d ("workqueue: Add system_percpu_wq and
system_dfl_wq")
930c2ea566aff59e962c50b2421d5fcc3b98b8be ("workqueue: Add new WQ_PERCPU flag")
Thanks!
Marco Crivellari (3):
mm: replace use of system_unbound_wq with system_dfl_wq
mm: replace use of system_wq with system_percpu_wq
mm: WQ_PERCPU added to alloc_workqueue users
mm/backing-dev.c | 6 +++---
mm/kfence/core.c | 6 +++---
mm/memcontrol.c | 4 ++--
mm/slub.c | 3 ++-
mm/vmstat.c | 3 ++-
5 files changed, 12 insertions(+), 10 deletions(-)
--
2.51.0
* [PATCH 1/3] mm: replace use of system_unbound_wq with system_dfl_wq
2025-09-05 9:03 [PATCH 0/3] mm: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
@ 2025-09-05 9:03 ` Marco Crivellari
2025-09-05 9:03 ` [PATCH 2/3] mm: replace use of system_wq with system_percpu_wq Marco Crivellari
2025-09-05 9:03 ` [PATCH 3/3] mm: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
2 siblings, 0 replies; 4+ messages in thread
From: Marco Crivellari @ 2025-09-05 9:03 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Andrew Morton
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and to queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_unbound_wq should be the default workqueue, so as not to enforce
locality constraints on random work whenever that is not required.
system_dfl_wq is added to encourage its use whenever unbound work should be
used.
queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new unbound wq: if the user still uses the old wq, a warning will be printed
along with a redirect to the new wq.
The old system_unbound_wq will be kept for a few release cycles.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
mm/backing-dev.c | 2 +-
mm/kfence/core.c | 6 +++---
mm/memcontrol.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 783904d8c5ef..e9f9fdcfe052 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -934,7 +934,7 @@ void wb_memcg_offline(struct mem_cgroup *memcg)
memcg_cgwb_list->next = NULL; /* prevent new wb's */
spin_unlock_irq(&cgwb_lock);
- queue_work(system_unbound_wq, &cleanup_offline_cgwbs_work);
+ queue_work(system_dfl_wq, &cleanup_offline_cgwbs_work);
}
/**
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 102048821c22..f26d87d59296 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -854,7 +854,7 @@ static void toggle_allocation_gate(struct work_struct *work)
/* Disable static key and reset timer. */
static_branch_disable(&kfence_allocation_key);
#endif
- queue_delayed_work(system_unbound_wq, &kfence_timer,
+ queue_delayed_work(system_dfl_wq, &kfence_timer,
msecs_to_jiffies(kfence_sample_interval));
}
@@ -900,7 +900,7 @@ static void kfence_init_enable(void)
atomic_notifier_chain_register(&panic_notifier_list, &kfence_check_canary_notifier);
WRITE_ONCE(kfence_enabled, true);
- queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+ queue_delayed_work(system_dfl_wq, &kfence_timer, 0);
pr_info("initialized - using %lu bytes for %d objects at 0x%p-0x%p\n", KFENCE_POOL_SIZE,
CONFIG_KFENCE_NUM_OBJECTS, (void *)__kfence_pool,
@@ -996,7 +996,7 @@ static int kfence_enable_late(void)
return kfence_init_late();
WRITE_ONCE(kfence_enabled, true);
- queue_delayed_work(system_unbound_wq, &kfence_timer, 0);
+ queue_delayed_work(system_dfl_wq, &kfence_timer, 0);
pr_info("re-enabled\n");
return 0;
}
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 421740f1bcdc..c2944bc83378 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -651,7 +651,7 @@ static void flush_memcg_stats_dwork(struct work_struct *w)
* in latency-sensitive paths is as cheap as possible.
*/
__mem_cgroup_flush_stats(root_mem_cgroup, true);
- queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME);
+ queue_delayed_work(system_dfl_wq, &stats_flush_dwork, FLUSH_TIME);
}
unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
@@ -3732,7 +3732,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
goto offline_kmem;
if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
- queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
+ queue_delayed_work(system_dfl_wq, &stats_flush_dwork,
FLUSH_TIME);
lru_gen_online_memcg(memcg);
--
2.51.0
* [PATCH 2/3] mm: replace use of system_wq with system_percpu_wq
2025-09-05 9:03 [PATCH 0/3] mm: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
2025-09-05 9:03 ` [PATCH 1/3] mm: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
@ 2025-09-05 9:03 ` Marco Crivellari
2025-09-05 9:03 ` [PATCH 3/3] mm: WQ_PERCPU added to alloc_workqueue users Marco Crivellari
2 siblings, 0 replies; 4+ messages in thread
From: Marco Crivellari @ 2025-09-05 9:03 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Andrew Morton
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and to queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
system_wq is a per-CPU workqueue, yet nothing in its name says anything about
that CPU affinity constraint, which is very often not required by users. Make
this clear by using system_percpu_wq across the mm subsystem.
The old wq will be kept for a few release cycles.
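For a caller that genuinely needs per-CPU execution, the conversion is simply
(hypothetical work item):

    queue_work(system_percpu_wq, &my_work);   /* was: system_wq */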
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
mm/backing-dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e9f9fdcfe052..7e672424f928 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -966,7 +966,7 @@ static int __init cgwb_init(void)
{
/*
* There can be many concurrent release work items overwhelming
- * system_wq. Put them in a separate wq and limit concurrency.
+ * system_percpu_wq. Put them in a separate wq and limit concurrency.
* There's no point in executing many of these in parallel.
*/
cgwb_release_wq = alloc_workqueue("cgwb_release", 0, 1);
--
2.51.0
* [PATCH 3/3] mm: WQ_PERCPU added to alloc_workqueue users
2025-09-05 9:03 [PATCH 0/3] mm: replace wq users and add WQ_PERCPU to alloc_workqueue() users Marco Crivellari
2025-09-05 9:03 ` [PATCH 1/3] mm: replace use of system_unbound_wq with system_dfl_wq Marco Crivellari
2025-09-05 9:03 ` [PATCH 2/3] mm: replace use of system_wq with system_percpu_wq Marco Crivellari
@ 2025-09-05 9:03 ` Marco Crivellari
2 siblings, 0 replies; 4+ messages in thread
From: Marco Crivellari @ 2025-09-05 9:03 UTC (permalink / raw)
To: linux-kernel, linux-mm
Cc: Tejun Heo, Lai Jiangshan, Frederic Weisbecker,
Sebastian Andrzej Siewior, Marco Crivellari, Michal Hocko,
Andrew Morton
Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and to queue_work(), which again makes
use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This patch adds the new WQ_PERCPU flag to all mm subsystem users to
explicitly request per-CPU behavior. Both flags coexist for one release
cycle to allow callers to transition their calls.
Once the migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
All existing users have been updated accordingly.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
---
mm/backing-dev.c | 2 +-
mm/slub.c | 3 ++-
mm/vmstat.c | 3 ++-
3 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7e672424f928..3b392de6367e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -969,7 +969,7 @@ static int __init cgwb_init(void)
* system_percpu_wq. Put them in a separate wq and limit concurrency.
* There's no point in executing many of these in parallel.
*/
- cgwb_release_wq = alloc_workqueue("cgwb_release", 0, 1);
+ cgwb_release_wq = alloc_workqueue("cgwb_release", WQ_PERCPU, 1);
if (!cgwb_release_wq)
return -ENOMEM;
diff --git a/mm/slub.c b/mm/slub.c
index b46f87662e71..cac9d5d7c924 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6364,7 +6364,8 @@ void __init kmem_cache_init(void)
void __init kmem_cache_init_late(void)
{
#ifndef CONFIG_SLUB_TINY
- flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
+ flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM | WQ_PERCPU,
+ 0);
WARN_ON(!flushwq);
#endif
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4c268ce39ff2..57bf76b1d9d4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2244,7 +2244,8 @@ void __init init_mm_internals(void)
{
int ret __maybe_unused;
- mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
+ mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
+ WQ_MEM_RECLAIM | WQ_PERCPU, 0);
#ifdef CONFIG_SMP
ret = cpuhp_setup_state_nocalls(CPUHP_MM_VMSTAT_DEAD, "mm/vmstat:dead",
--
2.51.0