* [PATCH v2 00/12] Dynamic Housekeeping Management (DHM) via CPUSets
@ 2026-04-13 7:43 Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 02/12] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: Qiliang Yuan @ 2026-04-13 7:43 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Tejun Heo,
Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest,
Qiliang Yuan
This series introduces Dynamic Housekeeping Management (DHM) to the
Linux kernel.
Previously known as the DHEI (Dynamic Housekeeping Environment Interface)
patchset (RFC and v1), this series has been fundamentally refactored in
response to upstream feedback. The custom sysfs interface has been entirely
dropped. Instead, DHM is now natively integrated into the cgroup v2
cpuset controller.
By exposing `cpuset.housekeeping.cpus` on the root cgroup, system
administrators and workload orchestrators (like Kubernetes) can
dynamically update the kernel's global housekeeping masks at runtime,
without requiring a node reboot.
This version provides dynamic reconfiguration support for the following
subsystems:
- RCU (NOCB offloading)
- Tick/NOHZ (Full dynticks)
- Global Workqueues and Timers
- Managed Interrupts (genirq)
- Hardlockup Detectors (Watchdog)
- Scheduler Domains (Isolation)
- Memory Management (vmstat/lru_add_drain)
- Kthreads and Softirqs (Affinity)
Many thanks to the maintainers for the valuable guidance that led to this
significantly improved and upstream-aligned architecture.
To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Juri Lelli <juri.lelli@redhat.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Steven Rostedt <rostedt@goodmis.org>
To: Ben Segall <bsegall@google.com>
To: Mel Gorman <mgorman@suse.de>
To: Valentin Schneider <vschneid@redhat.com>
To: Paul E. McKenney <paulmck@kernel.org>
To: Frederic Weisbecker <frederic@kernel.org>
To: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
To: Joel Fernandes <joelagnelf@nvidia.com>
To: Josh Triplett <josh@joshtriplett.org>
To: Boqun Feng <boqun@kernel.org>
To: Uladzislau Rezki <urezki@gmail.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Lai Jiangshan <jiangshanlai@gmail.com>
To: Zqiang <qiang.zhang@linux.dev>
To: Anna-Maria Behnsen <anna-maria@linutronix.de>
To: Ingo Molnar <mingo@kernel.org>
To: Thomas Gleixner <tglx@kernel.org>
To: Tejun Heo <tj@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
To: Vlastimil Babka <vbabka@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
To: Michal Hocko <mhocko@suse.com>
To: Brendan Jackman <jackmanb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
To: Zi Yan <ziy@nvidia.com>
To: Waiman Long <longman@redhat.com>
To: Chen Ridong <chenridong@huaweicloud.com>
To: Michal Koutný <mkoutny@suse.com>
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: rcu@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: cgroups@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Changes in v2:
- Rebranded series from DHEI to DHM (Dynamic Housekeeping Management).
- Entirely dropped custom sysfs interface.
- Integrated housekeeping control into cgroup v2 cpuset controller
at the root level.
- Added SMT-aware pipeline logic (cpuset.housekeeping.smt_aware) to
prevent splitting SMT siblings.
- Added comprehensive documentation and cgroup functional selftests for
the DHM APIs.
- Refactored the internal mask transition logic to use RCU-safe
handover.
- Separated patch series into 4 logical phases for review.
v1 Link: https://lore.kernel.org/all/20260325-dhei-v12-final-v1-0-919cca23cadf@gmail.com
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
Qiliang Yuan (12):
sched/isolation: Separate housekeeping types in enum hk_type
sched/isolation: Introduce housekeeping notifier infrastructure
rcu: Support runtime NOCB initialization and dynamic offloading
tick/nohz: Transition to dynamic full dynticks state management
genirq: Support dynamic migration for managed interrupts
watchdog: Allow runtime toggle of lockup detector affinity
sched/core: Dynamically update scheduler domain housekeeping mask
workqueue, mm: Support dynamic housekeeping mask updates
cgroup/cpuset: Introduce CPUSet-driven dynamic housekeeping (DHM)
cgroup/cpuset: Implement SMT-aware grouping and safety guards
Documentation: cgroup-v2: Document dynamic housekeeping (DHM)
selftests: cgroup: Add functional tests for dynamic housekeeping
Documentation/admin-guide/cgroup-v2.rst | 24 +++++
include/linux/sched/isolation.h | 51 ++++++++---
kernel/cgroup/cpuset-internal.h | 2 +
kernel/cgroup/cpuset.c | 73 +++++++++++++++
kernel/irq/manage.c | 49 ++++++++++
kernel/rcu/rcu.h | 4 +
kernel/rcu/tree.c | 75 ++++++++++++++++
kernel/rcu/tree.h | 2 +-
kernel/rcu/tree_nocb.h | 31 ++++---
kernel/sched/core.c | 23 +++++
kernel/sched/isolation.c | 74 ++++++++++++++-
kernel/time/tick-sched.c | 130 +++++++++++++++++++++------
kernel/watchdog.c | 26 ++++++
kernel/workqueue.c | 42 +++++++++
mm/compaction.c | 27 ++++++
tools/testing/selftests/cgroup/test_cpuset.c | 36 ++++++++
16 files changed, 620 insertions(+), 49 deletions(-)
---
base-commit: bfe62a454542cfad3379f6ef5680b125f41e20f4
change-id: 20260408-wujing-dhm-8f43e2d49cd8
Best regards,
--
Qiliang Yuan <realwujing@gmail.com>
* [PATCH v2 02/12] sched/isolation: Introduce housekeeping notifier infrastructure
2026-04-13 7:43 [PATCH v2 00/12] Dynamic Housekeeping Management (DHM) via CPUSets Qiliang Yuan
@ 2026-04-13 7:43 ` Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 03/12] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Qiliang Yuan @ 2026-04-13 7:43 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Tejun Heo,
Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest,
Qiliang Yuan
Subsystems currently rely on static housekeeping masks determined at
boot. Supporting runtime reconfiguration (DHM v2) requires a mechanism
to broadcast mask changes to affected kernel components.
Implement a blocking notifier chain for housekeeping mask updates. This
infrastructure enables subsystems like genirq, workqueues, and RCU to
react dynamically to isolation changes triggered by cpusets.
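To illustrate the mechanism, here is a minimal userspace analogue of the notifier-chain pattern this patch introduces. It is a sketch only; the kernel itself uses struct notifier_block and blocking_notifier_call_chain() from <linux/notifier.h>, and all names below (hk_register_notifier, hk_update_notify, the demo subscriber) are hypothetical stand-ins:

```c
/*
 * Userspace sketch of the housekeeping notifier chain (illustrative;
 * not the kernel implementation). A bitmap stands in for struct cpumask.
 */
#define HK_UPDATE_MASK 0x01
#define NOTIFY_OK 0

struct hk_update {
	int type;               /* stands in for enum hk_type */
	unsigned long new_mask; /* stands in for the new cpumask */
};

typedef int (*hk_notifier_fn)(unsigned long action, void *data);

static hk_notifier_fn hk_chain[8];
static int hk_chain_len;

/* Analogue of housekeeping_register_notifier(): append a subscriber. */
static int hk_register_notifier(hk_notifier_fn fn)
{
	if (hk_chain_len >= 8)
		return -1;
	hk_chain[hk_chain_len++] = fn;
	return 0;
}

/* Analogue of housekeeping_update_notify(): broadcast a mask change. */
static int hk_update_notify(int type, unsigned long new_mask)
{
	struct hk_update upd = { .type = type, .new_mask = new_mask };

	for (int i = 0; i < hk_chain_len; i++)
		hk_chain[i](HK_UPDATE_MASK, &upd);
	return NOTIFY_OK;
}

/* Example subscriber, playing the role of genirq/workqueue/RCU hooks. */
static unsigned long hk_seen_mask;
static int hk_demo_subscriber(unsigned long action, void *data)
{
	struct hk_update *upd = data;

	if (action == HK_UPDATE_MASK)
		hk_seen_mask = upd->new_mask;
	return NOTIFY_OK;
}
```

Subsystems register once at init time and are then called synchronously, in registration order, whenever cpuset pushes a new housekeeping mask.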
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
include/linux/sched/isolation.h | 21 +++++++++++++++++++++
kernel/sched/isolation.c | 26 ++++++++++++++++++++++++++
2 files changed, 47 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index b9a041247565c..aea1dbc4d7486 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -4,6 +4,7 @@
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/tick.h>
+#include <linux/notifier.h>
enum hk_type {
/* Inverse of boot-time isolcpus= argument */
@@ -28,6 +29,13 @@ enum hk_type {
#define HK_TYPE_KERNEL_NOISE HK_TYPE_TICK
+struct housekeeping_update {
+ enum hk_type type;
+ const struct cpumask *new_mask;
+};
+
+#define HK_UPDATE_MASK 0x01
+
#ifdef CONFIG_CPU_ISOLATION
DECLARE_STATIC_KEY_FALSE(housekeeping_overridden);
extern int housekeeping_any_cpu(enum hk_type type);
@@ -38,6 +46,9 @@ extern bool housekeeping_test_cpu(int cpu, enum hk_type type);
extern int housekeeping_update(struct cpumask *isol_mask);
extern void __init housekeeping_init(void);
+extern int housekeeping_register_notifier(struct notifier_block *nb);
+extern int housekeeping_unregister_notifier(struct notifier_block *nb);
+
#else
static inline int housekeeping_any_cpu(enum hk_type type)
@@ -65,6 +76,16 @@ static inline bool housekeeping_test_cpu(int cpu, enum hk_type type)
static inline int housekeeping_update(struct cpumask *isol_mask) { return 0; }
static inline void housekeeping_init(void) { }
+
+static inline int housekeeping_register_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
+
+static inline int housekeeping_unregister_notifier(struct notifier_block *nb)
+{
+ return 0;
+}
#endif /* CONFIG_CPU_ISOLATION */
static inline bool housekeeping_cpu(int cpu, enum hk_type type)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index e05ed5118e651..0462b41807161 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -10,6 +10,7 @@
#include <linux/sched/isolation.h>
#include <linux/pci.h>
#include "sched.h"
+#include <linux/notifier.h>
enum hk_flags {
HK_FLAG_DOMAIN_BOOT = BIT(HK_TYPE_DOMAIN_BOOT),
@@ -26,6 +27,8 @@ enum hk_flags {
#define HK_FLAG_KERNEL_NOISE (HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | \
HK_FLAG_MISC | HK_FLAG_WQ | HK_FLAG_KTHREAD)
+static BLOCKING_NOTIFIER_HEAD(housekeeping_notifier_list);
+
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -170,6 +173,29 @@ int housekeeping_update(struct cpumask *isol_mask)
return 0;
}
+int housekeeping_register_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&housekeeping_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(housekeeping_register_notifier);
+
+int housekeeping_unregister_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&housekeeping_notifier_list, nb);
+}
+EXPORT_SYMBOL_GPL(housekeeping_unregister_notifier);
+
+int housekeeping_update_notify(enum hk_type type, const struct cpumask *new_mask)
+{
+ struct housekeeping_update update = {
+ .type = type,
+ .new_mask = new_mask,
+ };
+
+ return blocking_notifier_call_chain(&housekeeping_notifier_list, HK_UPDATE_MASK, &update);
+}
+EXPORT_SYMBOL_GPL(housekeeping_update_notify);
+
void __init housekeeping_init(void)
{
enum hk_type type;
--
2.43.0
* [PATCH v2 03/12] rcu: Support runtime NOCB initialization and dynamic offloading
2026-04-13 7:43 [PATCH v2 00/12] Dynamic Housekeeping Management (DHM) via CPUSets Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 02/12] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
@ 2026-04-13 7:43 ` Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 04/12] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Qiliang Yuan @ 2026-04-13 7:43 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Tejun Heo,
Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest,
Qiliang Yuan
Context:
The RCU no-callbacks (NOCB) infrastructure traditionally requires
boot-time parameters (e.g., rcu_nocbs) to allocate masks and spawn
management kthreads (rcuog/rcuo). This prevents systems from activating
offloading on-demand without a reboot.
Problem:
Dynamic Housekeeping Management requires CPUs to transition to
NOCB mode at runtime when they are newly isolated. Without boot-time
setup, the NOCB masks are unallocated, and critical kthreads are missing,
preventing effective tick suppression and isolation.
Solution:
Refactor RCU initialization to support dynamic on-demand setup.
- Introduce rcu_init_nocb_dynamic() to allocate masks and organize
kthreads if the system wasn't initially configured for NOCB.
- Introduce rcu_housekeeping_reconfigure() to iterate over CPUs and
perform safe offload/deoffload transitions via hotplug sequences
(cpu_down -> offload -> cpu_up) when a housekeeping cpuset triggers
a notifier event.
- Remove __init from rcu_organize_nocb_kthreads to allow runtime
reconfiguration of the callback management hierarchy.
This enables true zero-configuration isolation: any CPU can be fully
isolated at runtime, regardless of boot parameters.
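The per-CPU decision described above can be sketched as a small decision table (illustrative userspace C, not the kernel code; a bitmap stands in for the housekeeping cpumask, and the names rcu_decide/RCU_TO_NOCB are hypothetical):

```c
/*
 * Sketch of the per-CPU transition rule applied by the reconfigure
 * notifier: CPUs leaving the housekeeping mask get offloaded to NOCB
 * mode, CPUs rejoining it get deoffloaded, everything else is left
 * alone.
 */
enum rcu_transition { RCU_KEEP, RCU_TO_NOCB, RCU_TO_CB };

static enum rcu_transition rcu_decide(int cpu, unsigned long hk_mask,
				      int offloaded)
{
	int isolated = !((hk_mask >> cpu) & 1);

	if (isolated && !offloaded)
		return RCU_TO_NOCB; /* newly isolated: offload callbacks */
	if (!isolated && offloaded)
		return RCU_TO_CB;   /* back to housekeeping: deoffload */
	return RCU_KEEP;
}
```

In the patch itself each transition is additionally wrapped in a hotplug sequence for online CPUs (remove_cpu, offload/deoffload, add_cpu) to keep the callback lists quiescent.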
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/rcu/rcu.h | 4 +++
kernel/rcu/tree.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/rcu/tree.h | 2 +-
kernel/rcu/tree_nocb.h | 31 +++++++++++++--------
4 files changed, 100 insertions(+), 12 deletions(-)
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 9b10b57b79ada..282874443c96b 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -663,8 +663,12 @@ unsigned long srcu_batches_completed(struct srcu_struct *sp);
#endif // #else // #ifdef CONFIG_TINY_SRCU
#ifdef CONFIG_RCU_NOCB_CPU
+void rcu_init_nocb_dynamic(void);
+void rcu_spawn_cpu_nocb_kthread(int cpu);
void rcu_bind_current_to_nocb(void);
#else
+static inline void rcu_init_nocb_dynamic(void) { }
+static inline void rcu_spawn_cpu_nocb_kthread(int cpu) { }
static inline void rcu_bind_current_to_nocb(void) { }
#endif
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 55df6d37145e8..84c8388cf89a1 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4928,4 +4928,79 @@ void __init rcu_init(void)
#include "tree_stall.h"
#include "tree_exp.h"
#include "tree_nocb.h"
+
+#ifdef CONFIG_SMP
+static int rcu_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ struct task_struct *t;
+ int cpu;
+
+ if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_RCU)
+ return NOTIFY_OK;
+
+ rcu_init_nocb_dynamic();
+
+ for_each_possible_cpu(cpu) {
+ struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+ bool isolated = !cpumask_test_cpu(cpu, upd->new_mask);
+ bool offloaded = rcu_rdp_is_offloaded(rdp);
+
+ if (isolated && !offloaded) {
+ /* Transition to NOCB */
+ pr_info("rcu: CPU %d transitioning to NOCB mode\n", cpu);
+ if (cpu_online(cpu)) {
+ remove_cpu(cpu);
+ rcu_spawn_cpu_nocb_kthread(cpu);
+ rcu_nocb_cpu_offload(cpu);
+ add_cpu(cpu);
+ } else {
+ rcu_spawn_cpu_nocb_kthread(cpu);
+ rcu_nocb_cpu_offload(cpu);
+ }
+ } else if (!isolated && offloaded) {
+ /* Transition to CB */
+ pr_info("rcu: CPU %d transitioning to CB mode\n", cpu);
+ if (cpu_online(cpu)) {
+ remove_cpu(cpu);
+ rcu_nocb_cpu_deoffload(cpu);
+ add_cpu(cpu);
+ } else {
+ rcu_nocb_cpu_deoffload(cpu);
+ }
+ }
+ }
+
+ t = READ_ONCE(rcu_state.gp_kthread);
+ if (t)
+ housekeeping_affine(t, HK_TYPE_RCU);
+
+#ifdef CONFIG_TASKS_RCU
+ t = get_rcu_tasks_gp_kthread();
+ if (t)
+ housekeeping_affine(t, HK_TYPE_RCU);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+ t = get_rcu_tasks_rude_gp_kthread();
+ if (t)
+ housekeeping_affine(t, HK_TYPE_RCU);
+#endif
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block rcu_housekeeping_nb = {
+ .notifier_call = rcu_housekeeping_reconfigure,
+};
+
+static int __init rcu_init_housekeeping_notifier(void)
+{
+ housekeeping_register_notifier(&rcu_housekeeping_nb);
+ return 0;
+}
+late_initcall(rcu_init_housekeeping_notifier);
+#endif
+
#include "tree_plugin.h"
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7dfc57e9adb18..f3d31918ea322 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -517,7 +517,7 @@ static void rcu_nocb_unlock_irqrestore(struct rcu_data *rdp,
unsigned long flags);
static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
#ifdef CONFIG_RCU_NOCB_CPU
-static void __init rcu_organize_nocb_kthreads(void);
+static void rcu_organize_nocb_kthreads(void);
/*
* Disable IRQs before checking offloaded state so that local
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index b3337c7231ccb..36f6c9be937aa 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1259,6 +1259,22 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
}
#endif // #ifdef CONFIG_RCU_LAZY
+void rcu_init_nocb_dynamic(void)
+{
+ if (rcu_state.nocb_is_setup)
+ return;
+
+ if (!cpumask_available(rcu_nocb_mask)) {
+ if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
+ pr_info("rcu_nocb_mask allocation failed, dynamic offloading disabled.\n");
+ return;
+ }
+ }
+
+ rcu_state.nocb_is_setup = true;
+ rcu_organize_nocb_kthreads();
+}
+
void __init rcu_init_nohz(void)
{
int cpu;
@@ -1276,15 +1292,8 @@ void __init rcu_init_nohz(void)
cpumask = cpu_possible_mask;
if (cpumask) {
- if (!cpumask_available(rcu_nocb_mask)) {
- if (!zalloc_cpumask_var(&rcu_nocb_mask, GFP_KERNEL)) {
- pr_info("rcu_nocb_mask allocation failed, callback offloading disabled.\n");
- return;
- }
- }
-
+ rcu_init_nocb_dynamic();
cpumask_or(rcu_nocb_mask, rcu_nocb_mask, cpumask);
- rcu_state.nocb_is_setup = true;
}
if (!rcu_state.nocb_is_setup)
@@ -1344,7 +1353,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
* rcuo CB kthread, spawn it. Additionally, if the rcuo GP kthread
* for this CPU's group has not yet been created, spawn it as well.
*/
-static void rcu_spawn_cpu_nocb_kthread(int cpu)
+void rcu_spawn_cpu_nocb_kthread(int cpu)
{
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
struct rcu_data *rdp_gp;
@@ -1416,7 +1425,7 @@ module_param(rcu_nocb_gp_stride, int, 0444);
/*
* Initialize GP-CB relationships for all no-CBs CPU.
*/
-static void __init rcu_organize_nocb_kthreads(void)
+static void rcu_organize_nocb_kthreads(void)
{
int cpu;
bool firsttime = true;
@@ -1668,7 +1677,7 @@ static bool do_nocb_deferred_wakeup(struct rcu_data *rdp)
return false;
}
-static void rcu_spawn_cpu_nocb_kthread(int cpu)
+void rcu_spawn_cpu_nocb_kthread(int cpu)
{
}
--
2.43.0
* [PATCH v2 04/12] tick/nohz: Transition to dynamic full dynticks state management
2026-04-13 7:43 [PATCH v2 00/12] Dynamic Housekeeping Management (DHM) via CPUSets Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 02/12] sched/isolation: Introduce housekeeping notifier infrastructure Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 03/12] rcu: Support runtime NOCB initialization and dynamic offloading Qiliang Yuan
@ 2026-04-13 7:43 ` Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 05/12] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Qiliang Yuan @ 2026-04-13 7:43 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Tejun Heo,
Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest,
Qiliang Yuan
Context:
Full dynticks (NOHZ_FULL) is typically a static configuration determined
at boot time. DHEI extends this to support runtime activation.
Problem:
Switching to NOHZ_FULL at runtime requires careful synchronization
of context tracking and housekeeping states. Re-invoking the setup logic
multiple times could lead to inconsistencies or warnings, and RCU
dependency checks can prevent tick suppression in zero-configuration setups.
Solution:
- Replace the static tick_nohz_full_enabled() checks with a dynamic
tick_nohz_full_running state variable.
- Refactor tick_nohz_full_setup to be safe for runtime invocation,
adding guards against re-initialization and ensuring IRQ work
interrupt support.
- Implement boot-time pre-activation of context tracking (shadow
init) for all possible CPUs to avoid context-tracking inconsistencies
during dynamic transitions.
- Hook into housekeeping_notifier_list to update NO_HZ states dynamically.
This provides the core state machine for reliable, on-demand tick
suppression and high-performance isolation.
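One subtle piece of the state machine is the timer-duty handover: when nohz_full is active, tick_do_timer_cpu must always point at a housekeeping CPU. The rule the notifier applies can be sketched as follows (illustrative userspace C with bitmaps standing in for cpumasks; pick_timer_cpu is a hypothetical name):

```c
/*
 * Sketch of the timer-duty handover rule: keep the current holder if it
 * is still a housekeeper, otherwise hand duty to the first housekeeping
 * CPU; report none if the housekeeping mask is empty.
 */
#define TICK_DO_TIMER_NONE (-1)

static int pick_timer_cpu(int cur_timer_cpu, unsigned long hk_mask,
			  int nr_cpus)
{
	/* Current holder is still a housekeeper: keep it. */
	if (cur_timer_cpu != TICK_DO_TIMER_NONE &&
	    ((hk_mask >> cur_timer_cpu) & 1))
		return cur_timer_cpu;

	/* Otherwise pick the first housekeeping CPU. */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		if ((hk_mask >> cpu) & 1)
			return cpu;

	return TICK_DO_TIMER_NONE; /* no housekeeper available */
}
```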
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/time/tick-sched.c | 130 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 105 insertions(+), 25 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f7907fadd63f2..23d69d7d44538 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -27,6 +27,7 @@
#include <linux/posix-timers.h>
#include <linux/context_tracking.h>
#include <linux/mm.h>
+#include <linux/sched/isolation.h>
#include <asm/irq_regs.h>
@@ -624,13 +625,25 @@ void __tick_nohz_task_switch(void)
/* Get the boot-time nohz CPU list from the kernel parameters. */
void __init tick_nohz_full_setup(cpumask_var_t cpumask)
{
- alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+ if (!tick_nohz_full_mask) {
+ if (!slab_is_available())
+ alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+ else
+ zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
+ }
cpumask_copy(tick_nohz_full_mask, cpumask);
tick_nohz_full_running = true;
}
bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
{
+ /*
+ * Allow all CPUs to go down during shutdown/reboot to avoid
+ * interfering with the final power-off sequence.
+ */
+ if (system_state > SYSTEM_RUNNING)
+ return true;
+
/*
* The 'tick_do_timer_cpu' CPU handles housekeeping duty (unbound
* timers, workqueues, timekeeping, ...) on behalf of full dynticks
@@ -646,45 +659,112 @@ static int tick_nohz_cpu_down(unsigned int cpu)
return tick_nohz_cpu_hotpluggable(cpu) ? 0 : -EBUSY;
}
+static int tick_nohz_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ int cpu;
+
+ if (action == HK_UPDATE_MASK && upd->type == HK_TYPE_TICK) {
+ cpumask_var_t non_housekeeping_mask;
+
+ if (!alloc_cpumask_var(&non_housekeeping_mask, GFP_KERNEL))
+ return NOTIFY_BAD;
+
+ cpumask_andnot(non_housekeeping_mask, cpu_possible_mask, upd->new_mask);
+
+ if (!tick_nohz_full_mask) {
+ if (!zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL)) {
+ free_cpumask_var(non_housekeeping_mask);
+ return NOTIFY_BAD;
+ }
+ }
+
+ /* Kick all CPUs to re-evaluate tick dependency before change */
+ for_each_online_cpu(cpu)
+ tick_nohz_full_kick_cpu(cpu);
+
+ cpumask_copy(tick_nohz_full_mask, non_housekeeping_mask);
+ tick_nohz_full_running = !cpumask_empty(tick_nohz_full_mask);
+
+ /*
+ * If nohz_full is running, the timer duty must be on a housekeeper.
+ * If the current timer CPU is not a housekeeper, or no duty is assigned,
+ * pick the first housekeeper and assign it.
+ */
+ if (tick_nohz_full_running) {
+ int timer_cpu = READ_ONCE(tick_do_timer_cpu);
+ if (timer_cpu == TICK_DO_TIMER_NONE ||
+ !cpumask_test_cpu(timer_cpu, upd->new_mask)) {
+ int next_timer = cpumask_first(upd->new_mask);
+ if (next_timer < nr_cpu_ids)
+ WRITE_ONCE(tick_do_timer_cpu, next_timer);
+ }
+ }
+
+ /* Kick all CPUs again to apply new nohz full state */
+ for_each_online_cpu(cpu)
+ tick_nohz_full_kick_cpu(cpu);
+
+ free_cpumask_var(non_housekeeping_mask);
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block tick_nohz_housekeeping_nb = {
+ .notifier_call = tick_nohz_housekeeping_reconfigure,
+};
+
void __init tick_nohz_init(void)
{
int cpu, ret;
- if (!tick_nohz_full_running)
- return;
-
- /*
- * Full dynticks uses IRQ work to drive the tick rescheduling on safe
- * locking contexts. But then we need IRQ work to raise its own
- * interrupts to avoid circular dependency on the tick.
- */
- if (!arch_irq_work_has_interrupt()) {
- pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
- cpumask_clear(tick_nohz_full_mask);
- tick_nohz_full_running = false;
- return;
+ if (!tick_nohz_full_mask) {
+ if (!slab_is_available())
+ alloc_bootmem_cpumask_var(&tick_nohz_full_mask);
+ else
+ zalloc_cpumask_var(&tick_nohz_full_mask, GFP_KERNEL);
}
- if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
- !IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
- cpu = smp_processor_id();
+ housekeeping_register_notifier(&tick_nohz_housekeeping_nb);
- if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
- pr_warn("NO_HZ: Clearing %d from nohz_full range "
- "for timekeeping\n", cpu);
- cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+ if (tick_nohz_full_running) {
+ /*
+ * Full dynticks uses IRQ work to drive the tick rescheduling on safe
+ * locking contexts. But then we need IRQ work to raise its own
+ * interrupts to avoid circular dependency on the tick.
+ */
+ if (!arch_irq_work_has_interrupt()) {
+ pr_warn("NO_HZ: Can't run full dynticks because arch doesn't support IRQ work self-IPIs\n");
+ cpumask_clear(tick_nohz_full_mask);
+ tick_nohz_full_running = false;
+ goto out;
}
+
+ if (IS_ENABLED(CONFIG_PM_SLEEP_SMP) &&
+ !IS_ENABLED(CONFIG_PM_SLEEP_SMP_NONZERO_CPU)) {
+ cpu = smp_processor_id();
+
+ if (cpumask_test_cpu(cpu, tick_nohz_full_mask)) {
+ pr_warn("NO_HZ: Clearing %d from nohz_full range "
+ "for timekeeping\n", cpu);
+ cpumask_clear_cpu(cpu, tick_nohz_full_mask);
+ }
+ }
+
+ pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
+ cpumask_pr_args(tick_nohz_full_mask));
}
- for_each_cpu(cpu, tick_nohz_full_mask)
+out:
+ for_each_possible_cpu(cpu)
ct_cpu_track_user(cpu);
ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
"kernel/nohz:predown", NULL,
tick_nohz_cpu_down);
WARN_ON(ret < 0);
- pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
- cpumask_pr_args(tick_nohz_full_mask));
}
#endif /* #ifdef CONFIG_NO_HZ_FULL */
@@ -1209,7 +1289,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
if (unlikely(report_idle_softirq()))
return false;
- if (tick_nohz_full_enabled()) {
+ if (tick_nohz_full_running) {
int tick_cpu = READ_ONCE(tick_do_timer_cpu);
/*
--
2.43.0
* [PATCH v2 05/12] genirq: Support dynamic migration for managed interrupts
2026-04-13 7:43 [PATCH v2 00/12] Dynamic Housekeeping Management (DHM) via CPUSets Qiliang Yuan
` (2 preceding siblings ...)
2026-04-13 7:43 ` [PATCH v2 04/12] tick/nohz: Transition to dynamic full dynticks state management Qiliang Yuan
@ 2026-04-13 7:43 ` Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 06/12] watchdog: Allow runtime toggle of lockup detector affinity Qiliang Yuan
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Qiliang Yuan @ 2026-04-13 7:43 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Tejun Heo,
Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest,
Qiliang Yuan
Managed interrupts currently have their affinity determined once,
honoring boot-time isolation settings. There is no mechanism to migrate
them when housekeeping boundaries change at runtime.
Enable managed interrupts to respond dynamically to housekeeping updates.
This ensures that managed interrupts are migrated away from newly
isolated CPUs or redistributed when housekeeping CPUs are added.
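The intended net effect of re-applying the affinity can be sketched as a mask intersection (illustrative userspace C; the kernel derives the effective target set inside its affinity-setting path, and the fallback behavior shown here is an assumption of this sketch, not a statement about genirq internals):

```c
/*
 * Sketch: a managed interrupt's effective target set is its configured
 * affinity restricted to the housekeeping CPUs. If the intersection is
 * empty, this sketch falls back to the configured mask rather than
 * leaving the interrupt without a target (assumed behavior, for
 * illustration only). Bitmaps stand in for cpumasks.
 */
static unsigned long effective_irq_affinity(unsigned long configured,
					    unsigned long hk_mask)
{
	unsigned long effective = configured & hk_mask;

	return effective ? effective : configured;
}
```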
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/irq/manage.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 2e80724378267..31e263d9f40d0 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -2801,3 +2801,52 @@ bool irq_check_status_bit(unsigned int irq, unsigned int bitmask)
return res;
}
EXPORT_SYMBOL_GPL(irq_check_status_bit);
+
+#ifdef CONFIG_SMP
+static int irq_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ unsigned int irq;
+
+ if (action != HK_UPDATE_MASK || upd->type != HK_TYPE_MANAGED_IRQ)
+ return NOTIFY_OK;
+
+ irq_lock_sparse();
+ for_each_active_irq(irq) {
+ struct irq_data *irqd;
+ struct irq_desc *desc;
+
+ desc = irq_to_desc(irq);
+ if (!desc)
+ continue;
+
+ scoped_guard(raw_spinlock_irqsave, &desc->lock) {
+ irqd = irq_desc_get_irq_data(desc);
+ if (!irqd_affinity_is_managed(irqd) || !desc->action ||
+ !irq_data_get_irq_chip(irqd))
+ continue;
+
+ /*
+ * Re-apply existing affinity to honor the new
+ * housekeeping mask via __irq_set_affinity() logic.
+ */
+ irq_set_affinity_locked(irqd, irq_data_get_affinity_mask(irqd), false);
+ }
+ }
+ irq_unlock_sparse();
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block irq_housekeeping_nb = {
+ .notifier_call = irq_housekeeping_reconfigure,
+};
+
+static int __init irq_init_housekeeping_notifier(void)
+{
+ housekeeping_register_notifier(&irq_housekeeping_nb);
+ return 0;
+}
+core_initcall(irq_init_housekeeping_notifier);
+#endif
--
2.43.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 06/12] watchdog: Allow runtime toggle of lockup detector affinity
2026-04-13 7:43 [PATCH v2 00/12] Dynamic Housekeeping Management (DHM) via CPUSets Qiliang Yuan
` (3 preceding siblings ...)
2026-04-13 7:43 ` [PATCH v2 05/12] genirq: Support dynamic migration for managed interrupts Qiliang Yuan
@ 2026-04-13 7:43 ` Qiliang Yuan
2026-04-13 7:43 ` [PATCH v2 07/12] sched/core: Dynamically update scheduler domain housekeeping mask Qiliang Yuan
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Qiliang Yuan @ 2026-04-13 7:43 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Paul E. McKenney, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng,
Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner, Tejun Heo,
Andrew Morton, Vlastimil Babka, Suren Baghdasaryan, Michal Hocko,
Brendan Jackman, Johannes Weiner, Zi Yan, Waiman Long,
Chen Ridong, Michal Koutný,
Jonathan Corbet, Shuah Khan, Shuah Khan
Cc: linux-kernel, rcu, linux-mm, cgroups, linux-doc, linux-kselftest,
Qiliang Yuan
The hardlockup detector threads are affined to CPUs based on the
HK_TYPE_TIMER housekeeping mask at boot. If this mask is updated at
runtime, these threads remain on their original CPUs, potentially
running on isolated cores.
Synchronize watchdog thread affinity with HK_TYPE_TIMER updates.
This ensures that hardlockup detector threads correctly follow the
dynamic housekeeping boundaries for timers.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/watchdog.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7d675781bc917..bcd8373038126 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -26,6 +26,7 @@
#include <linux/sysctl.h>
#include <linux/tick.h>
#include <linux/sys_info.h>
+#include <linux/sched/isolation.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
@@ -1361,6 +1362,30 @@ static int __init lockup_detector_check(void)
}
late_initcall_sync(lockup_detector_check);
+static int watchdog_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ if (action == HK_UPDATE_MASK) {
+ struct housekeeping_update *upd = data;
+ unsigned int type = upd->type;
+
+ if (type == HK_TYPE_TIMER) {
+ mutex_lock(&watchdog_mutex);
+ cpumask_copy(&watchdog_cpumask,
+ housekeeping_cpumask(HK_TYPE_TIMER));
+ cpumask_and(&watchdog_cpumask, &watchdog_cpumask, cpu_possible_mask);
+ __lockup_detector_reconfigure(false);
+ mutex_unlock(&watchdog_mutex);
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block watchdog_housekeeping_nb = {
+ .notifier_call = watchdog_housekeeping_reconfigure,
+};
+
void __init lockup_detector_init(void)
{
if (tick_nohz_full_enabled())
@@ -1375,4 +1400,5 @@ void __init lockup_detector_init(void)
allow_lockup_detector_init_retry = true;
lockup_detector_setup();
+ housekeeping_register_notifier(&watchdog_housekeeping_nb);
}
--
2.43.0
* [PATCH v2 07/12] sched/core: Dynamically update scheduler domain housekeeping mask
From: Qiliang Yuan @ 2026-04-13 7:43 UTC
Scheduler domains rely on HK_TYPE_DOMAIN to identify which CPUs are
isolated from general load balancing. Currently, these boundaries are
static and determined only during boot-time domain initialization.
Trigger a scheduler domain rebuild when the HK_TYPE_DOMAIN mask changes.
This ensures that scheduler isolation boundaries can be reconfigured
at runtime via the DHM cpuset interface.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/sched/core.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 496dff740dcaf..b71c433bbc420 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -39,6 +39,7 @@
#include <linux/sched/nohz.h>
#include <linux/sched/rseq_api.h>
#include <linux/sched/rt.h>
+#include <linux/sched/topology.h>
#include <linux/blkdev.h>
#include <linux/context_tracking.h>
@@ -10959,3 +10960,25 @@ void sched_change_end(struct sched_change_ctx *ctx)
p->sched_class->prio_changed(rq, p, ctx->prio);
}
}
+
+static int sched_housekeeping_update(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *update = data;
+
+ if (action == HK_UPDATE_MASK && update->type == HK_TYPE_DOMAIN)
+ rebuild_sched_domains();
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block sched_housekeeping_nb = {
+ .notifier_call = sched_housekeeping_update,
+};
+
+static int __init sched_housekeeping_init(void)
+{
+ housekeeping_register_notifier(&sched_housekeeping_nb);
+ return 0;
+}
+late_initcall(sched_housekeeping_init);
--
2.43.0
* [PATCH v2 08/12] workqueue, mm: Support dynamic housekeeping mask updates
From: Qiliang Yuan @ 2026-04-13 7:43 UTC
Unbound workqueues and kcompactd threads determine their default CPU
affinity from housekeeping masks (HK_TYPE_WQ, HK_TYPE_DOMAIN, and
HK_TYPE_KTHREAD) at boot. Currently, these boundaries are static and
are not updated if housekeeping is reconfigured at runtime.
Implement housekeeping notifiers for both the workqueue subsystem and mm
compaction. This ensures that unbound workqueue tasks and background
compaction threads honor dynamic isolation boundaries configured via the
cpuset interface at runtime.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/workqueue.c | 42 ++++++++++++++++++++++++++++++++++++++++++
mm/compaction.c | 27 +++++++++++++++++++++++++++
2 files changed, 69 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index eda756556341a..354e788004b48 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -8008,6 +8008,47 @@ static void __init wq_cpu_intensive_thresh_init(void)
wq_cpu_intensive_thresh_us = thresh;
}
+static int wq_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ if (action == HK_UPDATE_MASK) {
+ struct housekeeping_update *upd = data;
+ unsigned int type = upd->type;
+
+ if (type == HK_TYPE_WQ || type == HK_TYPE_DOMAIN) {
+ cpumask_var_t cpumask;
+
+ if (!alloc_cpumask_var(&cpumask, GFP_KERNEL)) {
+ pr_warn("workqueue: failed to allocate cpumask for housekeeping update\n");
+ return NOTIFY_BAD;
+ }
+
+ cpumask_copy(cpumask, cpu_possible_mask);
+ if (!cpumask_empty(housekeeping_cpumask(HK_TYPE_WQ)))
+ cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_WQ));
+ if (!cpumask_empty(housekeeping_cpumask(HK_TYPE_DOMAIN)))
+ cpumask_and(cpumask, cpumask, housekeeping_cpumask(HK_TYPE_DOMAIN));
+
+ workqueue_set_unbound_cpumask(cpumask);
+
+ if (type == HK_TYPE_DOMAIN) {
+ apply_wqattrs_lock();
+ cpumask_andnot(wq_isolated_cpumask, cpu_possible_mask,
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+ apply_wqattrs_unlock();
+ }
+
+ free_cpumask_var(cpumask);
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block wq_housekeeping_nb = {
+ .notifier_call = wq_housekeeping_reconfigure,
+};
+
/**
* workqueue_init - bring workqueue subsystem fully online
*
@@ -8068,6 +8109,7 @@ void __init workqueue_init(void)
wq_online = true;
wq_watchdog_init();
+ housekeeping_register_notifier(&wq_housekeeping_nb);
}
/*
diff --git a/mm/compaction.c b/mm/compaction.c
index 1e8f8eca318c6..574ee3c6dc942 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -24,6 +24,7 @@
#include <linux/page_owner.h>
#include <linux/psi.h>
#include <linux/cpuset.h>
+#include <linux/sched/isolation.h>
#include "internal.h"
#ifdef CONFIG_COMPACTION
@@ -3246,6 +3247,7 @@ void __meminit kcompactd_run(int nid)
pr_err("Failed to start kcompactd on node %d\n", nid);
pgdat->kcompactd = NULL;
} else {
+ housekeeping_affine(pgdat->kcompactd, HK_TYPE_KTHREAD);
wake_up_process(pgdat->kcompactd);
}
}
@@ -3320,6 +3322,30 @@ static const struct ctl_table vm_compaction[] = {
},
};
+static int kcompactd_housekeeping_reconfigure(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct housekeeping_update *upd = data;
+ unsigned int type = upd->type;
+
+ if (action == HK_UPDATE_MASK && type == HK_TYPE_KTHREAD) {
+ int nid;
+
+ for_each_node_state(nid, N_MEMORY) {
+ pg_data_t *pgdat = NODE_DATA(nid);
+
+ if (pgdat->kcompactd)
+ housekeeping_affine(pgdat->kcompactd, HK_TYPE_KTHREAD);
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block kcompactd_housekeeping_nb = {
+ .notifier_call = kcompactd_housekeeping_reconfigure,
+};
+
static int __init kcompactd_init(void)
{
int nid;
@@ -3327,6 +3353,7 @@ static int __init kcompactd_init(void)
for_each_node_state(nid, N_MEMORY)
kcompactd_run(nid);
register_sysctl_init("vm", vm_compaction);
+ housekeeping_register_notifier(&kcompactd_housekeeping_nb);
return 0;
}
subsys_initcall(kcompactd_init)
--
2.43.0
* [PATCH v2 09/12] cgroup/cpuset: Introduce CPUSet-driven dynamic housekeeping (DHM)
From: Qiliang Yuan @ 2026-04-13 7:43 UTC
Currently, subsystem housekeeping masks are generally static and can
only be configured via boot-time parameters (e.g., isolcpus, nohz_full).
This inflexible approach forces a system reboot whenever an orchestrator
needs to change workload isolation boundaries.
This patch introduces CPUSet-driven Dynamic Housekeeping Management (DHM)
by exposing the `cpuset.housekeeping.cpus` control file on the root cgroup.
Writing a new cpumask to this file dynamically updates the housekeeping
masks of all registered subsystems (scheduler, RCU, timers, tick, workqueues,
and managed IRQs) simultaneously, without restarting the node.
At the cpuset and isolation core level, this change implements:
1. A `housekeeping_update_all_types(const struct cpumask *new_mask)` API in
`isolation.c` that safely allocates, updates, and replaces all enabled hk_type masks.
2. A `cpuset.housekeeping.cpus` attribute in `dfl_files` for the root cpuset.
3. A write handler that iterates over the enabled housekeeping types and
invokes `housekeeping_update_notify()` (the DHM notifier chain) to push the
configuration changes live into the individual kernel subsystems.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
include/linux/sched/isolation.h | 12 ++++++++++++
kernel/cgroup/cpuset-internal.h | 1 +
kernel/cgroup/cpuset.c | 36 ++++++++++++++++++++++++++++++++++++
kernel/sched/isolation.c | 38 ++++++++++++++++++++++++++++++++++++++
4 files changed, 87 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index aea1dbc4d7486..299167f627895 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -48,6 +48,8 @@ extern void __init housekeeping_init(void);
extern int housekeeping_register_notifier(struct notifier_block *nb);
extern int housekeeping_unregister_notifier(struct notifier_block *nb);
+extern int housekeeping_update_notify(enum hk_type type, const struct cpumask *new_mask);
+extern int housekeeping_update_all_types(const struct cpumask *new_mask);
#else
@@ -86,6 +88,16 @@ static inline int housekeeping_unregister_notifier(struct notifier_block *nb)
{
return 0;
}
+
+static inline int housekeeping_update_notify(enum hk_type type, const struct cpumask *new_mask)
+{
+ return 0;
+}
+
+static inline int housekeeping_update_all_types(const struct cpumask *new_mask)
+{
+ return 0;
+}
#endif /* CONFIG_CPU_ISOLATION */
static inline bool housekeeping_cpu(int cpu, enum hk_type type)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index fd7d19842ded7..3ab437f54ecdf 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -60,6 +60,7 @@ typedef enum {
FILE_EXCLUSIVE_CPULIST,
FILE_EFFECTIVE_XCPULIST,
FILE_ISOLATED_CPULIST,
+ FILE_HOUSEKEEPING_CPULIST,
FILE_CPU_EXCLUSIVE,
FILE_MEM_EXCLUSIVE,
FILE_MEM_HARDWALL,
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1335e437098e8..5df19dc9bfa89 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3201,6 +3201,30 @@ static void cpuset_attach(struct cgroup_taskset *tset)
mutex_unlock(&cpuset_mutex);
}
+/*
+ * DHM interface: root cpuset allows updating global housekeeping cpumask.
+ */
+static ssize_t cpuset_write_housekeeping_cpus(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ cpumask_var_t new_mask;
+ int retval;
+
+ if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ buf = strstrip(buf);
+ retval = cpulist_parse(buf, new_mask);
+ if (retval)
+ goto out_free;
+
+ retval = housekeeping_update_all_types(new_mask);
+
+out_free:
+ free_cpumask_var(new_mask);
+ return retval ?: nbytes;
+}
+
/*
* Common handling for a write to a "cpus" or "mems" file.
*/
@@ -3290,6 +3314,9 @@ int cpuset_common_seq_show(struct seq_file *sf, void *v)
case FILE_ISOLATED_CPULIST:
seq_printf(sf, "%*pbl\n", cpumask_pr_args(isolated_cpus));
break;
+ case FILE_HOUSEKEEPING_CPULIST:
+ seq_printf(sf, "%*pbl\n", cpumask_pr_args(housekeeping_cpumask(HK_TYPE_DOMAIN)));
+ break;
default:
ret = -EINVAL;
}
@@ -3428,6 +3455,15 @@ static struct cftype dfl_files[] = {
.flags = CFTYPE_ONLY_ON_ROOT,
},
+ {
+ .name = "housekeeping.cpus",
+ .seq_show = cpuset_common_seq_show,
+ .write = cpuset_write_housekeeping_cpus,
+ .max_write_len = (100U + 6 * NR_CPUS),
+ .private = FILE_HOUSEKEEPING_CPULIST,
+ .flags = CFTYPE_ONLY_ON_ROOT,
+ },
+
{ } /* terminate */
};
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 0462b41807161..a92b0bb41de3a 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -27,6 +27,7 @@ enum hk_flags {
#define HK_FLAG_KERNEL_NOISE (HK_FLAG_TICK | HK_FLAG_TIMER | HK_FLAG_RCU | \
HK_FLAG_MISC | HK_FLAG_WQ | HK_FLAG_KTHREAD)
+static DEFINE_MUTEX(housekeeping_mutex);
static BLOCKING_NOTIFIER_HEAD(housekeeping_notifier_list);
DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
@@ -196,6 +197,43 @@ int housekeeping_update_notify(enum hk_type type, const struct cpumask *new_mask
}
EXPORT_SYMBOL_GPL(housekeeping_update_notify);
+int housekeeping_update_all_types(const struct cpumask *new_mask)
+{
+ enum hk_type type;
+ struct cpumask *old_masks[HK_TYPE_MAX] = { NULL };
+
+ if (cpumask_empty(new_mask) || !cpumask_intersects(new_mask, cpu_online_mask))
+ return -EINVAL;
+
+ if (!housekeeping.flags)
+ static_branch_enable(&housekeeping_overridden);
+
+ mutex_lock(&housekeeping_mutex);
+ for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
+ struct cpumask *nmask = kmalloc(cpumask_size(), GFP_KERNEL);
+
+ if (!nmask) {
+ mutex_unlock(&housekeeping_mutex);
+ return -ENOMEM;
+ }
+
+ cpumask_copy(nmask, new_mask);
+ old_masks[type] = housekeeping_cpumask_dereference(type);
+ rcu_assign_pointer(housekeeping.cpumasks[type], nmask);
+ }
+ mutex_unlock(&housekeeping_mutex);
+
+ synchronize_rcu();
+
+ for_each_set_bit(type, &housekeeping.flags, HK_TYPE_MAX) {
+ housekeeping_update_notify(type, new_mask);
+ kfree(old_masks[type]);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(housekeeping_update_all_types);
+
void __init housekeeping_init(void)
{
enum hk_type type;
--
2.43.0
* [PATCH v2 10/12] cgroup/cpuset: Implement SMT-aware grouping and safety guards
From: Qiliang Yuan @ 2026-04-13 7:43 UTC
Dynamic Housekeeping Management allows runtime configuration of kernel
overhead isolation boundaries. However, configuring cpumasks that separate
SMT siblings (e.g., placing one hardware thread in the housekeeping mask
while leaving the other isolated) can lead to severe performance degradation
because siblings share L1 caches and pipeline resources.
This patch introduces `cpuset.housekeeping.smt_aware`, a safety guard that
prevents user space from splitting SMT sibling pairs across isolation
boundaries.
When `cpuset.housekeeping.smt_aware` is enabled (1):
- Any write to `cpuset.housekeeping.cpus` must include all SMT siblings of
every CPU present in the new mask (verified via `topology_sibling_cpumask`).
- If an invalid mask is supplied, the write is rejected with `-EINVAL`.
The guard thus guarantees that no isolated CPU shares a physical core with
a housekeeping CPU.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
kernel/cgroup/cpuset-internal.h | 1 +
kernel/cgroup/cpuset.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 3ab437f54ecdf..162594eaf8467 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -61,6 +61,7 @@ typedef enum {
FILE_EFFECTIVE_XCPULIST,
FILE_ISOLATED_CPULIST,
FILE_HOUSEKEEPING_CPULIST,
+ FILE_HOUSEKEEPING_SMT_AWARE,
FILE_CPU_EXCLUSIVE,
FILE_MEM_EXCLUSIVE,
FILE_MEM_HARDWALL,
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5df19dc9bfa89..4272bb298ec3d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -37,6 +37,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <linux/task_work.h>
+#include <linux/topology.h>
DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
@@ -156,6 +157,9 @@ static bool update_housekeeping; /* RWCS */
*/
static cpumask_var_t isolated_hk_cpus; /* T */
+/* DHM: Enable SMT-aware boundary checks */
+static bool cpuset_housekeeping_smt_aware = false;
+
/*
* A flag to force sched domain rebuild at the end of an operation.
* It can be set in
@@ -3218,6 +3222,16 @@ static ssize_t cpuset_write_housekeeping_cpus(struct kernfs_open_file *of,
if (retval)
goto out_free;
+ if (cpuset_housekeeping_smt_aware) {
+ int cpu;
+ for_each_cpu(cpu, new_mask) {
+ if (!cpumask_subset(topology_sibling_cpumask(cpu), new_mask)) {
+ retval = -EINVAL;
+ goto out_free;
+ }
+ }
+ }
+
retval = housekeeping_update_all_types(new_mask);
out_free:
@@ -3225,6 +3239,18 @@ static ssize_t cpuset_write_housekeeping_cpus(struct kernfs_open_file *of,
return retval ?: nbytes;
}
+static ssize_t cpuset_write_housekeeping_smt_aware(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ bool val;
+
+ if (kstrtobool(buf, &val))
+ return -EINVAL;
+
+ cpuset_housekeeping_smt_aware = val;
+ return nbytes;
+}
+
/*
* Common handling for a write to a "cpus" or "mems" file.
*/
@@ -3317,6 +3343,9 @@ int cpuset_common_seq_show(struct seq_file *sf, void *v)
case FILE_HOUSEKEEPING_CPULIST:
seq_printf(sf, "%*pbl\n", cpumask_pr_args(housekeeping_cpumask(HK_TYPE_DOMAIN)));
break;
+ case FILE_HOUSEKEEPING_SMT_AWARE:
+ seq_printf(sf, "%d\n", cpuset_housekeeping_smt_aware);
+ break;
default:
ret = -EINVAL;
}
@@ -3464,6 +3493,14 @@ static struct cftype dfl_files[] = {
.flags = CFTYPE_ONLY_ON_ROOT,
},
+ {
+ .name = "housekeeping.smt_aware",
+ .seq_show = cpuset_common_seq_show,
+ .write = cpuset_write_housekeeping_smt_aware,
+ .private = FILE_HOUSEKEEPING_SMT_AWARE,
+ .flags = CFTYPE_ONLY_ON_ROOT,
+ },
+
{ } /* terminate */
};
--
2.43.0
* [PATCH v2 11/12] Documentation: cgroup-v2: Document dynamic housekeeping (DHM)
From: Qiliang Yuan @ 2026-04-13 7:43 UTC
Update the cgroup-v2 admin-guide to document the newly introduced
cpuset.housekeeping.cpus and cpuset.housekeeping.smt_aware files.
The documentation explains how the DHM framework reconfigures kernel
subsystem isolation masks natively through the root cpuset without a
system reboot, and describes the SMT grouping safety constraint.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
Documentation/admin-guide/cgroup-v2.rst | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 91beaa6798ce0..deb644b88509f 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2592,6 +2592,30 @@ Cpuset Interface Files
isolated partitions. It will be empty if no isolated partition
is created.
+ cpuset.housekeeping.cpus
+ A read-write multiple values file that exists only on the root cgroup.
+
+ This file is part of the Dynamic Housekeeping Management (DHM)
+ framework. It allows dynamic reconfiguration of the global
+ kernel housekeeping CPU mask without a system reboot.
+
+ By writing a mask of CPUs (e.g. "0-3,8"), DHM will update all internal
+ housekeeping subsystem masks (scheduler domains, RCU NOCB, tick offload,
+ timers, unbound workqueues, and managed IRQs) in real time.
+
+ The new mask must have at least one online CPU. The value stays constant
+ until changed or affected by CPU hot-unplug.
+
+ cpuset.housekeeping.smt_aware
+ A read-write single value file that exists only on the root cgroup.
+ It accepts "0" or "1". The default value is "0" (false).
+
+ This file enables the SMT-aware validation logic for DHM. When enabled (1),
+ any update to "cpuset.housekeeping.cpus" is strictly validated to ensure
+ Hardware Threads (SMT siblings) are kept together. If an SMT sibling pair
+ is split across the housekeeping boundary, the mask update is rejected
+ with an error to avoid severe cache and pipeline contention penalties.
+
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
--
2.43.0
* [PATCH v2 12/12] selftests: cgroup: Add functional tests for dynamic housekeeping
From: Qiliang Yuan @ 2026-04-13 7:43 UTC
This extends the cgroup v2 testing framework in selftests to validate the
newly added Dynamic Housekeeping Management (DHM) cpuset interface:
`cpuset.housekeeping.cpus` and `cpuset.housekeeping.smt_aware`.
The `test_cpuset_housekeeping` functional test verifies:
- DHM's SMT safety guard toggle (`cpuset.housekeeping.smt_aware`): writes
of "1" and "0" are accepted and read back correctly.
- Basic read and write handling of `cpuset.housekeeping.cpus`, using the
root cgroup's effective CPU mask as the input.
If the DHM functionality is not present in the kernel, the selftest skips gracefully.
Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
tools/testing/selftests/cgroup/test_cpuset.c | 36 ++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_cpuset.c b/tools/testing/selftests/cgroup/test_cpuset.c
index c5cf8b56ceb8f..b2a032be4407a 100644
--- a/tools/testing/selftests/cgroup/test_cpuset.c
+++ b/tools/testing/selftests/cgroup/test_cpuset.c
@@ -232,6 +232,41 @@ static int test_cpuset_perms_subtree(const char *root)
return ret;
}
+static int test_cpuset_housekeeping(const char *root)
+{
+ char buf[PAGE_SIZE];
+ int ret = KSFT_FAIL;
+
+ /* If the kernel doesn't have DHM patch, skip */
+ if (cg_read(root, "cpuset.housekeeping.cpus", buf, sizeof(buf)))
+ return KSFT_SKIP;
+
+ /* Test writing 1 and 0 to smt_aware */
+ if (cg_write(root, "cpuset.housekeeping.smt_aware", "1"))
+ goto cleanup;
+
+ if (cg_read_strstr(root, "cpuset.housekeeping.smt_aware", "1"))
+ goto cleanup;
+
+ if (cg_write(root, "cpuset.housekeeping.smt_aware", "0"))
+ goto cleanup;
+
+ if (cg_read_strstr(root, "cpuset.housekeeping.smt_aware", "0"))
+ goto cleanup;
+
+ /* Read root cpuset.cpus.effective */
+ if (cg_read(root, "cpuset.cpus.effective", buf, sizeof(buf)))
+ goto cleanup;
+
+ /* Write it back to housekeeping.cpus */
+ if (cg_write(root, "cpuset.housekeeping.cpus", buf))
+ goto cleanup;
+
+ ret = KSFT_PASS;
+
+cleanup:
+ return ret;
+}
#define T(x) { x, #x }
struct cpuset_test {
@@ -241,6 +276,7 @@ struct cpuset_test {
T(test_cpuset_perms_object_allow),
T(test_cpuset_perms_object_deny),
T(test_cpuset_perms_subtree),
+ T(test_cpuset_housekeeping),
};
#undef T
--
2.43.0