From: Frederic Weisbecker <frederic@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Michal Hocko <mhocko@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Valentin Schneider <vschneid@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org
Subject: [PATCH 5/6] sched/isolation: Introduce isolated task work
Date: Thu, 3 Jul 2025 16:07:16 +0200
Message-ID: <20250703140717.25703-6-frederic@kernel.org>
In-Reply-To: <20250703140717.25703-1-frederic@kernel.org>

Some asynchronous kernel work may still be pending when a task resumes to
userspace, and then execute later on. For an isolated workload this becomes
a problem once the process is done with its preparatory work involving
syscalls and wants to run in userspace without being interrupted.

Provide an infrastructure to queue a work to be executed from the current
isolated task context right before resuming to userspace. This relies on
the assumption that isolated tasks are pinned to a single nohz_full CPU.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
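Reader's note, not part of the commit: the queue-once behaviour described
in the changelog can be sketched in plain userspace C. This is only a
model under stated assumptions. Every model_* name is hypothetical, the
"not queued" state is represented by a work item whose next pointer points
at itself, and the IRQ-disabled section and the real task_work_add() /
TWA_RESUME machinery are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct callback_head. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
};

/* Hypothetical model of a task: pending work list plus the embedded work. */
struct model_task {
	struct callback_head *pending;		/* works to run before "userspace" */
	struct callback_head nohz_full_work;	/* mirrors the new task_struct field */
	bool is_kthread;			/* stands in for the PF_* flag check */
	int flushes;				/* how many times the work ran */
};

/* A self-pointing next means "not queued" (assumption of this model). */
static void model_init_work(struct callback_head *w,
			    void (*fn)(struct callback_head *))
{
	w->next = w;
	w->func = fn;
}

static bool model_work_queued(const struct callback_head *w)
{
	return w->next != w;
}

/* Queue the per-task work at most once; kernel-thread-like tasks are refused. */
static int model_queue(struct model_task *t)
{
	if (t->is_kthread)
		return -EINVAL;
	if (model_work_queued(&t->nohz_full_work))
		return 0;

	t->nohz_full_work.next = t->pending;
	t->pending = &t->nohz_full_work;
	return 0;
}

/* Callback: recover the owning task from the embedded work and count a flush. */
static void model_flush(struct callback_head *head)
{
	struct model_task *t = (struct model_task *)
		((char *)head - offsetof(struct model_task, nohz_full_work));

	t->flushes++;
}

/* Run all pending works, marking each as not queued before calling it. */
static void model_resume_to_user(struct model_task *t)
{
	struct callback_head *w = t->pending;

	t->pending = NULL;
	while (w) {
		struct callback_head *next = w->next;

		w->next = w;
		w->func(w);
		w = next;
	}
}
```

In this model, calling model_queue() twice before model_resume_to_user()
runs the callback once, and the work can be queued again afterwards. That
is the property the task_work_queued() check in the patch is there to
provide.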
include/linux/sched.h | 4 ++++
include/linux/sched/isolation.h | 17 +++++++++++++++++
kernel/sched/core.c | 1 +
kernel/sched/isolation.c | 23 +++++++++++++++++++++++
kernel/sched/sched.h | 1 +
kernel/time/Kconfig | 12 ++++++++++++
6 files changed, 58 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 117aa20b8fb6..931065b5744f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1448,6 +1448,10 @@ struct task_struct {
atomic_t tick_dep_mask;
#endif
+#ifdef CONFIG_NO_HZ_FULL_WORK
+ struct callback_head nohz_full_work;
+#endif
+
#ifdef CONFIG_FAULT_INJECTION
int make_it_fail;
unsigned int fail_nth;
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index d8501f4709b5..9481b7d152c9 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -77,4 +77,21 @@ static inline bool cpu_is_isolated(int cpu)
cpuset_cpu_is_isolated(cpu);
}
+#if defined(CONFIG_NO_HZ_FULL_WORK)
+extern int __isolated_task_work_queue(void);
+
+static inline int isolated_task_work_queue(void)
+{
+ if (housekeeping_cpu(raw_smp_processor_id(), HK_TYPE_KERNEL_NOISE))
+ return -ENOTSUPP;
+
+ return __isolated_task_work_queue();
+}
+
+extern void isolated_task_work_init(struct task_struct *tsk);
+#else
+static inline int isolated_task_work_queue(void) { return -ENOTSUPP; }
+static inline void isolated_task_work_init(struct task_struct *tsk) { }
+#endif /* CONFIG_NO_HZ_FULL_WORK */
+
#endif /* _LINUX_SCHED_ISOLATION_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 35783a486c28..eca8242bd81d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4538,6 +4538,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
p->migration_pending = NULL;
#endif
init_sched_mm_cid(p);
+ isolated_task_work_init(p);
}
DEFINE_STATIC_KEY_FALSE(sched_numa_balancing);
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 93b038d48900..d74c4ef91ce2 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -249,3 +249,26 @@ static int __init housekeeping_isolcpus_setup(char *str)
return housekeeping_setup(str, flags);
}
__setup("isolcpus=", housekeeping_isolcpus_setup);
+
+#ifdef CONFIG_NO_HZ_FULL_WORK
+static void isolated_task_work(struct callback_head *head)
+{
+}
+
+int __isolated_task_work_queue(void)
+{
+ if (current->flags & (PF_KTHREAD | PF_USER_WORKER | PF_IO_WORKER))
+ return -EINVAL;
+
+ guard(irqsave)();
+ if (task_work_queued(&current->nohz_full_work))
+ return 0;
+
+ return task_work_add(current, &current->nohz_full_work, TWA_RESUME);
+}
+
+void isolated_task_work_init(struct task_struct *tsk)
+{
+ init_task_work(&tsk->nohz_full_work, isolated_task_work);
+}
+#endif /* CONFIG_NO_HZ_FULL_WORK */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 475bb5998295..50e0cada1e1b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -60,6 +60,7 @@
#include <linux/stop_machine.h>
#include <linux/syscalls_api.h>
#include <linux/syscalls.h>
+#include <linux/task_work.h>
#include <linux/tick.h>
#include <linux/topology.h>
#include <linux/types.h>
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index b0b97a60aaa6..34591fc50ab1 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -146,6 +146,18 @@ config NO_HZ_FULL
endchoice
+config NO_HZ_FULL_WORK
+ bool "Full dynticks work flush on kernel exit"
+ depends on NO_HZ_FULL
+ help
+ Selectively flush pending asynchronous kernel work upon user exit.
+ Assuming userspace is not performing any critical isolated work while
+ issuing syscalls, some per-CPU kernel works are flushed before resuming
+ to userspace so that they don't get remotely queued later when the CPU
+ doesn't want to be disturbed.
+
+ If in doubt say N.
+
config CONTEXT_TRACKING_USER
bool
depends on HAVE_CONTEXT_TRACKING_USER
--
2.48.1
Thread overview: 20+ messages
2025-07-03 14:07 [PATCH 0/6 v4] sched/mm: LRU drain flush on nohz_full Frederic Weisbecker
2025-07-03 14:07 ` [PATCH 1/6] task_work: Provide means to check if a work is queued Frederic Weisbecker
2025-07-03 14:07 ` [PATCH 2/6] sched/fair: Use task_work_queued() on numa_work Frederic Weisbecker
2025-07-03 14:07 ` [PATCH 3/6] sched: Use task_work_queued() on cid_work Frederic Weisbecker
2025-07-17 16:32 ` Valentin Schneider
2025-07-03 14:07 ` [PATCH 4/6] tick/nohz: Move nohz_full related fields out of hot task struct's places Frederic Weisbecker
2025-07-17 16:32 ` Valentin Schneider
2025-07-03 14:07 ` Frederic Weisbecker [this message]
2025-07-17 17:29 ` [PATCH 5/6] sched/isolation: Introduce isolated task work Vlastimil Babka
2025-07-18 9:52 ` Valentin Schneider
2025-07-18 14:23 ` Frederic Weisbecker
2025-07-03 14:07 ` [PATCH 6/6] mm: Drain LRUs upon resume to userspace on nohz_full CPUs Frederic Weisbecker
2025-07-03 14:24 ` Michal Hocko
2025-07-03 14:28 ` Matthew Wilcox
2025-07-03 16:12 ` Michal Hocko
2025-07-17 19:33 ` Vlastimil Babka
-- strict thread matches above, loose matches on Subject: below --
2025-04-10 15:23 [PATCH 0/6 v3] sched/mm: LRU drain flush on nohz_full Frederic Weisbecker
2025-04-10 15:23 ` [PATCH 5/6] sched/isolation: Introduce isolated task work Frederic Weisbecker
2025-04-11 10:25 ` Oleg Nesterov
2025-04-11 22:00 ` Frederic Weisbecker
2025-04-12 5:12 ` K Prateek Nayak