From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.com>, Michal Hocko <mhocko@suse.com>,
Leonardo Bras <leobras.c@gmail.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Leonardo Bras <leobras@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Frederic Weisbecker <fweisbecker@suse.de>,
Waiman Long <llong@redhat.com>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Tue, 24 Feb 2026 15:12:32 -0300
Message-ID: <aZ3qEHzgI8Zuv7IU@tpad>
In-Reply-To: <aZ24eAiQpo64-0Kz@pavilion.home>
On Tue, Feb 24, 2026 at 03:40:56PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 20, 2026 at 02:35:41PM -0300, Marcelo Tosatti wrote:
> >
> > I am not sure it's safe to assume that. Ask Gemini about isolcpus use
>
> Erm... ok fine let's see that :-)
>
> > cases and:
> >
> > 1. High-Frequency Trading (HFT)
> > In the world of HFT, microseconds are the difference between profit and loss.
> > Traders use isolcpus to pin their execution engines to specific cores.
> >
> > The Goal: Eliminate "jitter" caused by the OS moving other processes onto the same core.
> >
> > The Benefit: Guaranteed execution time and ultra-low latency.
>
> That would be full isolation (aka nohz_full) because the goal here is to beat
> the competitors. As such the software latency must tend toward hardware latency.
>
> I wouldn't expect any syscall here but a full userspace stack with DPDK for
> example.
>
> I put that in the 5g uRLLC (or similar low latency networking) usecase family.
>
> >
> > 2. Real-Time Audio & Video Processing
> > If you are running a Digital Audio Workstation (DAW) or a live video encoding rig, a tiny "hiccup" in CPU availability results in an audible pop or a dropped frame.
> >
> > The Goal: Reserve cores specifically for the Digital Signal Processor (DSP) or the encoder.
> >
> > The Benefit: Smooth, glitch-free media streams even when the rest of the
> > system is busy.
>
> Here I expect weaker isolation requirements with syscalls involved. Scheduler
> domain isolation alone (aka isolcpus=[domain]) would fit.
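For reference, the two isolation flavours discussed here are typically requested on the kernel command line; the CPU list below is illustrative, not from this thread:

```text
# Domain (scheduler) isolation only:
isolcpus=domain,managed_irq,2-7

# Full isolation additionally offloads the tick and RCU callbacks:
isolcpus=domain,managed_irq,2-7 nohz_full=2-7 rcu_nocbs=2-7
```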
>
> >
> > 3. Network Function Virtualization (NFV) & DPDK
> > For high-speed networking (like 10Gbps+ traffic), the Data Plane Development Kit (DPDK) uses "poll mode" drivers. These drivers constantly loop to check for new packets rather than waiting for interrupts.
> >
> > The Goal: Isolate cores so they can run at 100% utilization just checking for network packets.
> >
> > The Benefit: Maximum throughput and zero packet loss in high-traffic
> > environments.
>
> I put that in the 5g uRLLC usecase family as well (again or similar low latency networking).
>
> > 4. Gaming & Simulation
> > Competitive gamers or flight simulator enthusiasts sometimes isolate a few cores to handle the game's main thread, while leaving the rest of the OS (Discord, Chrome, etc.) to the remaining cores.
> >
> > The Goal: Prevent background Windows/Linux tasks from stealing cycles from the game engine.
> >
> > The Benefit: More consistent 1% low FPS and reduced input lag.
>
> That's domain isolation because frequent syscalls are unavoidable.
>
> >
> > 5. Deterministic Scientific Computing
> > If you're running a simulation that needs to take exactly the same amount of time every time it runs (for benchmarking or safety-critical testing), you can't have OS interference messing with your metrics.
> >
> > The Goal: Remove the variability of the Linux scheduler.
> >
> > The Benefit: Highly repeatable, deterministic results.
>
> I guess here there are plenty of flavours. The only one I know of is this
> power simulator that relies on nohz_full. Not sure whether the implementation
> relies on syscalls or not:
>
> https://dpsim.fein-aachen.org/docs/getting-started/real-time/
>
> > For example, AF_XDP bypass uses system calls (and wants isolcpus):
> >
> > https://www.quantvps.com/blog/kernel-bypass-in-hft?srsltid=AfmBOoryeSxuuZjzTJIC9O-Ag8x4gSwjs-V4Xukm2wQpGmwDJ6t4szuE
>
> That's HFT again and they state that they rely on polling userspace drivers so
> I don't expect syscalls.
>
> But anyway here is a summary I would propose:
>
> * Domain isolation alone is a good fit when some glitches must be avoided but
> kernel work is still necessary: non critical high volume networking or data
> capture, video games, etc...
>
> * Full isolation is a better fit for ultra low latency requirement, in this case
> the kernel is only good for preparatory work and interface layout between
> userspace and the hardware (VFIO).
>
> I've observed 3 patterns so far:
>
> - Low latency networking with DPDK, eg: 5g uRLLC (should be syscalls free)
> - Scientific simulation (not sure about syscalls)
> - HPC computation such as LLM (not sure about syscalls).
>
> Is flushing work only relevant for full isolation? If so I can't say which is
> the best solution between flushing pending work on syscall exit and doing that
> remotely. But if it's relevant also for domain isolation, then the remote
> work is better because it doesn't add unnecessary work on syscalls which still
> happen in this mode.
Yes, see my last email about HPC.
> At least doing things remotely should be free of any surprising side-effects.
> But we must determine how to properly activate the isolated mode (switch to
> spinlocks) depending on the isolation mode, which can be defined not only at
> boot but also at runtime (at least for domain isolation through cpusets,
> but it will be the case as well with nohz_full in the future).
>
> Thanks.
If you boot with remote spinlocks (qpw=1) today, then you can't change
that at runtime.
You could, though, because it's a static key:
#define qpw_lock(lock, cpu)						\
do {									\
	if (static_branch_maybe(CONFIG_QPW_DEFAULT, &qpw_sl))		\
		spin_lock(per_cpu_ptr(lock.sl, cpu));			\
	else								\
		local_lock(lock.ll);					\
} while (0)
But I haven't thought about switching at runtime (and I don't see why
that would be necessary). It is independent of switching CPUs to/from
isolation (or nohz_full).
OK, I will address the remaining comments and repost.