linux-mm.kvack.org archive mirror
From: Frederic Weisbecker <frederic@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.com>, Michal Hocko <mhocko@suse.com>,
	Leonardo Bras <leobras.c@gmail.com>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Leonardo Bras <leobras@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Frederic Weisbecker <fweisbecker@suse.de>,
	Waiman Long <llong@redhat.com>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Tue, 24 Feb 2026 15:40:56 +0100	[thread overview]
Message-ID: <aZ24eAiQpo64-0Kz@pavilion.home> (raw)
In-Reply-To: <aZibbYH7yrDZlnJh@tpad>

On Fri, Feb 20, 2026 at 02:35:41PM -0300, Marcelo Tosatti wrote:
> 
> I am not sure it's safe to assume that. Ask Gemini about isolcpus use

Erm... ok fine let's see that :-)

> cases and:
> 
> 1. High-Frequency Trading (HFT)
> In the world of HFT, microseconds are the difference between profit and loss. 
> Traders use isolcpus to pin their execution engines to specific cores.
> 
> The Goal: Eliminate "jitter" caused by the OS moving other processes onto the same core.
> 
> The Benefit: Guaranteed execution time and ultra-low latency.

That would be full isolation (aka nohz_full), because the goal here is to beat
the competitors. As such, software latency must tend toward hardware latency.

I wouldn't expect any syscalls here, but rather a full userspace stack, with
DPDK for example.

I put that in the 5G uRLLC (or similar low-latency networking) use case family.
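For reference, that kind of full isolation is typically requested on the kernel
command line. A sketch (the CPU range 2-7 is just a made-up example, pick the
cores your workload is pinned to):

```
# Hypothetical boot parameters fully isolating CPUs 2-7:
# - nohz_full stops the periodic tick on those CPUs when a single task runs
# - rcu_nocbs offloads RCU callback processing away from them
# - isolcpus removes them from the scheduler domains and keeps managed IRQs off
nohz_full=2-7 rcu_nocbs=2-7 isolcpus=nohz,domain,managed_irq,2-7
```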

> 
> 2. Real-Time Audio & Video Processing
> If you are running a Digital Audio Workstation (DAW) or a live video encoding rig, a tiny "hiccup" in CPU availability results in an audible pop or a dropped frame.
> 
> The Goal: Reserve cores specifically for the Digital Signal Processor (DSP) or the encoder.
> 
> The Benefit: Smooth, glitch-free media streams even when the rest of the
> system is busy.

Here I expect weaker isolation requirements with syscalls involved. Scheduler
domain isolation alone (aka isolcpus=[domain]) would fit.
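For illustration, scheduler domain isolation alone can be requested at boot
like this (the CPU list is again only an example):

```
# Hypothetical boot parameter: remove CPUs 3-7 from the scheduler domains
# but keep the tick, RCU callbacks and kernel housekeeping behaviour intact.
isolcpus=domain,3-7
```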

> 
> 3. Network Function Virtualization (NFV) & DPDK
> For high-speed networking (like 10Gbps+ traffic), the Data Plane Development Kit (DPDK) uses "poll mode" drivers. These drivers constantly loop to check for new packets rather than waiting for interrupts.
> 
> The Goal: Isolate cores so they can run at 100% utilization just checking for network packets.
> 
> The Benefit: Maximum throughput and zero packet loss in high-traffic
> environments.

I put that in the 5G uRLLC use case family as well (or, again, similar low-latency networking).

> 4. Gaming & Simulation
> Competitive gamers or flight simulator enthusiasts sometimes isolate a few cores to handle the game's main thread, while leaving the rest of the OS (Discord, Chrome, etc.) to the remaining cores.
> 
> The Goal: Prevent background Windows/Linux tasks from stealing cycles from the game engine.
> 
> The Benefit: More consistent 1% low FPS and reduced input lag.

That's domain isolation because frequent syscalls are unavoidable.

> 
> 5. Deterministic Scientific Computing
> If you're running a simulation that needs to take exactly the same amount of time every time it runs (for benchmarking or safety-critical testing), you can't have the OS interference messing with your metrics.
> 
> The Goal: Remove the variability of the Linux scheduler.
> 
> The Benefit: Highly repeatable, deterministic results.

I guess there are plenty of flavours here. The only one I know of is this
power simulator that relies on nohz_full. I'm not sure whether the
implementation relies on syscalls or not:

https://dpsim.fein-aachen.org/docs/getting-started/real-time/

> For example, AF_XDP bypass uses system calls (and wants isolcpus):
> 
> https://www.quantvps.com/blog/kernel-bypass-in-hft?srsltid=AfmBOoryeSxuuZjzTJIC9O-Ag8x4gSwjs-V4Xukm2wQpGmwDJ6t4szuE

That's HFT again, and they state that they rely on polling userspace drivers,
so I don't expect syscalls.

But anyway here is a summary I would propose:

* Domain isolation alone is a good fit when some glitches must be avoided but
  kernel work is still necessary: non-critical high volume networking or data
  capture, video games, etc...

* Full isolation is a better fit for ultra-low latency requirements; in this
  case the kernel is only good for preparatory work and for laying out the
  interface between userspace and the hardware (VFIO).

  I've observed 3 patterns so far:

    - Low latency networking with DPDK, e.g. 5G uRLLC (should be syscall-free)
    - Scientific simulation (not sure about syscalls)
    - HPC computation such as LLM (not sure about syscalls).

Is flushing work only relevant for full isolation? If so, I can't say which is
the better solution between flushing pending work on syscall exit and doing it
remotely. But if it's also relevant for domain isolation, then the remote
work is better because it doesn't add unnecessary work to syscalls, which
still happen in this mode.

At least doing things remotely should be free of any surprising side effects.
But we must determine how to properly activate the isolated mode (switch to
spinlocks) depending on the isolation mode, which can be defined not only
at boot but also at runtime (at least for domain isolation through cpusets,
but it will be the case for nohz_full as well in the future).
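For instance, with cgroup v2 cpusets, domain isolation can be toggled at
runtime by creating an isolated partition. A sketch, assuming cgroup v2 is
mounted at /sys/fs/cgroup and run as root; the group name and CPU list are
illustrative, and cpuset.cpus.exclusive needs a recent kernel:

```shell
# Create a cpuset, give it CPUs 2-5 and claim them exclusively,
# then turn the cpuset into an isolated scheduler partition.
mkdir /sys/fs/cgroup/isolated
echo "2-5" > /sys/fs/cgroup/isolated/cpuset.cpus
echo "2-5" > /sys/fs/cgroup/isolated/cpuset.cpus.exclusive
echo isolated > /sys/fs/cgroup/isolated/cpuset.cpus.partition
```

The kernel then has to notice that transition and flip the relevant per-cpu
machinery (e.g. to spinlocks), which is exactly the activation question above.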

Thanks.

-- 
Frederic Weisbecker
SUSE Labs


