From: Marcelo Tosatti <mtosatti@redhat.com>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.com>, Michal Hocko <mhocko@suse.com>,
Leonardo Bras <leobras.c@gmail.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Leonardo Bras <leobras@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Frederic Weisbecker <fweisbecker@suse.de>,
Waiman Long <llong@redhat.com>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Tue, 24 Feb 2026 15:12:32 -0300
Message-ID: <aZ3qEHzgI8Zuv7IU@tpad>
In-Reply-To: <aZ24eAiQpo64-0Kz@pavilion.home>
On Tue, Feb 24, 2026 at 03:40:56PM +0100, Frederic Weisbecker wrote:
> On Fri, Feb 20, 2026 at 02:35:41PM -0300, Marcelo Tosatti wrote:
> >
> > I am not sure it's safe to assume that. Ask Gemini about isolcpus use
>
> Erm... ok fine let's see that :-)
>
> > cases and:
> >
> > 1. High-Frequency Trading (HFT)
> > In the world of HFT, microseconds are the difference between profit and loss.
> > Traders use isolcpus to pin their execution engines to specific cores.
> >
> > The Goal: Eliminate "jitter" caused by the OS moving other processes onto the same core.
> >
> > The Benefit: Guaranteed execution time and ultra-low latency.
>
> That would be full isolation (aka nohz_full) because the goal here is to beat
> the competitors. As such the software latency must tend toward hardware latency.
>
> I wouldn't expect any syscall here but a full userspace stack with DPDK for
> example.
>
> I put that in the 5g uRLLC (or similar low latency networking) usecase family.
>
> >
> > 2. Real-Time Audio & Video Processing
> > If you are running a Digital Audio Workstation (DAW) or a live video encoding rig, a tiny "hiccup" in CPU availability results in an audible pop or a dropped frame.
> >
> > The Goal: Reserve cores specifically for the Digital Signal Processor (DSP) or the encoder.
> >
> > The Benefit: Smooth, glitch-free media streams even when the rest of the
> > system is busy.
>
> Here I expect weaker isolation requirements with syscalls involved. Scheduler
> domain isolation alone (aka isolcpus=[domain]) would fit.
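For reference, the two isolation flavours discussed here are typically requested on the kernel command line; the CPU list below is illustrative, not from this thread:

```text
# Domain (scheduler) isolation only:
isolcpus=domain,managed_irq,2-7

# Full isolation additionally offloads the tick and RCU callbacks:
isolcpus=domain,managed_irq,2-7 nohz_full=2-7 rcu_nocbs=2-7
```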
>
> >
> > 3. Network Function Virtualization (NFV) & DPDK
> > For high-speed networking (like 10Gbps+ traffic), the Data Plane Development Kit (DPDK) uses "poll mode" drivers. These drivers constantly loop to check for new packets rather than waiting for interrupts.
> >
> > The Goal: Isolate cores so they can run at 100% utilization just checking for network packets.
> >
> > The Benefit: Maximum throughput and zero packet loss in high-traffic
> > environments.
>
> I put that in the 5g uRLLC usecase family as well (again or similar low latency networking).
>
> > 4. Gaming & Simulation
> > Competitive gamers or flight simulator enthusiasts sometimes isolate a few cores to handle the game's main thread, while leaving the rest of the OS (Discord, Chrome, etc.) to the remaining cores.
> >
> > The Goal: Prevent background Windows/Linux tasks from stealing cycles from the game engine.
> >
> > The Benefit: More consistent 1% low FPS and reduced input lag.
>
> That's domain isolation because frequent syscalls are unavoidable.
>
> >
> > 5. Deterministic Scientific Computing
> > If you're running a simulation that needs to take exactly the same amount of time every time it runs (for benchmarking or safety-critical testing), you can't have OS interference messing with your metrics.
> >
> > The Goal: Remove the variability of the Linux scheduler.
> >
> > The Benefit: Highly repeatable, deterministic results.
>
> I guess here there are plenty of flavours. The only one I know of is this
> power simulator that relies on nohz_full. Not sure whether the implementation
> relies on syscalls or not:
>
> https://dpsim.fein-aachen.org/docs/getting-started/real-time/
>
> > For example, AF_XDP bypass uses system calls (and wants isolcpus):
> >
> > https://www.quantvps.com/blog/kernel-bypass-in-hft?srsltid=AfmBOoryeSxuuZjzTJIC9O-Ag8x4gSwjs-V4Xukm2wQpGmwDJ6t4szuE
>
> That's HFT again and they state that they rely on polling userspace drivers so
> I don't expect syscalls.
>
> But anyway here is a summary I would propose:
>
> * Domain isolation alone is a good fit when some glitches must be avoided but
> kernel work is still necessary: non critical high volume networking or data
> capture, video games, etc...
>
> * Full isolation is a better fit for ultra low latency requirement, in this case
> the kernel is only good for preparatory work and interface layout between
> userspace and the hardware (VFIO).
>
> I've observed 3 patterns so far:
>
> - Low latency networking with DPDK, eg: 5g uRLLC (should be syscalls free)
> - Scientific simulation (not sure about syscalls)
> - HPC computation such as LLM (not sure about syscalls).
>
> Is flushing work only relevant for full isolation? If so I can't say which is
> the best solution between flushing pending work on syscall exit and doing that
> remotely. But if it's relevant also for domain isolation, then the remote
> work is better because it doesn't add unnecessary work on syscalls which still
> happen in this mode.
Yes, see my last email about HPC.
> At least doing things remotely should be free of any surprising side-effects.
> But we must determine how to properly activate the isolated mode (switch to
> spinlocks) depending on the isolation mode, which can be defined not only at
> boot but also at runtime (at least for domain isolation through cpusets,
> but it will be the case as well with nohz_full in the future).
>
> Thanks.
If you boot with remote spinlocks (qpw=1) today, then you can't change
that at runtime.
You could, though, because it's a static key:
#define qpw_lock(lock, cpu)						\
do {									\
	if (static_branch_maybe(CONFIG_QPW_DEFAULT, &qpw_sl))		\
		spin_lock(per_cpu_ptr(lock.sl, cpu));			\
	else								\
		local_lock(lock.ll);					\
} while (0)
But I haven't thought about switching at runtime (and I don't see why
that would be necessary). It is independent of switching CPUs to/from
isolation (or nohz_full).
OK, I will address the remaining comments and repost.