linux-mm.kvack.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Vlastimil Babka <vbabka@suse.com>
Cc: Michal Hocko <mhocko@suse.com>,
	Leonardo Bras <leobras.c@gmail.com>,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	Leonardo Bras <leobras@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Frederic Weisbecker <fweisbecker@suse.de>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Fri, 20 Feb 2026 16:01:59 -0300	[thread overview]
Message-ID: <aZivpwJnIGKdAMYE@tpad> (raw)
In-Reply-To: <a1c11a09-da88-4edd-9571-0f792b59e9c3@suse.com>

On Fri, Feb 20, 2026 at 06:58:10PM +0100, Vlastimil Babka wrote:
> On 2/20/26 18:35, Marcelo Tosatti wrote:
> > 
> > Only call rcu_free_sheaf_nobarn if pcs->rcu_free is not NULL.
> > 
> > So it seems safe?
> 
> I guess it is.
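
For reference, a minimal sketch of the check I mean (illustrative only, not
the actual flush code from the sheaves series; the struct layout, the
rcu_head embedding, and the locking against the local CPU's fast path are
assumptions/elided here):

static void flush_rcu_sheaves_on_cache(struct kmem_cache *s)
{
        int cpu;

        for_each_online_cpu(cpu) {
                struct slub_percpu_sheaves *pcs;

                pcs = per_cpu_ptr(s->cpu_sheaves, cpu);

                /* Nothing queued for RCU freeing on this cpu. */
                if (!pcs->rcu_free)
                        continue;

                /* Hand the sheaf to RCU; rcu_free_sheaf_nobarn()
                 * frees the objects without a barn round trip. */
                call_rcu(&pcs->rcu_free->rcu_head, rcu_free_sheaf_nobarn);
                pcs->rcu_free = NULL;
        }
}
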
> 
>> How would this work with the housekeeping on return to userspace approach?
> >> 
>> - Would we just walk the list of all caches to flush them? Could be
> >> expensive. Would we somehow note only those that need it? That would make
> >> the fast paths do something extra?
> >> 
> >> - If some other CPU executed kmem_cache_destroy(), it would have to wait for
> >> the isolated cpu returning to userspace. Do we have the means for
> >> synchronizing on that? Would that risk a deadlock? We used to have a
> >> deferred finishing of the destroy for other reasons but were glad to get rid
> >> of it when it was possible, now it might be necessary to revive it?
> > 
> > I don't think you can expect system calls to return to userspace in
> > a given amount of time. A task could be in kernel mode for long
> > periods of time.
> > 
> >> How would this work with QPW?
> >> 
> >> - probably fast paths more expensive due to spin lock vs local_trylock_t
> >> 
> >> - flush_rcu_sheaves_on_cache() needs to be solved safely (see above)
> >> 
> >> What if we avoid percpu sheaves completely on isolated cpus and instead
> >> allocate/free using the slowpaths?
> >> 
>> - It could probably be achieved without affecting fastpaths, as we already
>> handle bootstrap without sheaves, so it's implemented in a way that does not
>> affect fastpaths.
> >> 
> >> - Would it slow the isolcpu workloads down too much when they do a syscall?
>>   - compared to "housekeeping on return to userspace" flushing, maybe not?
> >> Because in that case the syscall starts with sheaves flushed from previous
> >> return, it has to do something expensive to get the initial sheaf, then
>> maybe will use only one or a few objects, then on return has to flush
> >> everything. Likely the slowpath might be faster, unless it allocates/frees
> >> many objects from the same cache.
> >>   - compared to QPW - it would be slower as QPW would mostly retain sheaves
>> populated; the need for flushes should be very rare
> >> 
> >> So if we can assume that workloads on isolated cpus make syscalls only
> >> rarely, and when they do they can tolerate them being slower, I think the
> >> "avoid sheaves on isolated cpus" would be the best way here.
> > 
> > I am not sure it's safe to assume that. Ask Gemini about isolcpus use
> > cases and:
> 
> I don't think it's answering the question about syscalls. But didn't read
> too closely given the nature of it.

People use isolcpus with all kinds of programs. 

> > For example, AF_XDP bypass uses system calls (and wants isolcpus):
> > 
> > https://www.quantvps.com/blog/kernel-bypass-in-hft?srsltid=AfmBOoryeSxuuZjzTJIC9O-Ag8x4gSwjs-V4Xukm2wQpGmwDJ6t4szuE
> 
> Didn't spot system calls mentioned TBH.

I don't see why you would want to reduce the performance of applications
that execute on isolcpus=, if you can avoid it.

Also, won't bypassing the per-CPU caches increase contention on the
global locks, say kmem_cache_node->list_lock?
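
Roughly, the shape of the two paths (an illustrative sketch, not the actual
SLUB code; the sheaf layout is simplified and get_object_from_partial_list()
is a made-up placeholder):

/* With sheaves: the common case touches only this CPU's state. */
static void *alloc_from_sheaf(struct kmem_cache *s)
{
        struct slub_percpu_sheaves *pcs;
        void *object = NULL;

        /* Cheap per-CPU trylock, no cross-CPU contention. */
        if (!local_trylock(&s->cpu_sheaves->lock))
                return NULL;            /* caller takes the slowpath */

        pcs = this_cpu_ptr(s->cpu_sheaves);
        if (pcs->main->size)
                object = pcs->main->objects[--pcs->main->size];

        local_unlock(&s->cpu_sheaves->lock);
        return object;
}

/* Without sheaves: every allocation serializes on a node-wide lock. */
static void *alloc_from_partial(struct kmem_cache *s, struct kmem_cache_node *n)
{
        unsigned long flags;
        void *object;

        /* One lock shared by every CPU allocating from this node. */
        spin_lock_irqsave(&n->list_lock, flags);
        object = get_object_from_partial_list(s, n);
        spin_unlock_irqrestore(&n->list_lock, flags);

        return object;
}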

But if you prefer disabling the per-CPU caches for isolcpus
(or a separate option other than isolcpus), then see if 
people complain about that... works for me.

Two examples:

1)

https://github.com/xdp-project/bpf-examples/blob/main/AF_XDP-example/README.org

Busy-Poll mode

In this mode both the application and the driver can be run efficiently on
the same core. The kernel driver is explicitly invoked by the application by
calling either recvmsg() or sendto(). Invoke it by setting the -B option. The
-b option can be used to set the batch size that the driver will use. For
example:

sudo taskset -c 2 ./xdpsock -i <interface> -q 2 -l -N -B -b 256
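
The syscall pattern behind that mode, for reference (a sketch only: the xsk
socket/umem/ring setup and error handling are elided; xdpsock itself kicks
the driver with a zero-length recvfrom()/sendto() pair, and the budget value
mirrors the -b 256 above):

#include <sys/socket.h>

/* xsk_fd is an already created and bound AF_XDP socket. */
static void enable_busy_poll(int xsk_fd)
{
        int opt;

        opt = 1;        /* prefer busy polling over IRQ-driven NAPI */
        setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &opt, sizeof(opt));

        opt = 20;       /* busy-poll for up to 20 usec per call */
        setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL, &opt, sizeof(opt));

        opt = 256;      /* driver batch size, the -b option */
        setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET, &opt, sizeof(opt));
}

static void rx_kick(int xsk_fd)
{
        /* Explicitly invoke the driver: a dummy receive call runs
         * the NAPI poll loop on this core. */
        recvfrom(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);
}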

2)

https://vstinner.github.io/journey-to-stable-benchmark-system.html

Example of the effect of CPU isolation on a microbenchmark, with the Linux
boot parameters:

isolcpus=2,3,6,7 nohz_full=2,3,6,7

Microbenchmark on an idle system (without CPU isolation):

$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 229 msec per loop

Result on a busy system, with system_load.py 10 and find / commands running
in other terminals:

$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 372 msec per loop

The microbenchmark is ~62% slower because of the high system load!

Result on the same busy system but using isolated CPUs. The taskset command
allows pinning an application to specific CPUs:

$ taskset -c 1,3 python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 230 msec per loop

Just to check, a new run without CPU isolation:

$ python3 -m timeit 'sum(range(10**7))'
10 loops, best of 3: 357 msec per loop

The result with CPU isolation on a busy system is the same as the result on
an idle system! CPU isolation removes most of the noise of the system.
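
Working out the numbers quoted above:

  busy vs idle, no isolation:   (372 - 229) / 229 ~= 62% slower
  busy vs idle, isolated CPUs:  (230 - 229) / 229 ~= 0.4% slower

So isolation turns tens of percent of interference into measurement noise.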




Thread overview: 35+ messages
2026-02-06 14:34 Marcelo Tosatti
2026-02-06 14:34 ` [PATCH 1/4] Introducing qpw_lock() and per-cpu queue & flush work Marcelo Tosatti
2026-02-06 15:20   ` Marcelo Tosatti
2026-02-07  0:16   ` Leonardo Bras
2026-02-11 12:09     ` Marcelo Tosatti
2026-02-14 21:32       ` Leonardo Bras
2026-02-06 14:34 ` [PATCH 2/4] mm/swap: move bh draining into a separate workqueue Marcelo Tosatti
2026-02-06 14:34 ` [PATCH 3/4] swap: apply new queue_percpu_work_on() interface Marcelo Tosatti
2026-02-07  1:06   ` Leonardo Bras
2026-02-06 14:34 ` [PATCH 4/4] slub: " Marcelo Tosatti
2026-02-07  1:27   ` Leonardo Bras
2026-02-06 23:56 ` [PATCH 0/4] Introduce QPW for per-cpu operations Leonardo Bras
2026-02-10 14:01 ` Michal Hocko
2026-02-11 12:01   ` Marcelo Tosatti
2026-02-11 12:11     ` Marcelo Tosatti
2026-02-14 21:35       ` Leonardo Bras
2026-02-11 16:38     ` Michal Hocko
2026-02-11 16:50       ` Marcelo Tosatti
2026-02-11 16:59         ` Vlastimil Babka
2026-02-11 17:07         ` Michal Hocko
2026-02-14 22:02       ` Leonardo Bras
2026-02-16 11:00         ` Michal Hocko
2026-02-19 15:27           ` Marcelo Tosatti
2026-02-19 19:30             ` Michal Hocko
2026-02-20 14:30               ` Marcelo Tosatti
2026-02-20 10:48             ` Vlastimil Babka
2026-02-20 12:31               ` Michal Hocko
2026-02-20 17:35               ` Marcelo Tosatti
2026-02-20 17:58                 ` Vlastimil Babka
2026-02-20 19:01                   ` Marcelo Tosatti [this message]
2026-02-20 16:51           ` Marcelo Tosatti
2026-02-20 16:55             ` Marcelo Tosatti
2026-02-20 22:38               ` Leonardo Bras
2026-02-20 21:58           ` Leonardo Bras
2026-02-19 13:15       ` Marcelo Tosatti
