From: Frederic Weisbecker <frederic@kernel.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>,
Leonardo Bras <leobras.c@gmail.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org, Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
Leonardo Bras <leobras@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
Frederic Weisbecker <fweisbecker@suse.de>
Subject: Re: [PATCH 0/4] Introduce QPW for per-cpu operations
Date: Tue, 3 Mar 2026 12:08:09 +0100 [thread overview]
Message-ID: <aabBGcuhVaFVwtus@pavilion.home> (raw)
In-Reply-To: <aaAxVbt/DQyTgIs3@tpad>
Le Thu, Feb 26, 2026 at 08:41:09AM -0300, Marcelo Tosatti a écrit :
> On Wed, Feb 25, 2026 at 10:49:54PM +0100, Frederic Weisbecker wrote:
>
> <snip>
>
> > > There are specific parts of a simulation that are intensive, but
> > > researchers try to minimize them:
> > >
> > > I/O Operations: Writing "checkpoints" or large trajectory files to disk
> > > (using write()). This is why high-end HPC systems use Asynchronous I/O
> > > or dedicated I/O nodes—to keep the compute cores from getting bogged
> > > down in system calls.
> > >
> > > Memory Allocation: Constantly calling malloc/free involves the brk or
> > > mmap system calls. Optimized simulation tools pre-allocate all the
> > > memory they need at startup to avoid this.
> >
> > Ok. I asked a similar question and got this (you made me use an LLM for the
> > first time btw, I held out for 4 years... I'm sure I can wait 4 more years until
> > the next usage :o)
>
> You should use it more often, it can save a significant amount of time
> :-)
I fear the earth doesn't have the resources to serve daily use of LLM to us
all. Meanwhile it was a pleasant surprise to see it in action and answer questions
I had to myself for a long while. And I might use it again on the rare occasions
where a simple search engine request doesn't do the job.
> > ### 2. The "Slow Path" (System Calls / Syscalls)
> >
> > Passing through the kernel (a syscall) is necessary in certain situations, but it is "expensive" because it forces a **context switch**, which flushes CPU caches.
> >
> > * **Initialization:** During startup (`MPI_Init`), many syscalls are used to create sockets, map shared memory (`mmap`), and configure network interfaces.
> > * **Standard TCP/IP:** If you are not using a high-performance network (RDMA) but simple Ethernet instead, MPI must call `send()` and `recv()`, which are syscalls. The Linux kernel then takes over to manage the TCP/IP stack.
> > * **Sleep Mode (Blocking):** If an MPI process waits for a message for too long, it may decide to "go to sleep" to yield the CPU to another task via syscalls like `futex()` or `poll()`.
> >
> > **In summary:** MPI synchronization aims to be **100% User-Space** (via memory polling) to avoid syscall latency. It is precisely because MPI tries to bypass the kernel that we use `nohz_full`: we are asking the kernel not to even "knock on the CPU's door" with its clock interruptions.
>
> Of course, there is a cost to system calls. However, considering
> "low latency applications must necessarily remain in userspace,
> therefore lets optimize only for that case" is limiting IMHO.
>
> Should avoid interruptions whenever possible, for isolated CPUs
> (in userspace _and_ kernelspace).
Very low latency requirements really should bend toward full userspace.
But you're right that isolation (even full with nohz_full) should probably
not be limited to that. HPC shows such a usecase where the workload is not
perfectly isolated and yet nohz_full brings improvements.
Thanks.
--
Frederic Weisbecker
SUSE Labs
next prev parent reply other threads:[~2026-03-03 11:08 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-06 14:34 Marcelo Tosatti
2026-02-06 14:34 ` [PATCH 1/4] Introducing qpw_lock() and per-cpu queue & flush work Marcelo Tosatti
2026-02-06 15:20 ` Marcelo Tosatti
2026-02-07 0:16 ` Leonardo Bras
2026-02-11 12:09 ` Marcelo Tosatti
2026-02-14 21:32 ` Leonardo Bras
2026-02-06 14:34 ` [PATCH 2/4] mm/swap: move bh draining into a separate workqueue Marcelo Tosatti
2026-02-06 14:34 ` [PATCH 3/4] swap: apply new queue_percpu_work_on() interface Marcelo Tosatti
2026-02-07 1:06 ` Leonardo Bras
2026-02-26 15:49 ` Marcelo Tosatti
2026-02-06 14:34 ` [PATCH 4/4] slub: " Marcelo Tosatti
2026-02-07 1:27 ` Leonardo Bras
2026-02-06 23:56 ` [PATCH 0/4] Introduce QPW for per-cpu operations Leonardo Bras
2026-02-10 14:01 ` Michal Hocko
2026-02-11 12:01 ` Marcelo Tosatti
2026-02-11 12:11 ` Marcelo Tosatti
2026-02-14 21:35 ` Leonardo Bras
2026-02-11 16:38 ` Michal Hocko
2026-02-11 16:50 ` Marcelo Tosatti
2026-02-11 16:59 ` Vlastimil Babka
2026-02-11 17:07 ` Michal Hocko
2026-02-14 22:02 ` Leonardo Bras
2026-02-16 11:00 ` Michal Hocko
2026-02-19 15:27 ` Marcelo Tosatti
2026-02-19 19:30 ` Michal Hocko
2026-02-20 14:30 ` Marcelo Tosatti
2026-02-23 9:18 ` Michal Hocko
2026-03-03 10:55 ` Frederic Weisbecker
2026-02-23 21:56 ` Frederic Weisbecker
2026-02-24 17:23 ` Marcelo Tosatti
2026-02-25 21:49 ` Frederic Weisbecker
2026-02-26 7:06 ` Michal Hocko
2026-02-26 11:41 ` Marcelo Tosatti
2026-03-03 11:08 ` Frederic Weisbecker [this message]
2026-02-20 10:48 ` Vlastimil Babka
2026-02-20 12:31 ` Michal Hocko
2026-02-20 17:35 ` Marcelo Tosatti
2026-02-20 17:58 ` Vlastimil Babka
2026-02-20 19:01 ` Marcelo Tosatti
2026-02-23 9:11 ` Michal Hocko
2026-02-23 11:20 ` Marcelo Tosatti
2026-02-24 14:40 ` Frederic Weisbecker
2026-02-24 18:12 ` Marcelo Tosatti
2026-02-20 16:51 ` Marcelo Tosatti
2026-02-20 16:55 ` Marcelo Tosatti
2026-02-20 22:38 ` Leonardo Bras
2026-02-23 18:09 ` Vlastimil Babka
2026-02-26 18:24 ` Marcelo Tosatti
2026-02-20 21:58 ` Leonardo Bras
2026-02-23 9:06 ` Michal Hocko
2026-02-28 1:23 ` Leonardo Bras
2026-03-03 0:19 ` Marcelo Tosatti
2026-02-19 13:15 ` Marcelo Tosatti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aabBGcuhVaFVwtus@pavilion.home \
--to=frederic@kernel.org \
--cc=42.hyeyoo@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=boqun.feng@gmail.com \
--cc=cgroups@vger.kernel.org \
--cc=cl@linux.com \
--cc=fweisbecker@suse.de \
--cc=hannes@cmpxchg.org \
--cc=iamjoonsoo.kim@lge.com \
--cc=leobras.c@gmail.com \
--cc=leobras@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=longman@redhat.com \
--cc=mhocko@suse.com \
--cc=mtosatti@redhat.com \
--cc=muchun.song@linux.dev \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=tglx@linutronix.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox