From: Mateusz Guzik <mjguzik@gmail.com>
To: oleg@redhat.com
Cc: brauner@kernel.org, linux-kernel@vger.kernel.org,
akpm@linux-foundation.org, linux-mm@kvack.org,
willy@infradead.org
Subject: Re: [PATCH v3 0/2] further damage-control lack of clone scalability
Date: Sun, 18 Jan 2026 13:51:48 +0100 [thread overview]
Message-ID: <CAGudoHETe9=un510HBh6-rwwyoE+qHYNLoxDK8-suTgbTfN++w@mail.gmail.com> (raw)
In-Reply-To: <20251206131955.780557-1-mjguzik@gmail.com>
On Sat, Dec 6, 2025 at 2:20 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> When spawning and killing threads in separate processes in parallel the
> primary bottleneck on the stock kernel is pidmap_lock, largely because
> of a back-to-back acquire in the common case.
>
> Benchmark code at the end.
>
> With this patchset alloc_pid() only takes the lock once and consequently
> alleviates the problem. While scalability improves, the lock remains the
> primary bottleneck by a large margin.
>
> I believe idr is a poor choice for the task at hand to begin with, but
> sorting that out is beyond the scope of this patchset. At the same time,
> any replacement would be best evaluated against a state where the
> above relock problem is fixed.
>
> Performance improvement varies between reboots. When benchmarking with
> 20 processes creating and killing threads in a loop, the unpatched
> baseline hovers around 465k ops/s, while patched is anything between
> ~510k ops/s and ~560k ops/s, depending on false sharing (which I only
> minimally sanitized). So this is at least a 10% improvement if you are
> unlucky.
>
I had another look at what the next steps after this patchset should be.
The primary problem turns out to be pidfs support -- commenting it out
gives me about a 40% boost. Afterwards, the top of the profile is cgroups,
with its 3 lock acquires per thread life cycle:
@[
__pv_queued_spin_lock_slowpath+1
_raw_spin_lock_irqsave+45
cgroup_task_dead+33
finish_task_switch.isra.0+555
schedule_tail+11
ret_from_fork+27
ret_from_fork_asm+26
]: 2550200
@[
__pv_queued_spin_lock_slowpath+1
_raw_spin_lock_irq+38
cgroup_post_fork+57
copy_process+5993
kernel_clone+148
__do_sys_clone3+188
do_syscall_64+78
entry_SYSCALL_64_after_hwframe+118
]: 3486368
@[
__pv_queued_spin_lock_slowpath+1
_raw_spin_lock_irq+38
cgroup_can_fork+110
copy_process+4940
kernel_clone+148
__do_sys_clone3+188
do_syscall_64+78
entry_SYSCALL_64_after_hwframe+118
]: 3487665
Currently the pidfs lookup is implemented with a red-black tree. Whatever
the replacement, it should be faster and have its own non-global locking.
I don't know what's available in the kernel to handle this instead -- is
it rhashtable?
I would not mind whatsoever if someone else dealt with it. :-)
> bench from will-it-scale:
>
> #include <assert.h>
> #include <pthread.h>
>
> char *testcase_description = "Thread creation and teardown";
>
> static void *worker(void *arg)
> {
> return (NULL);
> }
>
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
> pthread_t thread[1];
> int error;
>
> while (1) {
> for (int i = 0; i < 1; i++) {
> error = pthread_create(&thread[i], NULL, worker, NULL);
> assert(error == 0);
> }
> for (int i = 0; i < 1; i++) {
> error = pthread_join(thread[i], NULL);
> assert(error == 0);
> }
> (*iterations)++;
> }
> }
>
> v3:
> - fix some whitespace and one typo
> - slightly reword the ENOMEM comment
> - move i-- in the first loop towards the end for consistency with the
> other loop
> - 2 extra unlikely for initial error conditions
>
> I retained Oleg's r-b as the changes don't affect behavior
>
> v2:
> - cosmetic fixes from Oleg
> - drop idr_preload_many, relock pidmap + call idr_preload again instead
> - write a commit message
>
> Mateusz Guzik (2):
> ns: pad refcount
> pid: only take pidmap_lock once on alloc
>
> include/linux/ns/ns_common_types.h | 4 +-
> kernel/pid.c | 134 ++++++++++++++++++-----------
> 2 files changed, 89 insertions(+), 49 deletions(-)
>
> --
> 2.48.1
>
Thread overview: 10+ messages [~2026-01-18 12:52 UTC]
2025-12-06 13:19 Mateusz Guzik
2025-12-06 13:19 ` [PATCH v3 1/2] ns: pad refcount Mateusz Guzik
2025-12-06 13:19 ` [PATCH v3 2/2] pid: only take pidmap_lock once on alloc Mateusz Guzik
2025-12-07 7:21 ` Matthew Wilcox
2025-12-07 9:21 ` Mateusz Guzik
2025-12-07 10:37 ` David Laight
2025-12-16 17:09 ` Tycho Andersen
2025-12-16 18:19 ` Mateusz Guzik
2026-01-18 12:51 ` Mateusz Guzik [this message]
2026-01-19 11:09 ` [PATCH v3 0/2] further damage-control lack of clone scalability Christian Brauner