From: Mateusz Guzik <mjguzik@gmail.com>
To: oleg@redhat.com
Cc: brauner@kernel.org, linux-kernel@vger.kernel.org,
	 akpm@linux-foundation.org, linux-mm@kvack.org,
	willy@infradead.org
Subject: Re: [PATCH v3 0/2] further damage-control lack of clone scalability
Date: Sun, 18 Jan 2026 13:51:48 +0100
Message-ID: <CAGudoHETe9=un510HBh6-rwwyoE+qHYNLoxDK8-suTgbTfN++w@mail.gmail.com>
In-Reply-To: <20251206131955.780557-1-mjguzik@gmail.com>

On Sat, Dec 6, 2025 at 2:20 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> When spawning and killing threads in separate processes in parallel the
> primary bottleneck on the stock kernel is pidmap_lock, largely because
> of a back-to-back acquire in the common case.
>
> Benchmark code at the end.
>
> With this patchset alloc_pid() only takes the lock once and consequently
> alleviates the problem. While scalability improves, the lock remains the
> primary bottleneck by a large margin.
>
> I believe idr is a poor choice for the task at hand to begin with, but
> sorting that out is beyond the scope of this patchset. At the same time
> any replacement would be best evaluated against a state where the
> above relock problem is fixed.
>
> Performance improvement varies between reboots. When benchmarking with
> 20 processes creating and killing threads in a loop, the unpatched
> baseline hovers around 465k ops/s, while patched lands anywhere between
> ~510k and ~560k ops/s depending on false sharing (which I only minimally
> sanitized). So this is at least a 10% improvement if you are unlucky.
>

I had another look at the problem, with an eye towards the next steps after this patchset.

I found the primary remaining problem is pidfs support -- commenting it
out gives me about a 40% boost. After that, the top of the profile is
cgroups, with its 3 lock acquisitions per thread life cycle:

@[
    __pv_queued_spin_lock_slowpath+1
    _raw_spin_lock_irqsave+45
    cgroup_task_dead+33
    finish_task_switch.isra.0+555
    schedule_tail+11
    ret_from_fork+27
    ret_from_fork_asm+26
]: 2550200
@[
    __pv_queued_spin_lock_slowpath+1
    _raw_spin_lock_irq+38
    cgroup_post_fork+57
    copy_process+5993
    kernel_clone+148
    __do_sys_clone3+188
    do_syscall_64+78
    entry_SYSCALL_64_after_hwframe+118
]: 3486368
@[
    __pv_queued_spin_lock_slowpath+1
    _raw_spin_lock_irq+38
    cgroup_can_fork+110
    copy_process+4940
    kernel_clone+148
    __do_sys_clone3+188
    do_syscall_64+78
    entry_SYSCALL_64_after_hwframe+118
]: 3487665

Currently the pidfs side is implemented with a red-black tree.

Whatever the replacement, it should be faster and have its own
non-global locking.

I don't know what's available in the kernel to use instead. Is
rhashtable the right fit? A rough sketch of what that could look like
is below.

I would not mind whatsoever if someone else dealt with it. :-)
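
To make that concrete, here is a minimal sketch of what an
rhashtable-backed table could look like, assuming a lookup keyed by
something like the pidfs inode number. All of the names below (struct
pidfs_entry, pidfs_ht and the helpers) are made up for illustration and
are not how fs/pidfs.c is actually structured:

#include <linux/init.h>
#include <linux/rhashtable.h>

struct pidfs_entry {
	struct rhash_head	node;	/* hash table linkage */
	u64			ino;	/* lookup key */
	struct pid		*pid;	/* payload */
};

static struct rhashtable pidfs_ht;

static const struct rhashtable_params pidfs_ht_params = {
	.key_len		= sizeof_field(struct pidfs_entry, ino),
	.key_offset		= offsetof(struct pidfs_entry, ino),
	.head_offset		= offsetof(struct pidfs_entry, node),
	.automatic_shrinking	= true,
};

static int __init pidfs_ht_init(void)
{
	return rhashtable_init(&pidfs_ht, &pidfs_ht_params);
}

/* insert/remove only take a per-bucket lock, lookup is RCU-protected */
static int pidfs_ht_add(struct pidfs_entry *e)
{
	return rhashtable_insert_fast(&pidfs_ht, &e->node, pidfs_ht_params);
}

static struct pidfs_entry *pidfs_ht_find(u64 ino)
{
	return rhashtable_lookup_fast(&pidfs_ht, &ino, pidfs_ht_params);
}

static void pidfs_ht_del(struct pidfs_entry *e)
{
	rhashtable_remove_fast(&pidfs_ht, &e->node, pidfs_ht_params);
}

The property that matters here is that rhashtable only takes a
per-bucket spinlock on insert/remove and does RCU lookups, so there is
no global lock to contend on in the thread create/exit path.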

> bench from will-it-scale:
>
> #include <assert.h>
> #include <pthread.h>
>
> char *testcase_description = "Thread creation and teardown";
>
> static void *worker(void *arg)
> {
>         return (NULL);
> }
>
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
>         pthread_t thread[1];
>         int error;
>
>         while (1) {
>                 for (int i = 0; i < 1; i++) {
>                         error = pthread_create(&thread[i], NULL, worker, NULL);
>                         assert(error == 0);
>                 }
>                 for (int i = 0; i < 1; i++) {
>                         error = pthread_join(thread[i], NULL);
>                         assert(error == 0);
>                 }
>                 (*iterations)++;
>         }
> }
>
> v3:
> - fix some whitespace and one typo
> - slightly reword the ENOMEM comment
> - move i-- in the first loop towards the end for consistency with the
>   other loop
> - 2 extra unlikely for initial error conditions
>
> I retained Oleg's r-b as the changes don't affect behavior
>
> v2:
> - cosmetic fixes from Oleg
> - drop idr_preload_many, relock pidmap + call idr_preload again instead
> - write a commit message
>
> Mateusz Guzik (2):
>   ns: pad refcount
>   pid: only take pidmap_lock once on alloc
>
>  include/linux/ns/ns_common_types.h |   4 +-
>  kernel/pid.c                       | 134 ++++++++++++++++++-----------
>  2 files changed, 89 insertions(+), 49 deletions(-)
>
> --
> 2.48.1
>
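
In case it is useful, below is a hypothetical standalone driver for the
benchmark quoted above, for running it outside of the will-it-scale
harness. The default process count, the 1-second sampling and all the
names here are my own assumptions, not part of will-it-scale:

/* driver.c -- run N copies of the quoted testcase and print aggregate ops/s */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* provided by the quoted testcase file; loops forever bumping *iterations */
extern void testcase(unsigned long long *iterations, unsigned long nr);

int main(int argc, char **argv)
{
	int nproc = argc > 1 ? atoi(argv[1]) : 20;
	unsigned long long *counters;
	unsigned long long prev = 0;

	/* shared anonymous mapping so the parent can read the child counters */
	counters = mmap(NULL, nproc * sizeof(*counters),
			PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (counters == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (int i = 0; i < nproc; i++) {
		if (fork() == 0) {
			testcase(&counters[i], i);	/* never returns */
			_exit(0);
		}
	}

	/* sample the aggregate once per second and print the delta */
	for (;;) {
		unsigned long long sum = 0;

		sleep(1);
		for (int i = 0; i < nproc; i++)
			sum += counters[i];
		printf("%llu ops/s\n", sum - prev);
		prev = sum;
	}
}

Built with something like "gcc -O2 -pthread driver.c testcase.c", where
testcase.c is the quoted benchmark file, it prints aggregate ops/s once
per second.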


Thread overview: 10+ messages
2025-12-06 13:19 Mateusz Guzik
2025-12-06 13:19 ` [PATCH v3 1/2] ns: pad refcount Mateusz Guzik
2025-12-06 13:19 ` [PATCH v3 2/2] pid: only take pidmap_lock once on alloc Mateusz Guzik
2025-12-07  7:21   ` Matthew Wilcox
2025-12-07  9:21     ` Mateusz Guzik
2025-12-07 10:37       ` David Laight
2025-12-16 17:09   ` Tycho Andersen
2025-12-16 18:19     ` Mateusz Guzik
2026-01-18 12:51 ` Mateusz Guzik [this message]
2026-01-19 11:09 ` [PATCH v3 0/2] further damage-control lack of clone scalability Christian Brauner
