From: Mateusz Guzik
Date: Sat, 26 Aug 2023 20:33:21 +0200
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
To: linux-kernel@vger.kernel.org
Cc: dennis@kernel.org, tj@kernel.org, cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org, jack@suse.cz
In-Reply-To: <20230821202829.2163744-1-mjguzik@gmail.com>
References: <20230821202829.2163744-1-mjguzik@gmail.com>

On 8/21/23, Mateusz Guzik wrote:
> To start I figured I'm going to bench about as friendly case as it gets
> -- statically linked *separate* binaries all doing execve in a loop.
>
> I borrowed the bench from found here:
> http://apollo.backplane.com/DFlyMisc/doexec.c
>
> $ cc -static -O2 -o static-doexec doexec.c
> $ ./static-doexec $(nproc)
>
> It prints a result every second (warning: first line is garbage).
>
> My test box is temporarily only 26 cores and even at this scale I run
> into massive lock contention stemming from back-to-back calls to
> percpu_counter_init (and _destroy later).
>
> While not a panacea, one simple thing to do here is to batch these ops.
> Since the term "batching" is already used in the file, I decided to
> refer to it as "grouping" instead.
>
> Even if this code could be patched to dodge these counters, I would
> argue a high-traffic alloc/free consumer is only a matter of time so it
> makes sense to facilitate it.
>
> With the fix I get an ok win, to quote from the commit:
>> Even at a very modest scale of 26 cores (ops/s):
>> before: 133543.63
>> after: 186061.81 (+39%)
>

So to sum up, a v3 of the patchset is queued up here:
https://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu.git/log/?h=for-next

For those interested, I temporarily got my hands on something exceeding
the hand watch scale benched above -- a 192-way AMD EPYC 7R13 box (2
sockets x 48 cores x 2 threads).
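
As an aside, the "grouping" idea from the quoted cover letter can be
illustrated with a small userspace analogy. This is only a sketch under
stated assumptions, not the kernel code: a pthread mutex stands in for
the lock taken inside the per-cpu allocator, calloc() stands in for the
per-cpu allocation, and every sim_* name plus NR_CPUS_SIM is made up for
the illustration.

```c
/*
 * Userspace analogy only. A pthread mutex plays the role of the lock
 * taken inside the per-cpu allocator and calloc() plays the role of the
 * per-cpu allocation. All sim_* names are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_CPUS_SIM 4 /* hypothetical CPU count for the sketch */

struct sim_counter {
	long *slots; /* one slot per simulated CPU */
};

static pthread_mutex_t sim_alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long sim_lock_acquisitions; /* times the lock was taken */

/* Allocate zeroed storage for 'nr' counters in one trip through the lock. */
static long *sim_percpu_alloc(size_t nr)
{
	long *block;

	pthread_mutex_lock(&sim_alloc_lock);
	sim_lock_acquisitions++;
	block = calloc(nr * NR_CPUS_SIM, sizeof(*block));
	pthread_mutex_unlock(&sim_alloc_lock);
	return block;
}

/* Ungrouped: every counter pays for its own trip through the allocator. */
static int sim_counter_init(struct sim_counter *c)
{
	c->slots = sim_percpu_alloc(1);
	return c->slots ? 0 : -1;
}

/* Grouped: a single allocation is carved up among all 'nr' counters. */
static int sim_counter_init_many(struct sim_counter *c, size_t nr)
{
	long *block;
	size_t i;

	assert(nr > 0);
	block = sim_percpu_alloc(nr);
	if (!block)
		return -1;
	for (i = 0; i < nr; i++)
		c[i].slots = block + i * NR_CPUS_SIM;
	return 0;
}

/* Free a group set up by sim_counter_init_many(); c[0] owns the block. */
static void sim_counter_destroy_many(struct sim_counter *c)
{
	free(c[0].slots);
	c[0].slots = NULL;
}
```

Initializing N counters one by one with sim_counter_init() takes the
lock N times, while sim_counter_init_many() takes it once regardless of
N. The contended lock is still there, it just gets hit less often.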

A 6.5 kernel + the patchset only gets south of 140k execs/s when
running ./static-doexec 192

According to perf top:

  51.04%  [kernel]  [k] osq_lock
   6.82%  [kernel]  [k] __raw_callee_save___kvm_vcpu_is_preempted
   2.98%  [kernel]  [k] _atomic_dec_and_lock_irqsave
   1.62%  [kernel]  [k] rcu_cblist_dequeue
   1.54%  [kernel]  [k] refcount_dec_not_one
   1.51%  [kernel]  [k] __mod_lruvec_page_state
   1.46%  [kernel]  [k] put_cred_rcu
   1.34%  [kernel]  [k] native_queued_spin_lock_slowpath
   0.94%  [kernel]  [k] srso_alias_safe_ret
   0.81%  [kernel]  [k] memset_orig
   0.77%  [kernel]  [k] unmap_page_range
   0.73%  [kernel]  [k] _compound_head
   0.72%  [kernel]  [k] kmem_cache_free

Then bpftrace -e 'kprobe:osq_lock { @[kstack()] = count(); }' shows:

@[
    osq_lock+1
    __mutex_lock_killable_slowpath+19
    mutex_lock_killable+62
    pcpu_alloc+1219
    __alloc_percpu_gfp+18
    __percpu_counter_init_many+43
    mm_init+727
    mm_alloc+78
    alloc_bprm+138
    do_execveat_common.isra.0+103
    __x64_sys_execve+55
    do_syscall_64+54
    entry_SYSCALL_64_after_hwframe+110
]: 637370
@[
    osq_lock+1
    __mutex_lock_killable_slowpath+19
    mutex_lock_killable+62
    pcpu_alloc+1219
    __alloc_percpu+21
    mm_init+577
    mm_alloc+78
    alloc_bprm+138
    do_execveat_common.isra.0+103
    __x64_sys_execve+55
    do_syscall_64+54
    entry_SYSCALL_64_after_hwframe+110
]: 638036

That is, per-cpu allocation is still on top at this scale.

But more importantly there are *TWO* unrelated back-to-back per-cpu
allocs -- one by rss counters and one by mm_alloc_cid.

That is to say per-cpu alloc scalability definitely needs to get fixed,
I'll ponder about it.

-- 
Mateusz Guzik