From: Mateusz Guzik
Date: Tue, 22 Aug 2023 00:29:49 +0200
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
To: Dennis Zhou
Cc: linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org, jack@suse.cz
In-Reply-To: <20230821213951.bx3yyqh7omdvpyae@f>
References: <20230821202829.2163744-1-mjguzik@gmail.com> <20230821213951.bx3yyqh7omdvpyae@f>

On 8/21/23, Mateusz Guzik wrote:
> On Mon, Aug 21, 2023 at 02:07:28PM -0700, Dennis Zhou wrote:
>> On Mon, Aug 21, 2023 at 10:28:27PM +0200, Mateusz Guzik wrote:
>> > With this out of the way I'll be looking at some form of caching to
>> > eliminate these allocs as a problem.
>> >
>>
>> I'm not against caching, this is just my first thought.
>> Caching will have an impact on the backing pages of percpu. All it
>> takes is 1 allocation on a page for the current allocator to pin n
>> pages of memory. A few years ago percpu depopulation was implemented
>> so that limits the amount of resident backing pages.
>>
>
> I'm painfully aware.
>
>> Maybe the right thing to do is preallocate pools of common sized
>> allocations so that way they can be recycled such that we don't have to
>> think too hard about fragmentation that can occur if we populate these
>> pools over time?
>>
>
> This is what I was going to suggest :)
>
> FreeBSD has a per-cpu allocator which pretends to be the same as the
> slab allocator, except handing out per-cpu bufs. So far it has sizes 4,
> 8, 16, 32 and 64 and you can act as if you are mallocing in that size.
>
> Scales perfectly fine of course since it caches objs per-CPU, but there
> is some waste and I have 0 idea how it compares to what Linux is doing
> on that front.
>
> I stress though that even if you were to carve out certain sizes, a
> global lock to handle ops will still kill scalability.
>
> Perhaps granularity better than global, but less than per-CPU would be a
> sweet spot for scalability vs memory waste.
>
> That said...
>
>> Also as you've pointed out, it wasn't just the percpu allocation being
>> the bottleneck, but percpu_counter's global lock too for hotplug
>> support. I'm hazarding a guess most use cases of percpu might have
>> additional locking requirements too such as percpu_counter.
>>
>
> True Fix(tm) is a longer story.
>
> Maybe let's sort out this patchset first, whichever way. :)
>

So I found the discussion around the original patch with a perf
regression report.

https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/

The reporter suggests dodging the problem by only allocating per-cpu
counters when the process is going multithreaded.
Given that there are still plenty of forever single-threaded procs out
there, I think that does sound like a great plan regardless of what
happens with this patchset.

Almost all access is already done using dedicated routines, so this
should be an afternoon of churn to sort out, unless I missed a
showstopper (maybe there is no good place to stuff a flag or some other
indicator of the state of the counters?).

That said, I'll look into it some time this week or next.

>> Thanks,
>> Dennis
>>
>> > Thoughts?
>> >
>> > Mateusz Guzik (2):
>> >   pcpcntr: add group allocation/free
>> >   fork: group allocation of per-cpu counters for mm struct
>> >
>> >  include/linux/percpu_counter.h | 19 ++++++++---
>> >  kernel/fork.c                  | 13 ++------
>> >  lib/percpu_counter.c           | 61 ++++++++++++++++++++++++----------
>> >  3 files changed, 60 insertions(+), 33 deletions(-)
>> >
>> > --
>> > 2.39.2
>> >
>

-- 
Mateusz Guzik
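
To illustrate the idea above of allocating per-cpu counters only once a
process goes multithreaded, here is a minimal userspace sketch. It is not
the kernel's percpu_counter API: struct lazy_counter, the lazy_counter_*
helpers and the fixed NCPUS are all hypothetical names made up for this
example, and real code would also need preemption handling and
synchronization around the upgrade, which the sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>

#define NCPUS 4 /* assumption: fixed CPU count, no hotplug */

/* Hypothetical type; NOT the kernel's struct percpu_counter. */
struct lazy_counter {
	long count;   /* the only storage while the owner is single-threaded */
	long *percpu; /* NULL until upgraded, then one slot per CPU */
};

static void lazy_counter_init(struct lazy_counter *c)
{
	c->count = 0;
	c->percpu = NULL;
}

/* Add on behalf of `cpu`; no per-CPU memory is touched before the upgrade. */
static void lazy_counter_add(struct lazy_counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NCPUS);
	if (c->percpu)
		c->percpu[cpu] += delta;
	else
		c->count += delta;
}

/*
 * Meant to be called when the owner is about to gain a second thread.
 * Returns 0 on success (or if already upgraded), -1 on allocation failure.
 * The value accumulated so far stays in `count` as the base.
 */
static int lazy_counter_upgrade(struct lazy_counter *c)
{
	if (c->percpu)
		return 0;
	c->percpu = calloc(NCPUS, sizeof(*c->percpu));
	return c->percpu ? 0 : -1;
}

/* Total value: the base count plus every per-CPU delta, if any exist. */
static long lazy_counter_sum(const struct lazy_counter *c)
{
	long sum = c->count;

	if (c->percpu)
		for (int i = 0; i < NCPUS; i++)
			sum += c->percpu[i];
	return sum;
}

static void lazy_counter_destroy(struct lazy_counter *c)
{
	free(c->percpu);
	c->percpu = NULL;
}
```

In this sketch the NULL check on the percpu pointer plays the role of the
"flag/whatever other indicator" mentioned above: a process that never
calls lazy_counter_upgrade() never pays for the per-cpu allocation, and
its counter behaves like a plain long.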