From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mateusz Guzik <mjguzik@gmail.com>
Date: Wed, 23 Aug 2023 14:13:20 +0200
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
To: Jan Kara
Cc: Dennis Zhou, linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org
In-Reply-To: <20230823094915.ggv3spzevgyoov6i@quack3>
References: <20230821202829.2163744-1-mjguzik@gmail.com>
	<20230821213951.bx3yyqh7omdvpyae@f>
	<20230822095154.7cr5ofogw552z3jk@quack3>
	<20230823094915.ggv3spzevgyoov6i@quack3>
Content-Type: text/plain; charset="UTF-8"
On 8/23/23, Jan Kara wrote:
> On Tue 22-08-23 16:24:56, Mateusz Guzik wrote:
>> On 8/22/23, Jan Kara wrote:
>> > On Tue 22-08-23 00:29:49, Mateusz Guzik wrote:
>> >> On 8/21/23, Mateusz Guzik wrote:
>> >> > True Fix(tm) is a longer story.
>> >> >
>> >> > Maybe let's sort out this patchset first, whichever way.
>> >> > :)
>> >> >
>> >>
>> >> So I found the discussion around the original patch with a perf
>> >> regression report.
>> >>
>> >> https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/
>> >>
>> >> The reporter suggests dodging the problem by only allocating per-cpu
>> >> counters when the process goes multithreaded. Given that there are
>> >> still plenty of forever single-threaded procs out there, I think that
>> >> does sound like a great plan regardless of what happens with this
>> >> patchset.
>> >>
>> >> Almost all access is already done using dedicated routines, so this
>> >> should be an afternoon churn to sort out, unless I missed a
>> >> showstopper. (Maybe there is no good place to stuff a flag/whatever
>> >> other indicator about the state of the counters?)
>> >>
>> >> That said, I'll look into it some time this or next week.
>> >
>> > Good, just let me know how it went, I also wanted to start looking into
>> > this to come up with some concrete patches :). What I had in mind was
>> > that we could use 'counters == NULL' as an indication that the counter
>> > is still in 'single counter mode'.
>> >
>>
>> In the current state there are only pointers to counters in mm_struct
>> and there is no storage for them in task_struct. So I don't think
>> merely null-checking the per-cpu stuff is going to cut it -- where
>> should the single-threaded counters land?
>
> I think you misunderstood. What I wanted to do is to provide a new flavor
> of percpu_counter (sharing most of the code and definitions) which would
> have an option to start as a simple counter (indicated by pcc->counters
> == NULL and using pcc->count for counting) and then be upgraded by a call
> to the real percpu thing. Because I think such counters would be useful
> also on other occasions than as rss counters.
>

Indeed I did -- I had tunnel vision on dodging atomics for current given
remote modifications, which won't be done in your proposal.
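For the record, the upgradable-counter shape described above could be
sketched in userspace C along these lines (the names hybrid_counter,
hc_upgrade etc. are made up for illustration; a real version in
lib/percpu_counter.c would additionally have to handle locking against a
concurrent upgrade, CPU hotplug and so on):

```c
#include <stdlib.h>

#define NCPU 4	/* stand-in for nr_cpu_ids */

/*
 * Hypothetical "upgradable" counter: it starts in single-counter
 * mode (counters == NULL, updates go straight to count) and can
 * later be switched to per-CPU batching.
 */
struct hybrid_counter {
	long count;	/* authoritative value while single */
	long *counters;	/* per-CPU deltas once upgraded, else NULL */
};

void hc_init(struct hybrid_counter *hc)
{
	hc->count = 0;
	hc->counters = NULL;
}

/* Upgrade to per-CPU mode; the value so far stays in hc->count. */
int hc_upgrade(struct hybrid_counter *hc)
{
	if (hc->counters)
		return 0;
	hc->counters = calloc(NCPU, sizeof(long));
	return hc->counters ? 0 : -1;
}

void hc_add(struct hybrid_counter *hc, int cpu, long v)
{
	if (hc->counters)
		hc->counters[cpu] += v;	/* per-CPU slot */
	else
		hc->count += v;		/* single-threaded fast path */
}

long hc_read(const struct hybrid_counter *hc)
{
	long sum = hc->count;

	if (hc->counters)
		for (int i = 0; i < NCPU; i++)
			sum += hc->counters[i];
	return sum;
}
```

The point being that a forever single-threaded process never pays for the
per-cpu machinery, while every accessor only needs the NULL check to know
which mode it is in.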
I concede your idea solves the problem at hand; I question whether it is
the right thing to do, though. Not my call to make.

>> Then for the single-threaded case an area is allocated for
>> NR_MM_COUNTERS counters * 2 -- the first set updated without any
>> synchro by the current thread, the second set only to be modified by
>> others and protected with mm->arg_lock. The lock protects remote
>> access to the union to begin with.
>
> arg_lock seems a bit like a hack. How is it related to rss_stat? The
> scheme with two counters is clever but I'm not 100% convinced the
> complexity is really worth it. I'm not sure the overhead of always using
> an atomic counter would really be measurable, as atomic counter ops in
> local CPU cache tend to be cheap. Did you try to measure the difference?
>

arg_lock would not stay as is, it would have to be renamed to something
more generic.

Atomics on x86-64 are very expensive to this very day. Here is a sample
measurement of 2 atomics showing up, done by someone else:
https://lore.kernel.org/oe-lkp/202308141149.d38fdf91-oliver.sang@intel.com/T/#u

tl;dr it is *really* bad.

> If the second counter proves to be worth it, we could make just that one
> atomic to avoid the need for abusing some spinlock.
>

The spinlock would be there to synchronize against the transition to
per-cpu -- any trickery is avoided and we trivially know for a fact that
the remote party either sees the per-cpu state if transitioned, or the
local state if not. Then one easily knows no updates have been lost and
the buffer for the 2 sets of counters can be safely freed.

While writing down the idea previously I did not realize that the per-cpu
counter ops disable interrupts around the op. That's already very slow,
and the trip should be comparable to paying for an atomic (as in, the
patch which introduced percpu counters here slowed things down for
single-threaded processes). With your proposal the atomic would be there,
but the interrupt trip could be avoided.
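For illustration only, the two-set scheme could be modeled in userspace
roughly like this (every name is invented, a pthread mutex stands in for
the spinlock, and an explicit cpu argument stands in for this_cpu_add()):

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_MM_COUNTERS	4	/* MM_FILEPAGES & friends in the kernel */
#define NCPU		4	/* stand-in for nr_cpu_ids */

/*
 * While single-threaded, one buffer holds 2 sets of counters: set 0
 * is updated by the owner with plain stores, set 1 by remote parties
 * under the lock.  Going multithreaded takes the same lock, folds
 * both sets into per-CPU storage and frees the buffer, so a remote
 * updater either sees the per-cpu state or the locked buffer, and no
 * update can be lost across the transition.
 */
struct rss_state {
	long *single;		/* NR_MM_COUNTERS * 2, or NULL */
	long *percpu;		/* NCPU * NR_MM_COUNTERS once upgraded */
	pthread_mutex_t lock;
};

int rss_init(struct rss_state *rs)
{
	rs->single = calloc(2 * NR_MM_COUNTERS, sizeof(long));
	rs->percpu = NULL;
	pthread_mutex_init(&rs->lock, NULL);
	return rs->single ? 0 : -1;
}

/* Owner thread: lockless plain store while single-threaded. */
void rss_owner_add(struct rss_state *rs, int cpu, int member, long v)
{
	if (rs->single)
		rs->single[member] += v;
	else	/* would be this_cpu_add() in the real thing */
		rs->percpu[cpu * NR_MM_COUNTERS + member] += v;
}

/* Remote party: always goes through the lock. */
void rss_remote_add(struct rss_state *rs, int member, long v)
{
	pthread_mutex_lock(&rs->lock);
	if (rs->single)
		rs->single[NR_MM_COUNTERS + member] += v;
	else	/* would be an atomic op in the real thing */
		rs->percpu[member] += v;
	pthread_mutex_unlock(&rs->lock);
}

/* Transition when the process goes multithreaded. */
int rss_upgrade(struct rss_state *rs)
{
	long *pc = calloc(NCPU * NR_MM_COUNTERS, sizeof(long));

	if (!pc)
		return -1;
	pthread_mutex_lock(&rs->lock);
	for (int i = 0; i < NR_MM_COUNTERS; i++)
		pc[i] = rs->single[i] + rs->single[NR_MM_COUNTERS + i];
	free(rs->single);	/* safe: no update can be in flight */
	rs->single = NULL;
	rs->percpu = pc;
	pthread_mutex_unlock(&rs->lock);
	return 0;
}

long rss_read(struct rss_state *rs, int member)
{
	long sum = 0;

	pthread_mutex_lock(&rs->lock);
	if (rs->single)
		sum = rs->single[member] + rs->single[NR_MM_COUNTERS + member];
	else
		for (int c = 0; c < NCPU; c++)
			sum += rs->percpu[c * NR_MM_COUNTERS + member];
	pthread_mutex_unlock(&rs->lock);
	return sum;
}
```

The single pointer doubles as the mode flag, much like the
'counters == NULL' trick; the difference is that the owner's fast path
touches no lock and no atomic at all.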
So your proposal would roughly maintain the current cost of doing the op
(as in, it would not get /worse/), while my patch would make it lower.

All that said, I'm going to refrain from writing a patch for the time
being. If the powers that be decide on your approach, I'm not going to
argue -- I don't think either is a clear winner over the other.

-- 
Mateusz Guzik