From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03847C3DA6F for ; Wed, 23 Aug 2023 20:28:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D928E280055; Wed, 23 Aug 2023 16:28:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D428928003B; Wed, 23 Aug 2023 16:28:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C0A59280055; Wed, 23 Aug 2023 16:28:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id AF7E028003B for ; Wed, 23 Aug 2023 16:28:02 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7CF1CC02C2 for ; Wed, 23 Aug 2023 20:28:02 +0000 (UTC) X-FDA: 81156506004.21.7EC9461 Received: from mail-pj1-f53.google.com (mail-pj1-f53.google.com [209.85.216.53]) by imf15.hostedemail.com (Postfix) with ESMTP id A70B3A0007 for ; Wed, 23 Aug 2023 20:28:00 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.216.53 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1692822480; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=NOFQJKFHeWeaDdjT7tL752Uv3UvOkwmfZ3hHz0dcMOo=; b=ZQq5LCr3QkhHzcEsOq+oTnSm/CGGwqTU166HgJSg5U13Ogb2x/0guUCWpWt2D6Em97Q6oI SDoV6xMZF03Gji7rg/TTEfaca/NQSBIYB3naCaE25N9dpuFTS3JAkZkNOhadUbTyNtviXj u4aGFDsMgUHqRBFozD0HC9z+Q7FR3k4= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1692822480; a=rsa-sha256; cv=none; b=dLipOXzNF+ez7lWAiGicXiJGDN23Gb1UtqrWxeTdAGoSFWApsQRScdI3Nx+gV9GfGch1cF hN+VwDasim5HrnmfB70XipeZStywvbcndy32wVDeHhyB9arqU6mVF3uLMWSBj/Hkj2jG74 L0LnNksncTt/QwSIxfQdu8RUOGIlLrg= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=none; spf=pass (imf15.hostedemail.com: domain of dennisszhou@gmail.com designates 209.85.216.53 as permitted sender) smtp.mailfrom=dennisszhou@gmail.com; dmarc=fail reason="SPF not aligned (relaxed), No valid DKIM" header.from=kernel.org (policy=none) Received: by mail-pj1-f53.google.com with SMTP id 98e67ed59e1d1-26d49cf1811so3139501a91.0 for ; Wed, 23 Aug 2023 13:28:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692822479; x=1693427279; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=NOFQJKFHeWeaDdjT7tL752Uv3UvOkwmfZ3hHz0dcMOo=; b=NZlIGMsVZ9322m2c+HckFEZ+0+TX/P27NXrSz9sNH8fueBTxYHceDrV2rv4SCSlANg 5kd8l7SbB/r2BiQ4TES589+VUQNj/uC2dOizlI3wB6hdKcaAiwuV0xmnCvXiSKHZkqjr jYVlMjBQ2A4SoGFBigz9ZMhZ40sjsBCunJjMMPVBmjQ0isCXTm+FWDmv4koRLTUZNAIU UiXtEwZVxxu7y3IblvGXsxxDaD1yf7+SaNjOF0ZHq7cOMyO7pzQTlOMDF5OqgZMIeWt6 WVLbIuuzMsKpyjitVqht1lhxnSMHHaKrXb1KwIwuoCBgBP6BuEbYMgtFbyLDBK0CxelM 9cbA== X-Gm-Message-State: AOJu0Yx2CJp4FsxwYUJ/p0NtdyENMsV4M86rqqvVnwm2zo1vO/HJYcG9 vhqj5MVuki5Lz8AGDRY/vVg= X-Google-Smtp-Source: AGHT+IF5n0k8B2+8yg9Yqfc5kblL6ty6PmgZqB7/RsCQCsYM1xqQARwOAGlozcpSP6F7OVGy1UzWkQ== X-Received: by 2002:a17:90b:20d:b0:26b:455b:657e with SMTP id fy13-20020a17090b020d00b0026b455b657emr10659945pjb.41.1692822479199; Wed, 23 Aug 2023 13:27:59 -0700 (PDT) Received: from snowbird ([199.73.127.3]) by smtp.gmail.com with ESMTPSA id 6-20020a17090a01c600b00265c742a262sm211975pjd.4.2023.08.23.13.27.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Aug 2023 13:27:58 -0700 (PDT) Date: Wed, 23 Aug 2023 13:27:56 -0700 From: Dennis Zhou To: Jan Kara , Mateusz Guzik Cc: linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com, linux-mm@kvack.org Subject: Re: [PATCH 0/2] execve scalability issues, part 1 Message-ID: References: <20230821202829.2163744-1-mjguzik@gmail.com> <20230821213951.bx3yyqh7omdvpyae@f> <20230822095154.7cr5ofogw552z3jk@quack3> <20230823094915.ggv3spzevgyoov6i@quack3> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230823094915.ggv3spzevgyoov6i@quack3> X-Rspamd-Queue-Id: A70B3A0007 X-Rspam-User: X-Stat-Signature: a4rwcf1bqmox49xxp5r983eb56o6cfpc X-Rspamd-Server: rspam03 X-HE-Tag: 1692822480-746882 X-HE-Meta: U2FsdGVkX181opqwnTjIEbLheSaCNnjIk9uF2ObxjwoioXOeL7hLmVnBrum0/knkfqSdJwxnLTx1C19hxk4Slv9BcKSE5JGvT4GakX8Nfx2LjDTxpAa0PKS3lh+OkfcLSYcBxWFsRuohe2Vu+SbO5Akw8BCSDPOWWMKjW5Fxl7a+su7E1tA2yRcxa2jWqReNU76lVqSqp3tAknuzTojyhBVvD5+UUohbVZu7RiK50EbYksKELaoxAbVI0wUDnkUI+fxlYuN/CY/LSd1YghGOammRydpIZcHQW4oyrygmb7c52s9Y3Pxyeb5s6G3FK8jmzHFgnMo8b1SF63bxYFWwjLqk6cVC/0G6LH5INf6Aw5u1o/NCz7NbZr0u9N1sHhS/HA+7t9pU5CbRPhQp7jzJyKS9IlknxZQmfQuVIrm1Vea6ZFjTjdAZjCVBYr8cJGNpMDq/5qLHZ10/B1OrNWYy98l/XUeghQyDL6y7q3Y4jCGRvm6F+SAxe72tvxpwosZw2PW3ziWX2HHOdb3ln3JmA5jCwPO9AC74/a3DIU/0/9NuyXrDtoJCsJ2K5oe2zrjQiuGdg8azXl5VCL4TlDGPWI8iZ5uzjmn4gdsNDv7oVElOKib3uP8e7EBN1cITiTNlpLkWKZd4EGYPXwdEdRd64XUHCQGSTUON6Ox4Fnqi6w36xaSfJowOHoo87ioWHuzp1DWbiIJqRR8s02Dv/WMsu3EFc4dPj2JdSCghYCl3ylzbosjv8mDn+sDUFb0mOqlx8CoogcpBW3Rv15xOa2/EUX5L53obvdXpikVNxkAa48x8dYY1Wo1gpX/yXHlIJgu+0G8orM2TqAEB1uj47/jkCLIQ6nzsDx0IY7pT7jNPG3cLLQyOvc9JX+xaVMx1Moqzk6X6laVV6IKr1/f20+w91mrdm1NAxj1yfmldHVC5MQ+AG4yIz2mac5EiaLiHFpbP0Hk7Cer4Ika8+NMM+Pp WtWpEYQG pEgScLomtCI6FvQNvGjW+B0zqP9wXvJei6PsSeljv6Y8FJgHfKSmaWICONdPWrqvgIeuLpVYi9/JlME5r/b7/W0INPAorKjkVQ4oixojvc56WcXVcnX0Uwxi9vDdA58rZ5BEtK6HMHxhHH5f7R2pS2vIqxTEYH2syNvRek/nqD5ITV4neCvApsR4GaZHr6ON0mOx8FGM1cZrphEeLGFhMZTR/t4jxn9pyvUw4X+n5ZCreWh0eW57kCgd+J6RZ47keRF3PQHK3nPBwFqiFcIx13DVzCxgmPun/FY9s7EPWU9vLNFRCCXb2CJQ27BIK26M/5/MABwfC9LWzW6ZN4PvutABtcHbwBbH/9HSNTikGXM5FYPizokysOh9czzAwwmvkJZgBAirDjjFw81UhWFPnop0YIxZyD4v/tDlLHtrN64wMublSI4LVxkT72ctI54EAHaj4rhlYGmKtAaMTshj+GFL7bO9mUvAjUzcCpy+W7YlgTKnn76OBk6pn+SNPKG6hX6OacDjzenfcyZ9C5se4ziCANG76Uk2uN5biOfZY7nVDI5ylTDO4UcvbyVCcC9v0U7R2mjko4yIhv73QICdtlYNONfqsCcyhWGZvHCE8Yr1u4QR4Oh7pTPa5KlKo53I8TAAC X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Aug 23, 2023 at 11:49:15AM +0200, Jan Kara wrote: > On Tue 22-08-23 16:24:56, Mateusz Guzik wrote: > > On 8/22/23, Jan Kara wrote: > > > On Tue 22-08-23 00:29:49, Mateusz Guzik wrote: > > >> On 8/21/23, Mateusz Guzik wrote: > > >> > True Fix(tm) is a longer story. > > >> > > > >> > Maybe let's sort out this patchset first, whichever way. :) > > >> > > > >> > > >> So I found the discussion around the original patch with a perf > > >> regression report. > > >> > > >> https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/ > > >> > > >> The reporter suggests dodging the problem by only allocating per-cpu > > >> counters when the process is going multithreaded. Given that there is > > >> still plenty of forever single-threaded procs out there I think that's > > >> does sound like a great plan regardless of what happens with this > > >> patchset. > > >> > > >> Almost all access is already done using dedicated routines, so this > > >> should be an afternoon churn to sort out, unless I missed a > > >> showstopper. (maybe there is no good place to stuff a flag/whatever > > >> other indicator about the state of counters?) > > >> > > >> That said I'll look into it some time this or next week. > > > > > > Good, just let me know how it went, I also wanted to start looking into > > > this to come up with some concrete patches :). What I had in mind was that > > > we could use 'counters == NULL' as an indication that the counter is still > > > in 'single counter mode'. > > > > > > > In the current state there are only pointers to counters in mm_struct > > and there is no storage for them in task_struct. So I don't think > > merely null-checking the per-cpu stuff is going to cut it -- where > > should the single-threaded counters land? > > I think you misunderstood. What I wanted to do it to provide a new flavor > of percpu_counter (sharing most of code and definitions) which would have > an option to start as simple counter (indicated by pcc->counters == NULL > and using pcc->count for counting) and then be upgraded by a call to real > percpu thing. Because I think such counters would be useful also on other > occasions than as rss counters. > Kent wrote something similar and sent it out last year [1]. However, the case slightly differs from what we'd want here, 1 -> 2 threads becomes percpu vs update rate which a single thread might be able to trigger? [1] https://lore.kernel.org/lkml/20230501165450.15352-8-surenb@google.com/ Thanks, Dennis > > Bonus problem, non-current can modify these counters and this needs to > > be safe against current playing with them at the same time. (and it > > would be a shame to require current to use atomic on them) > > Hum, I didn't realize that. Indeed I can see that e.g. khugepaged can be > modifying the counters for other processes. Thanks for pointing this out. > > > That said, my initial proposal adds a union: > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > > index 5e74ce4a28cd..ea70f0c08286 100644 > > --- a/include/linux/mm_types.h > > +++ b/include/linux/mm_types.h > > @@ -737,7 +737,11 @@ struct mm_struct { > > > > unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for > > /proc/PID/auxv */ > > > > - struct percpu_counter rss_stat[NR_MM_COUNTERS]; > > + union { > > + struct percpu_counter rss_stat[NR_MM_COUNTERS]; > > + u64 *rss_stat_single; > > + }; > > + bool magic_flag_stuffed_elsewhere; > > > > struct linux_binfmt *binfmt; > > > > > > Then for single-threaded case an area is allocated for NR_MM_COUNTERS > > countes * 2 -- first set updated without any synchro by current > > thread. Second set only to be modified by others and protected with > > mm->arg_lock. The lock protects remote access to the union to begin > > with. > > arg_lock seems a bit like a hack. How is it related to rss_stat? The scheme > with two counters is clever but I'm not 100% convinced the complexity is > really worth it. I'm not sure the overhead of always using an atomic > counter would really be measurable as atomic counter ops in local CPU cache > tend to be cheap. Did you try to measure the difference? > > If the second counter proves to be worth it, we could make just that one > atomic to avoid the need for abusing some spinlock. > > > Transition to per-CPU operation sets the magic flag (there is plenty > > of spare space in mm_struct, I'll find a good home for it without > > growing the struct). It would be a one-way street -- a process which > > gets a bunch of threads and goes back to one stays with per-CPU. > > Agreed with switching to be a one-way street. > > > Then you get the true value of something by adding both counters. > > > > arg_lock is sparingly used, so remote ops are not expected to contend > > with anything. In fact their cost is going to go down since percpu > > summation takes a spinlock which also disables interrupts. > > > > Local ops should be about the same in cost as they are right now. > > > > I might have missed some detail in the above description, but I think > > the approach is decent. > > Honza > -- > Jan Kara > SUSE Labs, CR