Date: Thu, 24 Aug 2023 11:19:56 +0200
From: Jan Kara
To: Dennis Zhou
Cc: Jan Kara, Mateusz Guzik, linux-kernel@vger.kernel.org, tj@kernel.org,
    cl@linux.com, akpm@linux-foundation.org, shakeelb@google.com,
    linux-mm@kvack.org
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
Message-ID: <20230824091956.drn6ucixj4qbxwa7@quack3>
References: <20230821202829.2163744-1-mjguzik@gmail.com>
    <20230821213951.bx3yyqh7omdvpyae@f>
    <20230822095154.7cr5ofogw552z3jk@quack3>
    <20230823094915.ggv3spzevgyoov6i@quack3>

On Wed 23-08-23 13:27:56, Dennis Zhou wrote:
> On Wed, Aug 23, 2023 at 11:49:15AM +0200, Jan Kara wrote:
> > On Tue 22-08-23 16:24:56, Mateusz Guzik wrote:
> > > On 8/22/23, Jan Kara wrote:
> > > > On Tue 22-08-23 00:29:49, Mateusz Guzik wrote:
> > > >> On 8/21/23, Mateusz Guzik wrote:
> > > >> > True Fix(tm) is a longer story.
> > > >> >
> > > >> > Maybe let's sort out this patchset first, whichever way. :)
> > > >> >
> > > >>
> > > >> So I found the discussion around the original patch with a perf
> > > >> regression report.
> > > >>
> > > >> https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/
> > > >>
> > > >> The reporter suggests dodging the problem by only allocating per-cpu
> > > >> counters when the process is going multithreaded. Given that there is
> > > >> still plenty of forever single-threaded procs out there I think that
> > > >> does sound like a great plan regardless of what happens with this
> > > >> patchset.
> > > >>
> > > >> Almost all access is already done using dedicated routines, so this
> > > >> should be an afternoon churn to sort out, unless I missed a
> > > >> showstopper. (maybe there is no good place to stuff a flag/whatever
> > > >> other indicator about the state of counters?)
> > > >>
> > > >> That said I'll look into it some time this or next week.
> > > >
> > > > Good, just let me know how it went, I also wanted to start looking into
> > > > this to come up with some concrete patches :). What I had in mind was that
> > > > we could use 'counters == NULL' as an indication that the counter is still
> > > > in 'single counter mode'.
> > > >
> > >
> > > In the current state there are only pointers to counters in mm_struct
> > > and there is no storage for them in task_struct. So I don't think
> > > merely null-checking the per-cpu stuff is going to cut it -- where
> > > should the single-threaded counters land?
> >
> > I think you misunderstood. What I wanted to do is to provide a new flavor
> > of percpu_counter (sharing most of code and definitions) which would have
> > an option to start as simple counter (indicated by pcc->counters == NULL
> > and using pcc->count for counting) and then be upgraded by a call to real
> > percpu thing. Because I think such counters would be useful also on other
> > occasions than as rss counters.
> >
>
> Kent wrote something similar and sent it out last year [1]. However, the
> case slightly differs from what we'd want here, 1 -> 2 threads becomes
> percpu vs update rate which a single thread might be able to trigger?

Thanks for the pointer, but that version of the counters is not really
suitable here as is (although we could factor out some common bits if that
work is happening). A single thread can easily do 10000 RSS updates per
second.

> [1] https://lore.kernel.org/lkml/20230501165450.15352-8-surenb@google.com/

								Honza

> > > Bonus problem, non-current can modify these counters and this needs to
> > > be safe against current playing with them at the same time. (and it
> > > would be a shame to require current to use atomic on them)
> >
> > Hum, I didn't realize that. Indeed I can see that e.g. khugepaged can be
> > modifying the counters for other processes. Thanks for pointing this out.
> >
> > > That said, my initial proposal adds a union:
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 5e74ce4a28cd..ea70f0c08286 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -737,7 +737,11 @@ struct mm_struct {
> > >
> > >         unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
> > >
> > > -       struct percpu_counter rss_stat[NR_MM_COUNTERS];
> > > +       union {
> > > +               struct percpu_counter rss_stat[NR_MM_COUNTERS];
> > > +               u64 *rss_stat_single;
> > > +       };
> > > +       bool magic_flag_stuffed_elsewhere;
> > >
> > >         struct linux_binfmt *binfmt;
> > >
> > >
> > > Then for single-threaded case an area is allocated for NR_MM_COUNTERS
> > > counters * 2 -- first set updated without any synchro by current
> > > thread. Second set only to be modified by others and protected with
> > > mm->arg_lock. The lock protects remote access to the union to begin
> > > with.
> >
> > arg_lock seems a bit like a hack. How is it related to rss_stat? The scheme
> > with two counters is clever but I'm not 100% convinced the complexity is
> > really worth it. I'm not sure the overhead of always using an atomic
> > counter would really be measurable as atomic counter ops in local CPU cache
> > tend to be cheap. Did you try to measure the difference?
> >
> > If the second counter proves to be worth it, we could make just that one
> > atomic to avoid the need for abusing some spinlock.
> >
> > > Transition to per-CPU operation sets the magic flag (there is plenty
> > > of spare space in mm_struct, I'll find a good home for it without
> > > growing the struct). It would be a one-way street -- a process which
> > > gets a bunch of threads and goes back to one stays with per-CPU.
> >
> > Agreed with switching to be a one-way street.
> >
> > > Then you get the true value of something by adding both counters.
> > >
> > > arg_lock is sparingly used, so remote ops are not expected to contend
> > > with anything. In fact their cost is going to go down since percpu
> > > summation takes a spinlock which also disables interrupts.
> > >
> > > Local ops should be about the same in cost as they are right now.
> > >
> > > I might have missed some detail in the above description, but I think
> > > the approach is decent.
> >
> > 								Honza
> > --
> > Jan Kara
> > SUSE Labs, CR
--
Jan Kara
SUSE Labs, CR