Date: Thu, 3 Aug 2023 13:41:42 -0700
From: Yury Norov
To: Peter Zijlstra
Cc: andriy.shevchenko@linux.intel.com, linux@rasmusvillemoes.dk,
	Andrew Morton, Linus Torvalds, Mateusz Guzik,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	tglx@linutronix.de, rppt@kernel.org
Subject: Re: [PATCH v2 2/2] mm,nodemask: Use nr_node_ids
References: <20230802112458.230221601@infradead.org>
 <20230802112525.633758009@infradead.org>
 <20230802193616.GC231007@hirez.programming.kicks-ass.net>
 <20230803084125.GE212435@hirez.programming.kicks-ass.net>
In-Reply-To: <20230803084125.GE212435@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
On Thu, Aug 03, 2023 at 10:41:25AM +0200, Peter Zijlstra wrote:
> On Wed, Aug 02, 2023 at 05:45:44PM -0700, Yury Norov wrote:
> > + Linus,
> > Mateusz
> >
> > On Wed, Aug 02, 2023 at 09:36:16PM +0200, Peter Zijlstra wrote:
> > >
> > > Just like how cpumask uses nr_cpu_ids to limit the bitmap scanning,
> > > make nodemask use nr_node_ids.
> > >
> > > Since current users expect MAX_NUMNODES as the end-of-bitmap marker,
> > > retain this behaviour for now.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel)
> > > Reviewed-by: Mike Rapoport (IBM)
> > > ---
> > > Changes since v1:
> > >  - updated and reflowed the 'borrowed' comment some more (rppt)
> >
> > Hi Peter,
> >
> > Thanks for the patch! I wanted to do it sooner or later.
> >
> > Can you mention the commit that you borrowed the approach from?
> > Maybe add a Suggested-by?
>
> I borrowed the comment from the current include/linux/cpumask.h, not a
> particular commit.

The particular commit is 596ff4a09b898 ("cpumask: re-introduce
constant-sized cpumask optimizations"). I think I mentioned it in the
other email. It has quite a detailed description of why the small and
large sizes exist. The 'Just like how cpumask does' explanation works,
but I'd prefer to have an explicit link that describes why we are
doing that.

> > The motivation for the original patch was that the following test
> > revealed broken small_const_nbits() optimization for cpumasks:
> >
> > On Fri, Mar 3, 2023 at 12:39 PM Mateusz Guzik wrote:
> > >
> > > as an example here is a one-liner to show crappers which do 0-sized ops:
> > > bpftrace -e 'kprobe:memset,kprobe:memcpy /arg2 == 0/ { @[probe,
> > > kstack(2)] = count(); }'
> >
> > See:
> > https://lore.kernel.org/lkml/CAHk-=wgfNrMFQCFWFtn+UXjAdJAGAAFFJZ1JpEomTneza32A6g@mail.gmail.com/
> >
> > Can you make sure your patch doesn't break the test for nodemasks?
>
> I've no idea what that even tries to do; I don't speak bpf. And
> typically bpf things don't work on my machines because I refuse to build
> with BTF on since that blows up build times.

Neither do I. The idea is simply that bitmap users shouldn't break the
small_const_nbits() optimization.
> > >  include/linux/nodemask.h | 121 ++++++++++++++++++++++++++++++++++-------------
> > >  1 file changed, 89 insertions(+), 32 deletions(-)
> > >
> > > --- a/include/linux/nodemask.h
> > > +++ b/include/linux/nodemask.h
> > > @@ -99,6 +99,48 @@
> > >  typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
> > >  extern nodemask_t _unused_nodemask_arg_;
> > >
> > > +#if MAX_NUMNODES > 1
> > > +extern unsigned int nr_node_ids;
> > > +#else
> > > +#define nr_node_ids 1U
> > > +#endif
> > > +
> > > +/*
> > > + * We have several different "preferred sizes" for the nodemask operations,
> > > + * depending on operation.
> > > + *
> > > + * For example, the bitmap scanning and operating operations have optimized
> > > + * routines that work for the single-word case, but only when the size is
> > > + * constant. So if MAX_NUMNODES fits in one single word, we are better off
> > > + * using that small constant, in order to trigger the optimized bit finding.
> > > + * That is 'small_nodemask_size'.
> > > + *
> > > + * The clearing and copying operations will similarly perform better with a
> >
> > Copying will not, because there's no nodemask_copy(). :-)
>
> Yeah, I know, *shrug*. If you really care, I'd prefer to actually
> implement that instead of fixing the comment.

That would be dead code, wouldn't it?

[...]

> > > -#define nodes_setall(dst) __nodes_setall(&(dst), MAX_NUMNODES)
> > > +#define nodes_setall(dst) __nodes_setall(&(dst), large_nodemask_bits)
> > >  static inline void __nodes_setall(nodemask_t *dstp, unsigned int nbits)
> > >  {
> > >  	bitmap_fill(dstp->bits, nbits);
> > >  }
> >
> > When MAX_NUMNODES <= 4*BITS_PER_LONG, this breaks the rule that all
> > bits beyond nr_node_ids must be clear. And that in turn may break
> > nodemask_weight() and others.
> > Refer to this patch for details and
> > correct implementation:
>
> I think I got that right, consider:
>
>  #elif MAX_NUMNODES <= 4*BITS_PER_LONG
>  #define small_nodemask_bits nr_node_ids
>  #define large_nodemask_bits ((unsigned int)MAX_NUMNODES)
>
> IOW: small_nodemask_bits <= large_nodemask_bits (as per the naming)
>
> So nodemask_weight() will look at less or all bits set/cleared.
>
> The bug you referred to was using fill with nr_cpumask_bits; using
> large_cpumask_bits would've been sufficient.

Consider MAX_NUMNODES == 64 and nr_node_ids == 4. Then
small_nodemask_bits == 64. nodes_setall() will set all 64 bits, so
nodes_full() will report the mask as full:

  #define nodes_full(nodemask) __nodes_full(&(nodemask), small_nodemask_bits)
  static inline bool __nodes_full(const nodemask_t *srcp, unsigned int nbits)
  {
          return bitmap_full(srcp->bits, nbits);
  }

And the following nodes_weight() will return 64:

  #define nodes_weight(nodemask) __nodes_weight(&(nodemask), small_nodemask_bits)
  static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
  {
          return bitmap_weight(srcp->bits, nbits);
  }

That is definitely wrong, because there are only 4 nodes.

To solve this problem, both the cpumask and nodemask implementations
share the same rule: all bits beyond nr_{node,cpumask}_bits must always
be cleared. See how cpumask_setall() implements that:

  static inline void cpumask_setall(struct cpumask *dstp)
  {
          /* Make sure we don't break the optimization */
          if (small_const_nbits(small_cpumask_bits)) {
                  cpumask_bits(dstp)[0] = BITMAP_LAST_WORD_MASK(nr_cpumask_bits);
                  return;
          }

          /* Pass the exact (runtime) number of bits */
          bitmap_fill(cpumask_bits(dstp), nr_cpumask_bits);
  }

Hope that makes sense.
Thanks,
Yury

> > > @@ -452,7 +511,6 @@ static inline unsigned int next_memory_n
> > >  	return next_node(nid, node_states[N_MEMORY]);
> > >  }
> > >
> > > -extern unsigned int nr_node_ids;
> > >  extern unsigned int nr_online_nodes;
> > >
> > >  static inline void node_set_online(int nid)
> > > @@ -494,7 +552,6 @@ static inline int num_node_state(enum no
> > >  #define first_memory_node	0
> > >  #define next_online_node(nid)	(MAX_NUMNODES)
> > >  #define next_memory_node(nid)	(MAX_NUMNODES)
> > > -#define nr_node_ids		1U
> > >  #define nr_online_nodes		1U
> >
> > I like how you separated nr_node_ids from the other ifdefery, and
> > changed it to __ro_after_init. But I think it's better to fold this all
> > into the 1st patch.
>
> This move was needed to make it build -- the compiler feels strongly that
> you should have declared a variable before using it, etc. No other
> motivation for it. As such it sits in this patch.
>
> > Why don't we make nr_cpu_ids __ro_after_init just as well?
>
> Sure, will add patch. Should've checked :/