linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Yury Norov <yury.norov@gmail.com>
Cc: andriy.shevchenko@linux.intel.com, linux@rasmusvillemoes.dk,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mateusz Guzik <mjguzik@gmail.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	tglx@linutronix.de, rppt@kernel.org
Subject: Re: [PATCH v2 2/2] mm,nodemask: Use nr_node_ids
Date: Thu, 3 Aug 2023 10:41:25 +0200	[thread overview]
Message-ID: <20230803084125.GE212435@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <ZMr4uBfjKY9dERl2@yury-ThinkPad>

On Wed, Aug 02, 2023 at 05:45:44PM -0700, Yury Norov wrote:
> + Linus, Mateusz
> 
> On Wed, Aug 02, 2023 at 09:36:16PM +0200, Peter Zijlstra wrote:
> > 
> > Just like how cpumask uses nr_cpu_ids to limit the bitmap scanning,
> > make nodemask use nr_node_ids.
> > 
> > Since current users expect MAX_NUMNODES as the end-of-bitmap marker,
> > retain this behaviour for now.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
> > ---
> > Changes since v1:
> >  - updated and reflowed the 'borrowed' comment some more (rppt)
> 
> Hi Peter,
> 
> Thanks for the patch! I wanted to do it sooner or later.
> 
> Can you mention the commit that you used to borrow the approach.
> Maybe suggested-by?

I borrowed the comment from current include/linux/cpumask.h, not a
particular commit.

> The motivation for the original patch was that the following test
> revealed broken small_const_nbits() optimization for cpumasks:
> 
>   On Fri, Mar 3, 2023 at 12:39 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>   >
>   > as an example here is a one-liner to show crappers which do 0-sized ops:
>   > bpftrace -e 'kprobe:memset,kprobe:memcpy /arg2 == 0/ { @[probe,
>   > kstack(2)] = count(); }'
> 
> See:
> https://lore.kernel.org/lkml/CAHk-=wgfNrMFQCFWFtn+UXjAdJAGAAFFJZ1JpEomTneza32A6g@mail.gmail.com/
> 
> Can you make sure your patch doesn't brake the test for nodemasks?

I've no idea what that even tries to do; I don't speak bpf. And
typically bpf things don't work on my machines because I refuse to build
with BTF on since that blows up build times.

> >  include/linux/nodemask.h |  121 ++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 89 insertions(+), 32 deletions(-)
> > 
> > --- a/include/linux/nodemask.h
> > +++ b/include/linux/nodemask.h
> > @@ -99,6 +99,48 @@
> >  typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
> >  extern nodemask_t _unused_nodemask_arg_;
> >  
> > +#if MAX_NUMNODES > 1
> > +extern unsigned int nr_node_ids;
> > +#else
> > +#define nr_node_ids		1U
> > +#endif
> > +
> > +/*
> > + * We have several different "preferred sizes" for the nodemask operations,
> > + * depending on operation.
> > + *
> > + * For example, the bitmap scanning and operating operations have optimized
> > + * routines that work for the single-word case, but only when the size is
> > + * constant. So if MAX_NUMNODES fits in one single word, we are better off
> > + * using that small constant, in order to trigger the optimized bit finding.
> > + * That is 'small_nodemask_size'.
> > + *
> > + * The clearing and copying operations will similarly perform better with a
> 
> Copying will not, because there's no nodemask_copy(). :-)

Yeah, I know, *shrug*. If you really care, I'd prefer to actually
implement that instead of fixing the comment.

> > + * constant size, but we limit that size arbitrarily to four words. We call
> > + * this 'large_nodemask_size'.
> > + *
> > + * Finally, some operations just want the exact limit, either because they set
> > + * bits or just don't have any faster fixed-sized versions. We call this just
> > + * 'nr_nodemask_bits'.
> > + *
> > + * Note that these optional constants are always guaranteed to be at least as
> > + * big as 'nr_node_ids' itself is, and all our nodemask allocations are at
> > + * least that size. The optimization comes from being able to potentially use
> > + * a compile-time constant instead of a run-time generated exact number of
> > + * nodes.
> > + */
> > +#if MAX_NUMNODES <= BITS_PER_LONG
> > +  #define small_nodemask_bits ((unsigned int)MAX_NUMNODES)
> > +  #define large_nodemask_bits ((unsigned int)MAX_NUMNODES)
> > +#elif MAX_NUMNODES <= 4*BITS_PER_LONG
> > +  #define small_nodemask_bits nr_node_ids
> > +  #define large_nodemask_bits ((unsigned int)MAX_NUMNODES)
> > +#else
> > +  #define small_nodemask_bits nr_node_ids
> > +  #define large_nodemask_bits nr_node_ids
> > +#endif
> > +#define nr_nodemask_bits nr_node_ids
> 
> We don't need nr_nodemask_bits. In CPU subsystem nr_cpumask_bits
> exists (existed) to support dynamic allocation for cpumask_var_t
> if CPUMASK_OFFSTACK is enabled. And it apparently caused troubles.
> 
> In nodemasks we don't have an offstack feature, and don't need the
> nr_nodemask_bits. Just use nr_node_ids everywhere.

Sure, can do.

> [...]
> 
> > -#define nodes_setall(dst) __nodes_setall(&(dst), MAX_NUMNODES)
> > +#define nodes_setall(dst) __nodes_setall(&(dst), large_nodemask_bits)
> >  static inline void __nodes_setall(nodemask_t *dstp, unsigned int nbits)
> >  {
> >  	bitmap_fill(dstp->bits, nbits);
> >  }
> 
> When MAX_NUMNODES <= 4*BITS_PER_LONG, this breaks the rule that all
> bits beyond nr_node_ids must be clear. And that in turn may brake
> nodemask_weight() and others. Refer to this patch for details and
> correct implementation:

I think I got that right, consider:

#elif MAX_NUMNODES <= 4*BITS_PER_LONG
  #define small_nodemask_bits nr_node_ids
  #define large_nodemask_bits ((unsigned int)MAX_NUMNODES)

IOW: small_nodemask_bits <= large_nodemask_bits (as per the naming)

So nodemask_weight() will look at less or all bits set/cleared.

The bug you referred to was using fill with nr_cpumask_bits, using
large_cpumask_bits would've been sufficient.

> > @@ -452,7 +511,6 @@ static inline unsigned int next_memory_n
> >  	return next_node(nid, node_states[N_MEMORY]);
> >  }
> >  
> > -extern unsigned int nr_node_ids;
> >  extern unsigned int nr_online_nodes;
> >  
> >  static inline void node_set_online(int nid)
> > @@ -494,7 +552,6 @@ static inline int num_node_state(enum no
> >  #define first_memory_node	0
> >  #define next_online_node(nid)	(MAX_NUMNODES)
> >  #define next_memory_node(nid)	(MAX_NUMNODES)
> > -#define nr_node_ids		1U
> >  #define nr_online_nodes		1U
>  
> I like how you separated the nr_node_ids from the other ifdefery, and
> changed it to __ro_after_init. But I think it's better to fold this all
> into the 1st patch.

This move was needed to make it build -- compiler feels strongly you
should have declared a variable before using it etc.. No other
motivation for it. As such it sits in this patch.

> Why don't we make nr_cpu_ids to be a __ro_after_init just as well?

Sure, will add patch. Should've checked :/


  reply	other threads:[~2023-08-03  8:41 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-02 11:24 [PATCH 0/2] nodemask: " Peter Zijlstra
2023-08-02 11:24 ` [PATCH 1/2] mm: Mark nr_node_ids __ro_after_init Peter Zijlstra
2023-08-02 14:58   ` Mike Rapoport
2023-08-02 11:25 ` [PATCH 2/2] mm,nodemask: Use nr_node_ids Peter Zijlstra
2023-08-02 15:32   ` Mike Rapoport
2023-08-02 15:48     ` Peter Zijlstra
2023-08-02 19:36   ` [PATCH v2 " Peter Zijlstra
2023-08-03  0:45     ` Yury Norov
2023-08-03  8:41       ` Peter Zijlstra [this message]
2023-08-03 20:41         ` Yury Norov
2023-08-03 21:00           ` Yury Norov
2023-08-04  8:14           ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230803084125.GE212435@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=mjguzik@gmail.com \
    --cc=rppt@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox