linux-mm.kvack.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org, mingo@redhat.com,
	dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
	Andrew Morton <akpm@linux-foundation.org>,
	urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
	Arnd Bergmann <arnd@arndb.de>,
	linux-api@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org, malteskarupke@web.de
Subject: Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
Date: Mon, 31 Jul 2023 20:03:20 +0200
Message-ID: <20230731180320.GR29590@hirez.programming.kicks-ass.net>
In-Reply-To: <87pm48m19m.ffs@tglx>

On Mon, Jul 31, 2023 at 07:36:21PM +0200, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> >  struct futex_hash_bucket *futex_hash(union futex_key *key)
> >  {
> > -	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
> > +	u32 hash = jhash2((u32 *)key,
> > +			  offsetof(typeof(*key), both.offset) / sizeof(u32),
> >  			  key->both.offset);
> > +	int node = key->both.node;
> >  
> > -	return &futex_queues[hash & (futex_hashsize - 1)];
> > +	if (node == -1) {
> > +		/*
> > +		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
> > +		 * node -- this ensures regular futexes are interleaved across
> > +		 * the nodes and avoids having to allocate multiple
> > +		 * hash-tables.
> > +		 *
> > +		 * NOTE: this isn't perfectly uniform, but it is fast and
> > +		 * handles sparse node masks.
> > +		 */
> > +		node = (hash >> futex_hashshift) % nr_node_ids;
> 
> Is nr_node_ids guaranteed to be stable after init? It's marked
> __read_mostly, but not __ro_after_init.

AFAICT it is only ever written to in setup_nr_node_ids(), and that is all
__init code. So I'm thinking this could/should indeed be
__ro_after_init, especially since it is an exported variable.

Mike?

> > +		if (!node_possible(node)) {
> > +			node = find_next_bit_wrap(node_possible_map.bits,
> > +						  nr_node_ids, node);
> > +		}
> > +	}
> > +
> > +	return &futex_queues[node][hash & (futex_hashsize - 1)];
> >  }
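
A minimal userspace sketch of the node selection quoted above, for
illustration only. It is not the kernel code: pick_node() and
node_is_possible() are hypothetical names, the possible-node map is
modelled as a plain 64-bit mask, and the forward walk only imitates what
find_next_bit_wrap() does on the real node_possible_map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for node_possible(): bit n set means node n exists. */
static int node_is_possible(uint64_t possible_mask, unsigned int node)
{
	return (possible_mask >> node) & 1;
}

/*
 * Pick a node for a futex without FLAGS_NUMA: take the hash bits above
 * the bucket index, reduce them modulo the number of node ids, then walk
 * forward (wrapping around) to the first possible node.
 */
static unsigned int pick_node(uint32_t hash, unsigned int hashshift,
			      unsigned int nr_ids, uint64_t possible_mask)
{
	unsigned int node, i;

	assert(nr_ids > 0 && nr_ids <= 64);
	assert(hashshift < 32);

	node = (hash >> hashshift) % nr_ids;
	for (i = 0; i < nr_ids; i++) {
		unsigned int candidate = (node + i) % nr_ids;

		if (node_is_possible(possible_mask, candidate))
			return candidate;
	}

	return 0; /* empty mask: falling back to node 0 is an assumption of this sketch */
}
```

With a sparse mask such as 0x5 (nodes 0 and 2 possible, 4 node ids), a
hash that lands on node 1 is moved to node 2 and one that lands on node 3
wraps to node 0. That is the "not perfectly uniform" part of the quoted
comment: a node that follows a hole also receives the hole's share.
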
> >  	fshared = flags & FLAGS_SHARED;
> > +	size = futex_size(flags);
> >  
> >  	/*
> >  	 * The futex address must be "naturally" aligned.
> >  	 */
> >  	key->both.offset = address % PAGE_SIZE;
> > -	if (unlikely((address % sizeof(u32)) != 0))
> > +	if (unlikely((address % size) != 0))
> >  		return -EINVAL;
> 
> Hmm. Shouldn't that have changed with the allowance of the 1 and 2 byte
> futexes?

That patch comes after this one. :-)

But I do have an open question here; do we want FUTEX2_NUMA futexes
aligned at futex_size or double that? That is, what do we want the
alignment of:

struct futex_numa_32 {
	u32 val;
	u32 node;
};

to be? Having that u64 aligned will guarantee these two values end up in
the same page; having them u32 aligned (as per this patch) allows for
them to be split.

The current paths don't care, we don't hold locks, but perhaps it makes
sense to be conservative.
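
To make the split case concrete, here is a small userspace sketch. The
4096-byte page size and the helper names spans_two_pages() and
is_aligned() are assumptions for illustration, not anything from the
patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, illustration only */

/* Returns 1 if the byte range [addr, addr + len) touches more than one page. */
static int spans_two_pages(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	return (addr / SKETCH_PAGE_SIZE) != ((addr + len - 1) / SKETCH_PAGE_SIZE);
}

/* Returns 1 if addr is a multiple of align. */
static int is_aligned(uint64_t addr, uint64_t align)
{
	assert(align > 0);
	return addr % align == 0;
}
```

An 8-byte { val, node } pair at offset 4092 is u32 aligned but not u64
aligned, and its two words land in different pages. At offset 4088 it is
u64 aligned and stays inside one page.
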

> >  	address -= key->both.offset;
> >  
> > -	if (unlikely(!access_ok(uaddr, sizeof(u32))))
> > +	if (flags & FLAGS_NUMA)
> > +		size *= 2;
> > +
> > +	if (unlikely(!access_ok(uaddr, size)))
> >  		return -EFAULT;
> >  
> >  	if (unlikely(should_fail_futex(fshared)))
> >  		return -EFAULT;
> >  
> > +	key->both.node = -1;
> 
> Please put this into an else path.

Can do, but I figured the compiler could figure it out through dead
store elimination or some such pass.

> > +	if (flags & FLAGS_NUMA) {
> > +		void __user *naddr = uaddr + size/2;
> 
> size / 2;
> 
> > +
> > +		if (futex_get_value(&node, naddr, flags))
> > +			return -EFAULT;
> > +
> > +		if (node == -1) {
> > +			node = numa_node_id();
> > +			if (futex_put_value(node, naddr, flags))
> > +				return -EFAULT;
> > +		}
> > +
> > +		if (node >= MAX_NUMNODES || !node_possible(node))
> > +			return -EINVAL;
> 
> That's clearly an else path too. No point in checking whether
> numa_node_id() is valid.

No, this also checks if the value we read from userspace is valid.

Only when the value we read from userspace is -1 do we set
numa_node_id(), otherwise we take the value as read, which then must be
a valid value.
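
The control flow described here, as a hedged userspace sketch.
resolve_node(), SKETCH_MAX_NODES and the 64-bit possible mask are
hypothetical stand-ins for the quoted code, numa_node_id() is modelled
as the local_node argument, and the explicit "node < 0" test is an
addition of this sketch that is not in the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 64 /* hypothetical stand-in for MAX_NUMNODES */

/*
 * Resolve the node word that sits next to the futex value.
 * Returns the node to use, or -EINVAL if the stored value is not usable.
 */
static int resolve_node(int *user_node, int local_node, uint64_t possible_mask)
{
	int node = *user_node;

	assert(local_node >= 0 && local_node < SKETCH_MAX_NODES);

	if (node == -1) {
		/* "No node" given: use the local node and write it back. */
		node = local_node;
		*user_node = node;
	}

	/*
	 * Runs in both cases, so a value supplied by userspace is checked
	 * as well as the one just filled in.
	 */
	if (node < 0 || node >= SKETCH_MAX_NODES ||
	    !((possible_mask >> node) & 1))
		return -EINVAL;

	return node;
}
```

Moving the validity check into an else branch would skip the redundant
check of the local node, but the check itself has to stay for the value
read from userspace.
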

> > +		key->both.node = node;
> > +	}
> >  
> > +static inline unsigned int futex_size(unsigned int flags)
> > +{
> > +	unsigned int size = flags & FLAGS_SIZE_MASK;
> > +	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
> > +}
> > +
> >  static inline bool futex_flags_valid(unsigned int flags)
> >  {
> >  	/* Only 64bit futexes for 64bit code */
> > @@ -77,13 +83,19 @@ static inline bool futex_flags_valid(uns
> >  	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
> >  		return false;
> >  
> > -	return true;
> > -}
> > +	/*
> > +	 * Must be able to represent both NUMA_NO_NODE and every valid nodeid
> > +	 * in a futex word.
> > +	 */
> > +	if (flags & FLAGS_NUMA) {
> > +		int bits = 8 * futex_size(flags);
> > +		u64 max = ~0ULL;
> > +		max >>= 64 - bits;
> Your newline key is broken, right?





Yes :-)

> > +		if (nr_node_ids >= max)
> > +			return false;
> > +	}
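
For reference, the size and range arithmetic in the quoted
futex_flags_valid() hunk works out as in the sketch below. This is a
userspace illustration; size_from_code() and node_ids_fit() are
hypothetical names, and the size code is passed in directly rather than
masked out of a flags word.

```c
#include <assert.h>
#include <stdint.h>

/* Size encoding {0,1,2,3} -> {1,2,4,8} bytes, as in the quoted futex_size(). */
static unsigned int size_from_code(unsigned int code)
{
	assert(code <= 3);
	return 1u << code;
}

/*
 * Returns 1 if nr_ids is below the all-ones value of a futex word of the
 * given size, mirroring the quoted "nr_node_ids >= max" rejection.
 */
static int node_ids_fit(unsigned int size_code, uint64_t nr_ids)
{
	unsigned int bits = 8 * size_from_code(size_code);
	uint64_t max = ~0ULL >> (64 - bits); /* shift is 0 for 64-bit words */

	return nr_ids < max;
}
```

For an 8-bit word max is 255, so 254 node ids pass and 255 do not. A
32-bit word has room for any realistic node count.
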

