Date: Mon, 31 Jul 2023 20:03:20 +0200
From: Peter Zijlstra
To: Thomas Gleixner
Cc: axboe@kernel.dk, linux-kernel@vger.kernel.org, mingo@redhat.com,
	dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
	Andrew Morton, urezki@gmail.com, hch@infradead.org,
	lstoakes@gmail.com, Arnd Bergmann, linux-api@vger.kernel.org,
	linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de
Subject: Re: [PATCH v1 11/14] futex: Implement FUTEX2_NUMA
Message-ID: <20230731180320.GR29590@hirez.programming.kicks-ass.net>
References: <20230721102237.268073801@infradead.org>
	<20230721105744.434742902@infradead.org>
	<87pm48m19m.ffs@tglx>
In-Reply-To: <87pm48m19m.ffs@tglx>

On Mon, Jul 31, 2023 at 07:36:21PM +0200, Thomas Gleixner wrote:
> On Fri, Jul 21 2023 at 12:22, Peter Zijlstra wrote:
> >  struct futex_hash_bucket *futex_hash(union futex_key *key)
> >  {
> > -	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
> > +	u32 hash = jhash2((u32 *)key,
> > +			  offsetof(typeof(*key), both.offset) / sizeof(u32),
> >  			  key->both.offset);
> > +	int node = key->both.node;
> >
> > -	return &futex_queues[hash & (futex_hashsize - 1)];
> > +	if (node == -1) {
> > +		/*
> > +		 * In case of !FLAGS_NUMA, use some unused hash bits to pick a
> > +		 * node -- this ensures regular futexes are interleaved across
> > +		 * the nodes and avoids having to allocate multiple
> > +		 * hash-tables.
> > +		 *
> > +		 * NOTE: this isn't perfectly uniform, but it is fast and
> > +		 * handles sparse node masks.
> > +		 */
> > +		node = (hash >> futex_hashshift) % nr_node_ids;
>
> Is nr_node_ids guaranteed to be stable after init? It's marked
> __read_mostly, but not __ro_after_init.

AFAICT it is only ever written to in setup_nr_node_ids() and that is all
__init code. So I'm thinking this could/should indeed be
__ro_after_init. Especially so since it is an exported variable.

Mike?

> > +		if (!node_possible(node)) {
> > +			node = find_next_bit_wrap(node_possible_map.bits,
> > +						  nr_node_ids, node);
> > +		}
> > +	}
> > +
> > +	return &futex_queues[node][hash & (futex_hashsize - 1)];
> >  }
>
> >  	fshared = flags & FLAGS_SHARED;
> > +	size = futex_size(flags);
> >
> >  	/*
> >  	 * The futex address must be "naturally" aligned.
> >  	 */
> >  	key->both.offset = address % PAGE_SIZE;
> > -	if (unlikely((address % sizeof(u32)) != 0))
> > +	if (unlikely((address % size) != 0))
> >  		return -EINVAL;
>
> Hmm. Shouldn't that have changed with the allowance of the 1 and 2 byte
> futexes?

That patch comes after this.. :-)

But I do have an open question here; do we want FUTEX2_NUMA futexes
aligned at futex_size or double that? That is, what do we want the
alignment of:

	struct futex_numa_32 {
		u32 val;
		u32 node;
	};

to be? Having that u64 aligned will guarantee these two values end up in
the same page, having them u32 aligned (as per this patch) allows for
them to be split.

The current paths don't care, we don't hold locks, but perhaps it makes
sense to be conservative.

> >  	address -= key->both.offset;
> >
> > -	if (unlikely(!access_ok(uaddr, sizeof(u32))))
> > +	if (flags & FLAGS_NUMA)
> > +		size *= 2;
> > +
> > +	if (unlikely(!access_ok(uaddr, size)))
> >  		return -EFAULT;
> >
> >  	if (unlikely(should_fail_futex(fshared)))
> >  		return -EFAULT;
> >
> > +	key->both.node = -1;
>
> Please put this into an else path.

Can do, but I figured the compiler could figure it out through dead
store elimination or somesuch pass.

> > +	if (flags & FLAGS_NUMA) {
> > +		void __user *naddr = uaddr + size/2;
>
>   size / 2;
>
> > +
> > +		if (futex_get_value(&node, naddr, flags))
> > +			return -EFAULT;
> > +
> > +		if (node == -1) {
> > +			node = numa_node_id();
> > +			if (futex_put_value(node, naddr, flags))
> > +				return -EFAULT;
> > +		}
> > +
> > +		if (node >= MAX_NUMNODES || !node_possible(node))
> > +			return -EINVAL;
>
> That's clearly an else path too. No point in checking whether
> numa_node_id() is valid.

No, this also checks if the value we read from userspace is valid.

Only when the value we read from userspace is -1 do we set
numa_node_id(), otherwise we take the value as read, which then must be
a valid value.

> > +		key->both.node = node;
> > +	}
>
> > +static inline unsigned int futex_size(unsigned int flags)
> > +{
> > +	unsigned int size = flags & FLAGS_SIZE_MASK;
> > +	return 1 << size; /* {0,1,2,3} -> {1,2,4,8} */
> > +}
> > +
> >  static inline bool futex_flags_valid(unsigned int flags)
> >  {
> >  	/* Only 64bit futexes for 64bit code */
> > @@ -77,13 +83,19 @@ static inline bool futex_flags_valid(uns
> >  	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
> >  		return false;
> >
> > -	return true;
> > -}
> > +	/*
> > +	 * Must be able to represent both NUMA_NO_NODE and every valid nodeid
> > +	 * in a futex word.
> > +	 */
> > +	if (flags & FLAGS_NUMA) {
> > +		int bits = 8 * futex_size(flags);
> > +		u64 max = ~0ULL;
> > +		max >>= 64 - bits;
>
> Your newline key is broken, right?

Yes :-)

> > +		if (nr_node_ids >= max)
> > +			return false;
> > +	}
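
To make the alignment question above concrete, here is a small userspace
sketch, not part of the patch. It assumes a 4 KiB page (SKETCH_PAGE_SIZE
is a stand-in for the kernel's PAGE_SIZE), and both helper names are
hypothetical. It only checks whether a { val, node } pair of 2 * size
bytes can straddle a page boundary for a given enforced alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel would use PAGE_SIZE instead. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: does an object of 'len' bytes starting at 'addr'
 * have its first and last byte in different pages?
 */
static bool spans_two_pages(uint64_t addr, unsigned int len)
{
	assert(len > 0 && len <= SKETCH_PAGE_SIZE);
	return (addr / SKETCH_PAGE_SIZE) !=
	       ((addr + len - 1) / SKETCH_PAGE_SIZE);
}

/*
 * Hypothetical helper: a NUMA futex with a 'size'-byte value is a
 * { val, node } pair occupying 2 * size bytes. Try every address that
 * satisfies 'align' within a two-page window and report whether any
 * placement puts val and node in different pages.
 */
static bool numa_futex_can_split(unsigned int size, unsigned int align)
{
	uint64_t addr;

	for (addr = 0; addr < 2 * (uint64_t)SKETCH_PAGE_SIZE; addr += align) {
		if (spans_two_pages(addr, 2 * size))
			return true;
	}
	return false;
}
```

With size = 4, an alignment of 4 allows the pair to start at offset 4092,
which puts val in one page and node in the next; an alignment of 8 never
does.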