From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2C792C02188 for ; Mon, 27 Jan 2025 14:09:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7F31928015E; Mon, 27 Jan 2025 09:09:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7A3A528015C; Mon, 27 Jan 2025 09:09:29 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 66B6F28015E; Mon, 27 Jan 2025 09:09:29 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 459D028015C for ; Mon, 27 Jan 2025 09:09:29 -0500 (EST) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id E3F5043C75 for ; Mon, 27 Jan 2025 14:09:28 +0000 (UTC) X-FDA: 83053414416.30.D010ED3 Received: from nyc.source.kernel.org (nyc.source.kernel.org [147.75.193.91]) by imf25.hostedemail.com (Postfix) with ESMTP id 3BDD5A000C for ; Mon, 27 Jan 2025 14:09:27 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=PsYVb50t; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf25.hostedemail.com: domain of will@kernel.org designates 147.75.193.91 as permitted sender) smtp.mailfrom=will@kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1737986967; a=rsa-sha256; cv=none; b=mPFaRtB1cDoYaG0xmZVe73AcK/ICUYsN90PUEbGFk4P6D94p9opyhlc2t5+NT12OCRSdQq bNsgIYMZZ/063WeegPNGjA0QVTC+RDK11axdWjHU8goMRCvHWTphfXZjqe2HQY/RSo52m7 wazTnnFTES/LEfZoF/eoqhHsfP5B5Mc= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=PsYVb50t; dmarc=pass (policy=quarantine) header.from=kernel.org; spf=pass (imf25.hostedemail.com: domain of will@kernel.org designates 147.75.193.91 as permitted sender) smtp.mailfrom=will@kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1737986967; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=9hUtu3rqL7lpkIVcNhlxI3bj9lSJ5ZWO/c2KL9V8HMw=; b=FVyS2crwkgkJYagz/aSAjtI6kRSf5uJEcXgVyJTS02MtZH3f3vyXRsmzuM1o5xJ3EiBS1k DtPyJ8BpfzMNNgMnhC88Ph/xyUacVECKikh3LtWP2mjzaosGZw7J+eDbcIbwzVuSxnEZLZ trL3XoK5DCh0PihJfnmtAsqtRc0BBrA= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by nyc.source.kernel.org (Postfix) with ESMTP id 6D863A41615; Mon, 27 Jan 2025 14:07:39 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2894EC4CED2; Mon, 27 Jan 2025 14:09:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1737986966; bh=Rz6meftNFWIKeSBHX9X0i9FyWdEWeAzsHOwEmNXnLVo=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=PsYVb50tmoIzHQ8LPsNtS6QOOFCTm7a45GJ+NKRDdX9fNeEdF2ssy+DhgJVjZ5Na5 dHcLhlj78j5Uxup6pN+BaZW7kDnoL3bjE4Dm8IYFh+EFlo8ytKigkZsxxEkqWetoc2 ImJSh2tqlyO2CDXVfwfHk76IFlf8oSj7Tr0zqVpQ0QmDaz6rzskGEglR+u1qfzuZXO bgRpsOZyB5GHnqjSV7F/g3Hit7qNCm6x4p9bnP20pT2qAg9y82dPWuGxOjlePGGk9n +FQi3j3kTTaZ+h+Ix9z4/diG4SaucBACHB2YO8xh3fMNpn49SQ3nlT+e6W6PX7Ze9a IiOcuPshSze5w== Date: Mon, 27 Jan 2025 14:09:16 +0000 From: Will Deacon To: Peter Zijlstra Cc: Suren Baghdasaryan , boqun.feng@gmail.com, mark.rutland@arm.com, Mateusz Guzik , akpm@linux-foundation.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Subject: Re: [PATCH] refcount: Strengthen inc_not_zero() Message-ID: <20250127140915.GA25672@willie-the-truck> References: <20250111042604.3230628-1-surenb@google.com> <20250111042604.3230628-12-surenb@google.com> <20250115104841.GX5388@noisy.programming.kicks-ass.net> <20250115111334.GE8385@noisy.programming.kicks-ass.net> <20250115160011.GG8385@noisy.programming.kicks-ass.net> <20250117154135.GA17569@willie-the-truck> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250117154135.GA17569@willie-the-truck> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspam-User: X-Rspamd-Queue-Id: 3BDD5A000C X-Rspamd-Server: rspam10 X-Stat-Signature: sxfw5xys9wkzzic1agz5he1koz7xdf4c X-HE-Tag: 1737986967-610072 X-HE-Meta: U2FsdGVkX181q0Hq+W/ibWlA/wjafwD5LRhDzH/yNnBONgMF6hkajF4IZMEurhe+Gm5k8quC4FhWnyt6QDxED/WGZDgihpSFYdOXNWztgzPTv6bAteqVU9HdYioCo79wNSFHWWS4ThDPYkQH6P2ATKORQj6ZrSxl3Rd8OeEGOF6J4sZviLybWBipqyZaFqcSrGmAiRXWm4ZANhz1JSvc/sHFEa2R17WFaQNBU+yZe0e2Cf7xMHeHwDN3Pxhu+6UZinrgLFkpjGmlOzxVIKN+A/PTK5WntGFjY/hlmqRTPNzMIje9ENgHBDiLzSFdZ/GfJ8Kb4lwyiw4fEB4I/ExuaPzeCZ8CfX8ifD2cptYYPCVw4pS7Kt0hp3ECcE1PS5ZgVRL8yK7DQWFmtN0TzewOTicb2E0wnzydBDcmtUS4dE2vHhee6rybRxY63ELCbgZpd1kLevSdr5og92cFGYWxNeBJNcMySVZByQ+c88WLPvrQtymhiN+MkslPytf6tOOoWB0TFKNRZ0814TPXOcyEla4TTwafTjpyqKgSWEO1VSe/TgnTUZkDy36h7Z8zZUnItpD0fdA6roBQc4SYMF90zr0lZmjBqLMYX2SrK+JwcpN+qL0mTpFwWaJacYoc6urVahA+xg3Y3ohIMS45owCkG8gT2cV9OiFFqnvYIE53KObyKg3QjDGyj71CBX2wNJ+YoiY7yOeIBYa6R/fADViBaB3SChPD78No0tz3TTbET2Nqq51lmUCSfaG0WI5AsscaFu6mfO4qo/556RbkZ7XCrIaSChGkMjuxPzQU43yspQDBimxAut7O2yHCmBwaUMIHsbkhFYNfEWVfr+baxxfREBOFB7l9km7MnTt6hYb3yQOK2D+I4llbUodgC1YT8donKZSEzY2w5ARN2P3COYVgQSp24CeKqQtQ2BVx1I0XyUZCLOiO/INDKYdySkn8JnTfsMpS56mmAbtv5LrMe23 5aY5iRZ7 QVIUVtd8SuuIteXuWEojDf6doEh+BqBIEXzCgIQnnmeoNXUAgku9zUEWIsKxasntt9LdRMl2GRGjNVHtPAHM2n/A1U35CIsqXeecg2A+8dirkytI86IXwMy3SGaarmmQ/cF58cycuv5iMTfkkA2+1QIKzLOJwNetIClFG3nif8NvdHV7gTmSfFJqVFwimzT6kMhrK6gyhEKwhVwOoOL8d3pTAEt96Y38ayRDU7HVliClSPSSxRs1Q2jHiLw/8ju1LKcKGE5TnyR+0ejaWgUCZly8bd+Lnucsj5urQ2zT4q6+DoJ7oSW+mGhCnJPRrf+SKMZg+UGNbu96weV7bI6q5/qw6v+MzC0o3FmchMlvTzwMsON/vjyfD4Siog7ggRx89fNVcilh5nNJLP+I= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000010, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote: > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote: > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote: > > > > > Notably, it means refcount_t is entirely unsuitable for anything > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation > > > conditions after the refcount succeeds. > > > > > > And this is probably fine, but let me ponder this all a little more. > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better > > fix this, these things are hard enough as they are. > > > > Will, others, wdyt? > > We should also update the Documentation (atomic_t.txt and > refcount-vs-atomic.rst) if we strengthen this. > > > --- > > Subject: refcount: Strengthen inc_not_zero() > > > > For speculative lookups where a successful inc_not_zero() pins the > > object, but where we still need to double check if the object acquired > > is indeed the one we set out to aquire, needs this validation to happen > > *after* the increment. > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example. > > > > Signed-off-by: Peter Zijlstra (Intel) > > --- > > include/linux/refcount.h | 15 ++++++++------- > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h > > index 35f039ecb272..340e7ffa445e 100644 > > --- a/include/linux/refcount.h > > +++ b/include/linux/refcount.h > > @@ -69,9 +69,10 @@ > > * its the lock acquire, for RCU/lockless data structures its the dependent > > * load. > > * > > - * Do note that inc_not_zero() provides a control dependency which will order > > - * future stores against the inc, this ensures we'll never modify the object > > - * if we did not in fact acquire a reference. > > + * Do note that inc_not_zero() does provide acquire order, which will order > > + * future load and stores against the inc, this ensures all subsequent accesses > > + * are from this object and not anything previously occupying this memory -- > > + * consider SLAB_TYPESAFE_BY_RCU. > > * > > * The decrements will provide release order, such that all the prior loads and > > * stores will be issued before, it also provides a control dependency, which > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp) > > do { > > if (!old) > > break; > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i)); > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i)); > > Hmm, do the later memory accesses need to be ordered against the store > part of the increment or just the read? If it's the former, then I don't > think that _acquire is sufficient -- accesses can still get in-between > the read and write parts of the RmW. I dug some more into this at the end of last week. For the SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with dec_and_test(), then I think using _acquire() above is correct as the later references can only move up into the critical section in the case that we successfully obtained a reference. However, if we're going to make the barriers implicit in the refcount operations here then I think we also need to do something on the producer side for when the object is re-initialised after being destroyed and allocated again. I think that would necessitate release ordering for refcount_set() so that whatever allows the consumer to validate the object (e.g. sequence number) is published *before* the refcount. Will