Subject: Re: [PATCH] refcount: Strengthen inc_not_zero()
From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 28 Jan 2025 15:51:06 -0800
To: Will Deacon
Cc: Peter Zijlstra, boqun.feng@gmail.com, mark.rutland@arm.com,
	Mateusz Guzik, akpm@linux-foundation.org, willy@infradead.org,
	liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
	david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz,
	hannes@cmpxchg.org, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, lokeshgidra@google.com, minchan@google.com,
	jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, klarasmodin@gmail.com,
	richard.weiyang@gmail.com, corbet@lwn.net,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20250111042604.3230628-1-surenb@google.com>
	<20250111042604.3230628-12-surenb@google.com>
	<20250115104841.GX5388@noisy.programming.kicks-ass.net>
	<20250115111334.GE8385@noisy.programming.kicks-ass.net>
	<20250115160011.GG8385@noisy.programming.kicks-ass.net>
	<20250117154135.GA17569@willie-the-truck>
	<20250127140915.GA25672@willie-the-truck>
On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan wrote:
>
> On Mon, Jan 27, 2025 at 6:09 AM Will Deacon wrote:
> >
> > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > >
> > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > conditions after the refcount succeeds.
> > > > >
> > > > > And this is probably fine, but let me ponder this all a little more.
> > > >
> > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > fix this; these things are hard enough as they are.
> > > >
> > > > Will, others, wdyt?
> > >
> > > We should also update the Documentation (atomic_t.txt and
> > > refcount-vs-atomic.rst) if we strengthen this.
> > >
> > > > ---
> > > > Subject: refcount: Strengthen inc_not_zero()
> > > >
> > > > For speculative lookups, where a successful inc_not_zero() pins the
> > > > object but we still need to double check that the object acquired
> > > > is indeed the one we set out to acquire, this validation needs to
> > > > happen *after* the increment.
> > > >
> > > > Notably, SLAB_TYPESAFE_BY_RCU is one such example.
> > > >
> > > > Signed-off-by: Peter Zijlstra (Intel)
> > > > ---
> > > >  include/linux/refcount.h | 15 ++++++++-------
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > index 35f039ecb272..340e7ffa445e 100644
> > > > --- a/include/linux/refcount.h
> > > > +++ b/include/linux/refcount.h
> > > > @@ -69,9 +69,10 @@
> > > >   * its the lock acquire, for RCU/lockless data structures its the dependent
> > > >   * load.
> > > >   *
> > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > - * if we did not in fact acquire a reference.
> > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > > + * are from this object and not anything previously occupying this memory --
> > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > >   *
> > > >   * The decrements will provide release order, such that all the prior loads and
> > > >   * stores will be issued before, it also provides a control dependency, which
> > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > >  	do {
> > > >  		if (!old)
> > > >  			break;
> > > > -	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > +	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > >
> > > Hmm, do the later memory accesses need to be ordered against the store
> > > part of the increment or just the read? If it's the former, then I don't
> > > think that _acquire is sufficient -- accesses can still get in-between
> > > the read and write parts of the RmW.
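
To spell out the read-half/write-half distinction for anyone following
along, here is an illustrative sketch (mine, not code from the patch)
of what the consumer side relies on after this change:

	old = atomic_read(&r->refs);
	do {
		if (!old)
			break;	/* count was zero, no reference taken */
	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
	...
	if (READ_ONCE(obj->key) != key)	/* validation load */
		...

The _acquire variant orders the validation load after the *load* half
of the cmpxchg that observed a non-zero count; nothing orders it
against the store half of the RmW. Hence the question above: does any
caller need ordering against the store half as well?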
> >
> > I dug some more into this at the end of last week. For the
> > SLAB_TYPESAFE_BY_RCU case, where we're racing inc_not_zero() with
> > dec_and_test(), I think using _acquire() above is correct, as the
> > later references can only move up into the critical section in the case
> > that we successfully obtained a reference.
> >
> > However, if we're going to make the barriers implicit in the refcount
> > operations here, then I think we also need to do something on the producer
> > side for when the object is re-initialised after being destroyed and
> > allocated again. I think that would necessitate release ordering for
> > refcount_set() so that whatever allows the consumer to validate the
> > object (e.g. a sequence number) is published *before* the refcount.
>
> Thanks Will!
> I would like to expand on your answer to provide an example of the
> race that would happen without release ordering in the producer. To
> save the reader's time, I give a simplified flow and the reasoning
> first. More detailed code for what I consider a typical
> SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
> reply (see Addendum).
> The simplified flow looks like this:
>
> consumer:
> obj = lookup(collection, key);
> if (!refcount_inc_not_zero(&obj->ref))
>         return;
> smp_rmb(); /* Peter's new acquire fence */
> if (READ_ONCE(obj->key) != key) {
>         put_ref(obj);
>         return;
> }
> use(obj->value);
>
> producer:
> old_key = obj->key;
> remove(collection, old_key);
> if (!refcount_dec_and_test(&obj->ref))
>         return;
> obj->key = KEY_INVALID;
> free(obj);
> ...
> obj = malloc(); /* obj is reused */
> obj->key = new_key;
> obj->value = new_value;
> smp_wmb(); /* Will's proposed release fence */
> refcount_set(&obj->ref, 1);
> insert(collection, new_key, obj);
>
> Let's consider the case when new_key == old_key; we'll call both of
> them "key". Without Will's proposed fence, the following reordering is
> possible:
>
> consumer:
> obj = lookup(collection, key);
>
> producer:
> key = obj->key;
> remove(collection, key);
> if (!refcount_dec_and_test(&obj->ref))
>         return;
> obj->key = KEY_INVALID;
> free(obj);
> obj = malloc(); /* obj is reused */
> refcount_set(&obj->ref, 1);
> obj->key = key; /* same key */
>
> consumer:
> if (!refcount_inc_not_zero(&obj->ref))
>         return;
> smp_rmb();
> if (READ_ONCE(obj->key) != key) {
>         put_ref(obj);
>         return;
> }
> use(obj->value);
>
> producer:
> obj->value = new_value; /* reordered store */
> insert(collection, key, obj);
>
> So, the consumer finds the old object, successfully takes a refcount
> and validates the key. The validation succeeds because the recycled
> object carries the same key, which is fine; however, the consumer then
> proceeds to use the stale obj->value. Will's proposed release ordering
> would prevent that.
>
> The example at
> https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> I think it would be better to introduce two new functions,
> refcount_add_not_zero_acquire() and refcount_set_release(), and clearly
> document that they should be used when a freed object can be recycled
> and reused, as in the SLAB_TYPESAFE_BY_RCU case. refcount_set_release()
> should also clarify that once it's called, the object can be accessed
> by consumers even if it has not yet been added into the collection used
> for object lookup (as in the example above). The SLAB_TYPESAFE_BY_RCU
> comment at
> https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> can then explicitly use these new functions in its example code,
> further clarifying their purpose and proper use.
> WDYT?

Hi Peter,
Should I take a stab at preparing a patch to add the two new
refcounting functions suggested above, with updates to the
documentation and comments?
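
For concreteness, here is roughly what I have in mind -- a sketch
only, untested, mirroring the existing __refcount_add_not_zero()
internals with the ordering strengthened:

	static inline void refcount_set_release(refcount_t *r, int n)
	{
		atomic_set_release(&r->refs, n);
	}

	static inline bool refcount_add_not_zero_acquire(int i, refcount_t *r)
	{
		int old = refcount_read(r);

		do {
			if (!old)
				break;
		} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));

		if (unlikely(old < 0 || old + i < 0))
			refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);

		return old;
	}

A refcount_inc_not_zero_acquire() wrapper would then follow the same
pattern as the existing refcount_inc_not_zero().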
If you disagree with that or need more time to decide, then I'll wait.
Please let me know.
Thanks,
Suren.

>
> ADDENDUM.
> Detailed code for a typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
>
> struct object {
>         refcount_t ref;
>         u64 key;
>         u64 value;
> };
>
> void init(struct object *obj, u64 key, u64 value)
> {
>         obj->key = key;
>         obj->value = value;
>         smp_wmb(); /* Will's proposed release fence */
>         refcount_set(&obj->ref, 1);
> }
>
> bool get_ref(struct object *obj, u64 key)
> {
>         if (!refcount_inc_not_zero(&obj->ref))
>                 return false;
>         smp_rmb(); /* Peter's new acquire fence */
>         if (READ_ONCE(obj->key) != key) {
>                 put_ref(obj);
>                 return false;
>         }
>         return true;
> }
>
> void put_ref(struct object *obj)
> {
>         if (!refcount_dec_and_test(&obj->ref))
>                 return;
>         obj->key = KEY_INVALID;
>         free(obj);
> }
>
> consumer:
> obj = lookup(collection, key);
> if (!get_ref(obj, key))
>         return;
> use(obj->value);
>
> producer:
> remove(collection, old_obj->key);
> put_ref(old_obj);
> new_obj = malloc();
> init(new_obj, new_key, new_value);
> insert(collection, new_key, new_obj);
>
> With SLAB_TYPESAFE_BY_RCU, old_obj in the producer can be reused and
> end up equal to new_obj.

> >
> > Will