From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7C6BC021A6 for ; Thu, 13 Feb 2025 23:04:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 30429280019; Thu, 13 Feb 2025 18:04:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2B3DA280018; Thu, 13 Feb 2025 18:04:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 10512280019; Thu, 13 Feb 2025 18:04:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id DF3B5280018 for ; Thu, 13 Feb 2025 18:04:42 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 7E223816E7 for ; Thu, 13 Feb 2025 23:04:42 +0000 (UTC) X-FDA: 83116452804.04.76D8C21 Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com [209.85.160.177]) by imf06.hostedemail.com (Postfix) with ESMTP id 8F23A18000B for ; Thu, 13 Feb 2025 23:04:40 +0000 (UTC) Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=CVJcmHyz; spf=pass (imf06.hostedemail.com: domain of surenb@google.com designates 209.85.160.177 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1739487880; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jBIR+bt56wtcSqA6PQ7lniFOC3/eEtZ264TeXuADiqg=; b=P79k0CQYjW7rMCVpbgO1maHSH4ZH6HzOtiILxCyz23fc9dLFJ89FmTAlscplZwfQGaFa90 0HusUSFd84GRKobOIaJhnIlf4eCvB0rATIHCBMQfK4VpncM37oXBfoM8okpvP5XAS7Fbcs Kvx0oreMMxsPdyjhsPki2I7eFwq5Qk8= ARC-Authentication-Results: i=1; imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=CVJcmHyz; spf=pass (imf06.hostedemail.com: domain of surenb@google.com designates 209.85.160.177 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1739487880; a=rsa-sha256; cv=none; b=Y3fM8QY+LvcWIZB0Y6rKOJ1UWT1imwWiMvVvmpjm41UQwXovj/lX1PJGz65987WFIhidaR Zm5CIegksAkD3CwcKhkWoro6Ye6eMXEvBC9ZuyHns8oj2UHUiJh/5rmqFH46LFXxU+Lszp flvpVm6BHTg1tdJpqdokBpYuhhV7oZ8= Received: by mail-qt1-f177.google.com with SMTP id d75a77b69052e-4718aea0718so118291cf.0 for ; Thu, 13 Feb 2025 15:04:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1739487880; x=1740092680; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=jBIR+bt56wtcSqA6PQ7lniFOC3/eEtZ264TeXuADiqg=; b=CVJcmHyzMW/JoXAY9l0opA0svL1vpK7q1q5QAoKA8uxDZsnauCDvsddxkje6AoGvoI I84+bxsgktERxwKinp6OUBApC5EbNccRGEF/cAB7oEmCOsFDbY9W5M+idDpHaulv9WUF TAmGdfp5elk6wTIni/sQOVBzHjvE9V2wZebuouG6WQlucZ6N06/KXv9vnl4IHDSYPm3Z U5zEBBXdZMgTBQJ0gft5ziGZ3zqX7LLvX/9CjRWR+rEUzl3xP+HDDIYP8DxZX/XEaL7R XYmmsP3aGn2L++W7jgKncs+pgHO2lt1PHzDAW9AI8e1Jeq+CYcw5DdKrltzNxhIGSZXj fsqg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739487880; x=1740092680; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=jBIR+bt56wtcSqA6PQ7lniFOC3/eEtZ264TeXuADiqg=; b=CgTqmO+XhCFM5wc4rB03AZH2j5+coWykOSxaBeaHxpsA+km6ulfZikzqqEE6Vj6sic V6aRukCy9l5OtvFWiOmn5Cdvbh7NaRUcijgjvFKWxLLPRcpROBgWJk+NdTuCvI08NzaT 9vyNcmu+ZHMLU8kLaSrsdQHMGFWjZfYQpkIgHTvNo3Az1aHACianrTYV6ZwmEWewrTzZ wvBJOAweuJskmRl/CLkKrOv54Lmnu1nLG5P2t1OE1txHGePZqz6P7Hg7pSXReLW99uWS iOE2XrPZVGAAgQkBG+A/vZhCmR1TsbuaOwoooK7MsaI6rPqZa4AUg2AOjyTuVxN0OJOf xPkA== X-Forwarded-Encrypted: i=1; AJvYcCUAugL9+QRGWvdSUr7S1+Vrtyku6YRre5Vm3EI+akIpElQzeE/g+wlhfmyU9BXCZYaDoy03dGu5zw==@kvack.org X-Gm-Message-State: AOJu0YzEjh8E226o4el5DWtKB4cTKyJWbSw3b96Iu6r2l4IJX8vwO6m3 1bdaK9VdyTGiHPRxv+zfkMn9QG+GlyXpkuVHJ7qeO3g3k6joFxnucEtjXW58hSG4tWKq6aX8dFb u1cEftBqVa58OayWCUII9E8Zn0KoLd0X0Q8RZ X-Gm-Gg: ASbGncsEIwVpV0cdW0zpMGLwoX33ZhRjoaB8oYl0HzJsE8Kh1mtvogCZ9A3mbGC+6f2 fzNej/uX0NtCI+TxABtBHj3PoiKTTo3yi9PxCnxLocv85XJOynBwm5okUwZbODVly5OPE+UIva7 s/QzobW88BGDyOaLOvsB0YGJIvvZk= X-Google-Smtp-Source: AGHT+IGf874ZRh0zerFODNt7RB8GYHE5ea29qNg8zPbOQBkNZFFwLpiSvmZMzlKwzTmGlhHoTCIIXejfb5EeHZEHOME= X-Received: by 2002:a05:622a:1191:b0:471:bf59:5f9f with SMTP id d75a77b69052e-471ce95eecfmr1150381cf.22.1739487879285; Thu, 13 Feb 2025 15:04:39 -0800 (PST) MIME-Version: 1.0 References: <20250111042604.3230628-1-surenb@google.com> <20250111042604.3230628-12-surenb@google.com> <20250115104841.GX5388@noisy.programming.kicks-ass.net> <20250115111334.GE8385@noisy.programming.kicks-ass.net> <20250115160011.GG8385@noisy.programming.kicks-ass.net> <20250117154135.GA17569@willie-the-truck> <20250127140915.GA25672@willie-the-truck> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 13 Feb 2025 15:04:28 -0800 X-Gm-Features: AWEUYZlCybXByrvagkzEiChzM_Vh_Z7U51Cwvq0lCWEUfWRGM0B5G4X0WxzOEd8 Message-ID: Subject: Re: [PATCH] refcount: Strengthen inc_not_zero() To: Will Deacon Cc: Peter Zijlstra , boqun.feng@gmail.com, mark.rutland@arm.com, Mateusz Guzik , akpm@linux-foundation.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 8F23A18000B X-Stat-Signature: 575jq6d8bjjx6usd7a4z8op1ci59rzqj X-HE-Tag: 1739487880-830426 X-HE-Meta: U2FsdGVkX1+lqfSMutyyPuBkK+rVYpA2KG3rMeV1MH7Yoa7jHyHcLGVxPdT99o+QxwPGSprDEYBktDMhQwZ8nYMcy/qZqMDRDsojPKutmaSRplnayYCbKmYkDlOcNZrho6LP5Opr5jFeF71GLKUZhUgobqhAzC/cVeXMjvGvCzMoJx92O1AWdB6B0CFIiwTgBDpjIGNQQ4imKNwdKkay6Byw0BzoU+utefLF69ygMAE4uFWH31c5YCz3FUWd3kUgkkecQHhX+TgfNEaWkR4omcvFQ58YWEaAXtk+mYLtnXdTeGTsPhSxhKJ/x59SOTZcoUD6pRb2H2gcjOF/lJNajZJVLteZ3ZV0eAJmPKqzEL6hWfWshwoKtri2I4AIVjU/mBpq+5+AvTkF1TT3rMD3G1SymwERLQu2QZBIiIDUXrHGsz1zEkxP6tLynpcWqiW5pvuuFn/c1NLSkIv+5TfV8Ow70f13tlSFOZuuYzGpXrrWYRv3pFyv1AT2vGFTmaqZCq2KyGjoH1+mCaPOtaujEcrYILovYLqCAA/Mp1/Lxw9n+kl5T+ooObDadhTpc6QIvEiGDMwv078zzn5ZenXfh29yxoTan5nIdn1oG6n7sLyNZme3XHECcRxXNko5uJI/FeU950akywS4maUZ3TSJf5OOSsUsR/g31dyoxFDb1agvmSiuQ2BT5Eq2MUdbWqVPRMh9RFvjqO3J/uClWbsDqJ7iG9I8IQHOIMQFxQlRNrzMpjIhlWwjvrOLoIfQkb1k6+ixbYEQYBI02lpJArhaPR26n2PSZ1RTeV+HbWWPYBnAvqR9q+zf0Rncq0kfkfJ+AXoi2CZhXVSYJCMnqXQF/NLdc3ElO4Pyoq2U020NhPZ2pvC0G4sNfCKR/6Q54e5FN5eQurjQaudJ1VTrFqltofSTR3zku3BgmXt+HPx2sOkg5PpbQGjPpfHaD2bGKVw6qX+Boo1LXw5iwf2BwMS hZZah5U6 /NdratP8LSuDEzS+AY4VelHm2ymZ27RV5jzSOfF+tsppi4BFZNfrvs3E+VBvuwHCqu71drScD4MBJFtTBDXaXuSY3jFyjlCBX/4OLDnmdWQO3KOaIdmCyUnp8uhPZ0L0E0lwacWnd5yZkWUBiI7RqJCb6rFVil9yNhdsQlewfXxSoeP4JlNImXTLCjoohJKa+ecgmoPJkDcIlOFYfwJZvveTJ8uJ9TAM1A0atuf412rifFcwJ728o2iKmXNbvzEVGWgGcmikcaVU/RzwCQCUinAc3kbifdSVDHr0sLwZF6DowgtUMByUPdZkv641INAOzYd6P2ACvpsnGy4kcbCrckPEOKnT5JAlFEw4SfkjnTUKegYVBFc7rRr+txoFQp+PqKYSwVxdFgbcYGpewKbK0d27xi5cRPflla1BzTh1tfUCxEz+k/j8WYtOSn633F7a2Si32ouEFuej5VTUYfk0kfOBVm07uFfGK6RgX2EkwRDrRQch3JGuu1j8kGu/VSwzStMwH X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Feb 5, 2025 at 7:03=E2=80=AFPM Suren Baghdasaryan wrote: > > On Tue, Jan 28, 2025 at 3:51=E2=80=AFPM Suren Baghdasaryan wrote: > > > > On Mon, Jan 27, 2025 at 11:21=E2=80=AFAM Suren Baghdasaryan wrote: > > > > > > On Mon, Jan 27, 2025 at 6:09=E2=80=AFAM Will Deacon = wrote: > > > > > > > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote: > > > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote: > > > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote: > > > > > > > > > > > > > Notably, it means refcount_t is entirely unsuitable for anyth= ing > > > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary vali= dation > > > > > > > conditions after the refcount succeeds. > > > > > > > > > > > > > > And this is probably fine, but let me ponder this all a littl= e more. > > > > > > > > > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we= 'd better > > > > > > fix this, these things are hard enough as they are. > > > > > > > > > > > > Will, others, wdyt? > > > > > > > > > > We should also update the Documentation (atomic_t.txt and > > > > > refcount-vs-atomic.rst) if we strengthen this. > > > > > > > > > > > --- > > > > > > Subject: refcount: Strengthen inc_not_zero() > > > > > > > > > > > > For speculative lookups where a successful inc_not_zero() pins = the > > > > > > object, but where we still need to double check if the object a= cquired > > > > > > is indeed the one we set out to aquire, needs this validation t= o happen > > > > > > *after* the increment. > > > > > > > > > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example. > > > > > > > > > > > > Signed-off-by: Peter Zijlstra (Intel) > > > > > > --- > > > > > > include/linux/refcount.h | 15 ++++++++------- > > > > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > > > > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.= h > > > > > > index 35f039ecb272..340e7ffa445e 100644 > > > > > > --- a/include/linux/refcount.h > > > > > > +++ b/include/linux/refcount.h > > > > > > @@ -69,9 +69,10 @@ > > > > > > * its the lock acquire, for RCU/lockless data structures its = the dependent > > > > > > * load. > > > > > > * > > > > > > - * Do note that inc_not_zero() provides a control dependency w= hich will order > > > > > > - * future stores against the inc, this ensures we'll never mod= ify the object > > > > > > - * if we did not in fact acquire a reference. > > > > > > + * Do note that inc_not_zero() does provide acquire order, whi= ch will order > > > > > > + * future load and stores against the inc, this ensures all su= bsequent accesses > > > > > > + * are from this object and not anything previously occupying = this memory -- > > > > > > + * consider SLAB_TYPESAFE_BY_RCU. > > > > > > * > > > > > > * The decrements will provide release order, such that all th= e prior loads and > > > > > > * stores will be issued before, it also provides a control de= pendency, which > > > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcoun= t_t *r, int *oldp) > > > > > > do { > > > > > > if (!old) > > > > > > break; > > > > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + = i)); > > > > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + = i)); > > > > > > > > > > Hmm, do the later memory accesses need to be ordered against the = store > > > > > part of the increment or just the read? If it's the former, then = I don't > > > > > think that _acquire is sufficient -- accesses can still get in-be= tween > > > > > the read and write parts of the RmW. > > > > > > > > I dug some more into this at the end of last week. For the > > > > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with > > > > dec_and_test(), then I think using _acquire() above is correct as t= he > > > > later references can only move up into the critical section in the = case > > > > that we successfully obtained a reference. > > > > > > > > However, if we're going to make the barriers implicit in the refcou= nt > > > > operations here then I think we also need to do something on the pr= oducer > > > > side for when the object is re-initialised after being destroyed an= d > > > > allocated again. I think that would necessitate release ordering fo= r > > > > refcount_set() so that whatever allows the consumer to validate the > > > > object (e.g. sequence number) is published *before* the refcount. > > > > > > Thanks Will! > > > I would like to expand on your answer to provide an example of the > > > race that would happen without release ordering in the producer. To > > > save reader's time I provide a simplified flow and reasoning first. > > > More detailed code of what I'm considering a typical > > > SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my > > > reply (Addendum). > > > Simplified flow looks like this: > > > > > > consumer: > > > obj =3D lookup(collection, key); > > > if (!refcount_inc_not_zero(&obj->ref)) > > > return; > > > smp_rmb(); /* Peter's new acquire fence */ > > > if (READ_ONCE(obj->key) !=3D key) { > > > put_ref(obj); > > > return; > > > } > > > use(obj->value); > > > > > > producer: > > > old_key =3D obj->key; > > > remove(collection, old_key); > > > if (!refcount_dec_and_test(&obj->ref)) > > > return; > > > obj->key =3D KEY_INVALID; > > > free(objj); > > > ... > > > obj =3D malloc(); /* obj is reused */ > > > obj->key =3D new_key; > > > obj->value =3D new_value; > > > smp_wmb(); /* Will's proposed release fence */ > > > refcount_set(obj->ref, 1); > > > insert(collection, key, obj); > > > > > > Let's consider a case when new_key =3D=3D old_key. Will call both of = them > > > "key". Without WIll's proposed fence the following reordering is > > > possible: > > > > > > consumer: > > > obj =3D lookup(collection, key); > > > > > > producer: > > > key =3D obj->key > > > remove(collection, key); > > > if (!refcount_dec_and_test(&obj->ref)) > > > return; > > > obj->key =3D KEY_INVALID; > > > free(objj); > > > obj =3D malloc(); /* obj is reused */ > > > refcount_set(obj->ref, 1); > > > obj->key =3D key; /* same key */ > > > > > > if (!refcount_inc_not_zero(&obj->ref)) > > > return; > > > smp_rmb(); > > > if (READ_ONCE(obj->key) !=3D key) { > > > put_ref(obj); > > > return; > > > } > > > use(obj->value); > > > > > > obj->value =3D new_value; /* reordered store */ > > > add(collection, key, obj); > > > > > > So, the consumer finds the old object, successfully takes a refcount > > > and validates the key. It succeeds because the object is allocated an= d > > > has the same key, which is fine. However it proceeds to use stale > > > obj->value. Will's proposed release ordering would prevent that. > > > > > > The example in https://elixir.bootlin.com/linux/v6.12.6/source/includ= e/linux/slab.h#L102 > > > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU. > > > I think it would be better to introduce two new functions: > > > refcount_add_not_zero_acquire() and refcount_set_release(), clearly > > > document that they should be used when a freed object can be recycled > > > and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release() > > > should also clarify that once it's called, the object can be accessed > > > by consumers even if it was not added yet into the collection used fo= r > > > object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU > > > comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/li= nux/slab.h#L102 > > > then can explicitly use these new functions in the example code, > > > further clarifying their purpose and proper use. > > > WDYT? > > > > Hi Peter, > > Should I take a stab at preparing a patch to add the two new > > refcounting functions suggested above with updates to the > > documentation and comments? > > If you disagree with that or need more time to decide then I'll wait. > > Please let me know. > > Not sure if "--in-reply-to" worked but I just posted a patch adding > new refcounting APIs for SLAB_TYPESAFE_BY_RCU here: > https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/ Since I did not get any replies other than Vlastimil's Ack on the above patch, I went ahead and posted v10 of my patchset [1] and included the patch above in it [2]. Feedback is highly appreciated! [1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/ [2] https://lore.kernel.org/all/20250213224655.1680278-11-surenb@google.com= / > Since Peter seems to be busy I discussed these ordering requirements > for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards > having separate functions with the additional fences for this case. > That's what I provided in my patch. > Another possible option is to add acquire ordering in the > __refcount_add_not_zero() as Peter suggested and add > refcount_set_release() function. > Thanks, > Suren. > > > > Thanks, > > Suren. > > > > > > > > > > ADDENDUM. > > > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RC= U: > > > > > > struct object { > > > refcount_t ref; > > > u64 key; > > > u64 value; > > > }; > > > > > > void init(struct object *obj, u64 key, u64 value) > > > { > > > obj->key =3D key; > > > obj->value =3D value; > > > smp_wmb(); /* Will's proposed release fence */ > > > refcount_set(obj->ref, 1); > > > } > > > > > > bool get_ref(struct object *obj, u64 key) > > > { > > > if (!refcount_inc_not_zero(&obj->ref)) > > > return false; > > > smp_rmb(); /* Peter's new acquire fence */ > > > if (READ_ONCE(obj->key) !=3D key) { > > > put_ref(obj); > > > return false; > > > } > > > return true; > > > } > > > > > > void put_ref(struct object *obj) > > > { > > > if (!refcount_dec_and_test(&obj->ref)) > > > return; > > > obj->key =3D KEY_INVALID; > > > free(obj); > > > } > > > > > > consumer: > > > obj =3D lookup(collection, key); > > > if (!get_ref(obj, key) > > > return; > > > use(obj->value); > > > > > > producer: > > > remove(collection, old_obj->key); > > > put_ref(old_obj); > > > new_obj =3D malloc(); > > > init(new_obj, new_key, new_value); > > > insert(collection, new_key, new_obj); > > > > > > With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and b= e > > > equal to new_obj. > > > > > > > > > > > > > > Will On Wed, Feb 5, 2025 at 7:03=E2=80=AFPM Suren Baghdasaryan wrote: > > On Tue, Jan 28, 2025 at 3:51=E2=80=AFPM Suren Baghdasaryan wrote: > > > > On Mon, Jan 27, 2025 at 11:21=E2=80=AFAM Suren Baghdasaryan wrote: > > > > > > On Mon, Jan 27, 2025 at 6:09=E2=80=AFAM Will Deacon = wrote: > > > > > > > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote: > > > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote: > > > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote: > > > > > > > > > > > > > Notably, it means refcount_t is entirely unsuitable for anyth= ing > > > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary vali= dation > > > > > > > conditions after the refcount succeeds. > > > > > > > > > > > > > > And this is probably fine, but let me ponder this all a littl= e more. > > > > > > > > > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we= 'd better > > > > > > fix this, these things are hard enough as they are. > > > > > > > > > > > > Will, others, wdyt? > > > > > > > > > > We should also update the Documentation (atomic_t.txt and > > > > > refcount-vs-atomic.rst) if we strengthen this. > > > > > > > > > > > --- > > > > > > Subject: refcount: Strengthen inc_not_zero() > > > > > > > > > > > > For speculative lookups where a successful inc_not_zero() pins = the > > > > > > object, but where we still need to double check if the object a= cquired > > > > > > is indeed the one we set out to aquire, needs this validation t= o happen > > > > > > *after* the increment. > > > > > > > > > > > > Notably SLAB_TYPESAFE_BY_RCU is one such an example. > > > > > > > > > > > > Signed-off-by: Peter Zijlstra (Intel) > > > > > > --- > > > > > > include/linux/refcount.h | 15 ++++++++------- > > > > > > 1 file changed, 8 insertions(+), 7 deletions(-) > > > > > > > > > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.= h > > > > > > index 35f039ecb272..340e7ffa445e 100644 > > > > > > --- a/include/linux/refcount.h > > > > > > +++ b/include/linux/refcount.h > > > > > > @@ -69,9 +69,10 @@ > > > > > > * its the lock acquire, for RCU/lockless data structures its = the dependent > > > > > > * load. > > > > > > * > > > > > > - * Do note that inc_not_zero() provides a control dependency w= hich will order > > > > > > - * future stores against the inc, this ensures we'll never mod= ify the object > > > > > > - * if we did not in fact acquire a reference. > > > > > > + * Do note that inc_not_zero() does provide acquire order, whi= ch will order > > > > > > + * future load and stores against the inc, this ensures all su= bsequent accesses > > > > > > + * are from this object and not anything previously occupying = this memory -- > > > > > > + * consider SLAB_TYPESAFE_BY_RCU. > > > > > > * > > > > > > * The decrements will provide release order, such that all th= e prior loads and > > > > > > * stores will be issued before, it also provides a control de= pendency, which > > > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcoun= t_t *r, int *oldp) > > > > > > do { > > > > > > if (!old) > > > > > > break; > > > > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + = i)); > > > > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + = i)); > > > > > > > > > > Hmm, do the later memory accesses need to be ordered against the = store > > > > > part of the increment or just the read? If it's the former, then = I don't > > > > > think that _acquire is sufficient -- accesses can still get in-be= tween > > > > > the read and write parts of the RmW. > > > > > > > > I dug some more into this at the end of last week. For the > > > > SLAB_TYPESAFE_BY_RCU where we're racing inc_not_zero() with > > > > dec_and_test(), then I think using _acquire() above is correct as t= he > > > > later references can only move up into the critical section in the = case > > > > that we successfully obtained a reference. > > > > > > > > However, if we're going to make the barriers implicit in the refcou= nt > > > > operations here then I think we also need to do something on the pr= oducer > > > > side for when the object is re-initialised after being destroyed an= d > > > > allocated again. I think that would necessitate release ordering fo= r > > > > refcount_set() so that whatever allows the consumer to validate the > > > > object (e.g. sequence number) is published *before* the refcount. > > > > > > Thanks Will! > > > I would like to expand on your answer to provide an example of the > > > race that would happen without release ordering in the producer. To > > > save reader's time I provide a simplified flow and reasoning first. > > > More detailed code of what I'm considering a typical > > > SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my > > > reply (Addendum). > > > Simplified flow looks like this: > > > > > > consumer: > > > obj =3D lookup(collection, key); > > > if (!refcount_inc_not_zero(&obj->ref)) > > > return; > > > smp_rmb(); /* Peter's new acquire fence */ > > > if (READ_ONCE(obj->key) !=3D key) { > > > put_ref(obj); > > > return; > > > } > > > use(obj->value); > > > > > > producer: > > > old_key =3D obj->key; > > > remove(collection, old_key); > > > if (!refcount_dec_and_test(&obj->ref)) > > > return; > > > obj->key =3D KEY_INVALID; > > > free(objj); > > > ... > > > obj =3D malloc(); /* obj is reused */ > > > obj->key =3D new_key; > > > obj->value =3D new_value; > > > smp_wmb(); /* Will's proposed release fence */ > > > refcount_set(obj->ref, 1); > > > insert(collection, key, obj); > > > > > > Let's consider a case when new_key =3D=3D old_key. Will call both of = them > > > "key". Without WIll's proposed fence the following reordering is > > > possible: > > > > > > consumer: > > > obj =3D lookup(collection, key); > > > > > > producer: > > > key =3D obj->key > > > remove(collection, key); > > > if (!refcount_dec_and_test(&obj->ref)) > > > return; > > > obj->key =3D KEY_INVALID; > > > free(objj); > > > obj =3D malloc(); /* obj is reused */ > > > refcount_set(obj->ref, 1); > > > obj->key =3D key; /* same key */ > > > > > > if (!refcount_inc_not_zero(&obj->ref)) > > > return; > > > smp_rmb(); > > > if (READ_ONCE(obj->key) !=3D key) { > > > put_ref(obj); > > > return; > > > } > > > use(obj->value); > > > > > > obj->value =3D new_value; /* reordered store */ > > > add(collection, key, obj); > > > > > > So, the consumer finds the old object, successfully takes a refcount > > > and validates the key. It succeeds because the object is allocated an= d > > > has the same key, which is fine. However it proceeds to use stale > > > obj->value. Will's proposed release ordering would prevent that. > > > > > > The example in https://elixir.bootlin.com/linux/v6.12.6/source/includ= e/linux/slab.h#L102 > > > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU. > > > I think it would be better to introduce two new functions: > > > refcount_add_not_zero_acquire() and refcount_set_release(), clearly > > > document that they should be used when a freed object can be recycled > > > and reused, like in SLAB_TYPESAFE_BY_RCU case. refcount_set_release() > > > should also clarify that once it's called, the object can be accessed > > > by consumers even if it was not added yet into the collection used fo= r > > > object lookup (like in the example above). SLAB_TYPESAFE_BY_RCU > > > comment at https://elixir.bootlin.com/linux/v6.12.6/source/include/li= nux/slab.h#L102 > > > then can explicitly use these new functions in the example code, > > > further clarifying their purpose and proper use. > > > WDYT? > > > > Hi Peter, > > Should I take a stab at preparing a patch to add the two new > > refcounting functions suggested above with updates to the > > documentation and comments? > > If you disagree with that or need more time to decide then I'll wait. > > Please let me know. > > Not sure if "--in-reply-to" worked but I just posted a patch adding > new refcounting APIs for SLAB_TYPESAFE_BY_RCU here: > https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/ > Since Peter seems to be busy I discussed these ordering requirements > for SLAB_TYPESAFE_BY_RCU with Paul McKenney and he was leaning towards > having separate functions with the additional fences for this case. > That's what I provided in my patch. > Another possible option is to add acquire ordering in the > __refcount_add_not_zero() as Peter suggested and add > refcount_set_release() function. > Thanks, > Suren. > > > > Thanks, > > Suren. > > > > > > > > > > ADDENDUM. > > > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RC= U: > > > > > > struct object { > > > refcount_t ref; > > > u64 key; > > > u64 value; > > > }; > > > > > > void init(struct object *obj, u64 key, u64 value) > > > { > > > obj->key =3D key; > > > obj->value =3D value; > > > smp_wmb(); /* Will's proposed release fence */ > > > refcount_set(obj->ref, 1); > > > } > > > > > > bool get_ref(struct object *obj, u64 key) > > > { > > > if (!refcount_inc_not_zero(&obj->ref)) > > > return false; > > > smp_rmb(); /* Peter's new acquire fence */ > > > if (READ_ONCE(obj->key) !=3D key) { > > > put_ref(obj); > > > return false; > > > } > > > return true; > > > } > > > > > > void put_ref(struct object *obj) > > > { > > > if (!refcount_dec_and_test(&obj->ref)) > > > return; > > > obj->key =3D KEY_INVALID; > > > free(obj); > > > } > > > > > > consumer: > > > obj =3D lookup(collection, key); > > > if (!get_ref(obj, key) > > > return; > > > use(obj->value); > > > > > > producer: > > > remove(collection, old_obj->key); > > > put_ref(old_obj); > > > new_obj =3D malloc(); > > > init(new_obj, new_key, new_value); > > > insert(collection, new_key, new_obj); > > > > > > With SLAB_TYPESAFE_BY_RCU old_obj in the producer can be reused and b= e > > > equal to new_obj. > > > > > > > > > > > > > > Will