From: Suren Baghdasaryan <surenb@google.com>
Date: Mon, 27 Jan 2025 11:21:06 -0800
Subject: Re: [PATCH] refcount: Strengthen inc_not_zero()
To: Will Deacon
Cc: Peter Zijlstra, boqun.feng@gmail.com, mark.rutland@arm.com,
	Mateusz Guzik, akpm@linux-foundation.org, willy@infradead.org,
	liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
	david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz,
	hannes@cmpxchg.org, oliver.sang@intel.com, mgorman@techsingularity.net,
	david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net,
	paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com,
	hdanton@sina.com, hughd@google.com, lokeshgidra@google.com,
	minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
	souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com,
	richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com
In-Reply-To: <20250127140915.GA25672@willie-the-truck>
References: <20250111042604.3230628-1-surenb@google.com>
	<20250111042604.3230628-12-surenb@google.com>
	<20250115104841.GX5388@noisy.programming.kicks-ass.net>
	<20250115111334.GE8385@noisy.programming.kicks-ass.net>
	<20250115160011.GG8385@noisy.programming.kicks-ass.net>
	<20250117154135.GA17569@willie-the-truck>
	<20250127140915.GA25672@willie-the-truck>
On Mon, Jan 27, 2025 at 6:09 AM Will Deacon wrote:
>
> On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > >
> > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > conditions after the refcount succeeds.
> > > >
> > > > And this is probably fine, but let me ponder this all a little more.
> > >
> > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > fix this, these things are hard enough as they are.
> > >
> > > Will, others, wdyt?
> >
> > We should also update the Documentation (atomic_t.txt and
> > refcount-vs-atomic.rst) if we strengthen this.
> >
> > > ---
> > > Subject: refcount: Strengthen inc_not_zero()
> > >
> > > For speculative lookups where a successful inc_not_zero() pins the
> > > object, but where we still need to double check if the object acquired
> > > is indeed the one we set out to aquire, needs this validation to happen
> > > *after* the increment.
> > >
> > > Notably SLAB_TYPESAFE_BY_RCU is one such an example.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel)
> > > ---
> > >  include/linux/refcount.h | 15 ++++++++-------
> > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > index 35f039ecb272..340e7ffa445e 100644
> > > --- a/include/linux/refcount.h
> > > +++ b/include/linux/refcount.h
> > > @@ -69,9 +69,10 @@
> > >   * its the lock acquire, for RCU/lockless data structures its the dependent
> > >   * load.
> > >   *
> > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > - * future stores against the inc, this ensures we'll never modify the object
> > > - * if we did not in fact acquire a reference.
> > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > + * future load and stores against the inc, this ensures all subsequent accesses
> > > + * are from this object and not anything previously occupying this memory --
> > > + * consider SLAB_TYPESAFE_BY_RCU.
> > >   *
> > >   * The decrements will provide release order, such that all the prior loads and
> > >   * stores will be issued before, it also provides a control dependency, which
> > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > >   do {
> > >           if (!old)
> > >                   break;
> > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> >
> > Hmm, do the later memory accesses need to be ordered against the store
> > part of the increment or just the read? If it's the former, then I don't
> > think that _acquire is sufficient -- accesses can still get in-between
> > the read and write parts of the RmW.
>
> I dug some more into this at the end of last week. For the
> SLAB_TYPESAFE_BY_RCU case where we're racing inc_not_zero() with
> dec_and_test(), I think using _acquire() above is correct, as the
> later references can only move up into the critical section in the case
> that we successfully obtained a reference.
>
> However, if we're going to make the barriers implicit in the refcount
> operations here then I think we also need to do something on the producer
> side for when the object is re-initialised after being destroyed and
> allocated again. I think that would necessitate release ordering for
> refcount_set() so that whatever allows the consumer to validate the
> object (e.g. sequence number) is published *before* the refcount.

Thanks Will! I would like to expand on your answer and provide an example
of the race that would happen without release ordering in the producer.
To save the reader's time, I provide a simplified flow and the reasoning
first. More detailed code for what I consider a typical
SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my reply
(Addendum). The simplified flow looks like this:

consumer:
        obj = lookup(collection, key);
        if (!refcount_inc_not_zero(&obj->ref))
                return;
        smp_rmb(); /* Peter's new acquire fence */
        if (READ_ONCE(obj->key) != key) {
                put_ref(obj);
                return;
        }
        use(obj->value);

producer:
        old_key = obj->key;
        remove(collection, old_key);
        if (!refcount_dec_and_test(&obj->ref))
                return;
        obj->key = KEY_INVALID;
        free(obj);
        ...
        obj = malloc(); /* obj is reused */
        obj->key = new_key;
        obj->value = new_value;
        smp_wmb(); /* Will's proposed release fence */
        refcount_set(&obj->ref, 1);
        insert(collection, new_key, obj);

Let's consider a case when new_key == old_key. We'll call both of them "key".
Without Will's proposed fence, the following reordering is possible:

consumer:
        obj = lookup(collection, key);

producer:
        key = obj->key;
        remove(collection, key);
        if (!refcount_dec_and_test(&obj->ref))
                return;
        obj->key = KEY_INVALID;
        free(obj);
        obj = malloc(); /* obj is reused */
        refcount_set(&obj->ref, 1);
        obj->key = key; /* same key */

consumer:
        if (!refcount_inc_not_zero(&obj->ref))
                return;
        smp_rmb();
        if (READ_ONCE(obj->key) != key) {
                put_ref(obj);
                return;
        }
        use(obj->value);

producer:
        obj->value = new_value; /* reordered store */
        insert(collection, key, obj);

So, the consumer finds the old object, successfully takes a refcount and
validates the key. The validation succeeds because the object has been
reallocated with the same key, which is fine. However, the consumer then
proceeds to use a stale obj->value. Will's proposed release ordering
would prevent that.

The example in
https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
omits all these ordering issues for SLAB_TYPESAFE_BY_RCU. I think it
would be better to introduce two new functions,
refcount_add_not_zero_acquire() and refcount_set_release(), and clearly
document that they should be used when a freed object can be recycled and
reused, as in the SLAB_TYPESAFE_BY_RCU case. The documentation for
refcount_set_release() should also clarify that once it is called, the
object can be accessed by consumers even if it has not yet been added to
the collection used for object lookup (as in the example above). The
SLAB_TYPESAFE_BY_RCU comment at
https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
can then use these new functions explicitly in its example code, further
clarifying their purpose and proper use. WDYT?

ADDENDUM. Detailed code for typical use of refcounting with
SLAB_TYPESAFE_BY_RCU:

struct object {
        refcount_t ref;
        u64 key;
        u64 value;
};

void init(struct object *obj, u64 key, u64 value)
{
        obj->key = key;
        obj->value = value;
        smp_wmb(); /* Will's proposed release fence */
        refcount_set(&obj->ref, 1);
}

bool get_ref(struct object *obj, u64 key)
{
        if (!refcount_inc_not_zero(&obj->ref))
                return false;
        smp_rmb(); /* Peter's new acquire fence */
        if (READ_ONCE(obj->key) != key) {
                put_ref(obj);
                return false;
        }
        return true;
}

void put_ref(struct object *obj)
{
        if (!refcount_dec_and_test(&obj->ref))
                return;
        obj->key = KEY_INVALID;
        free(obj);
}

consumer:
        obj = lookup(collection, key);
        if (!get_ref(obj, key))
                return;
        use(obj->value);

producer:
        remove(collection, old_obj->key);
        put_ref(old_obj);
        new_obj = malloc();
        init(new_obj, new_key, new_value);
        insert(collection, new_key, new_obj);

With SLAB_TYPESAFE_BY_RCU, old_obj in the producer can be reused and end
up equal to new_obj.

>
> Will
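
P.S. For illustration, here is a rough sketch of what the two helpers
proposed above might look like as additions to include/linux/refcount.h.
Treat it only as a sketch under my assumptions: the names follow my
proposal, refcount_add_not_zero_acquire() mirrors the existing
__refcount_add_not_zero() but with the acquire ordering from Peter's
patch, refcount_set_release() simply wraps atomic_set_release(), and the
inc_not_zero()-style wrapper is added just for convenience in the
examples. None of this exists in mainline yet.

/* Hypothetical sketch, not mainline code. */
static inline __must_check bool
refcount_add_not_zero_acquire(int i, refcount_t *r)
{
        int old = refcount_read(r);

        do {
                if (!old)
                        return false;
        } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));

        if (unlikely(old < 0 || old + i < 0))
                refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);

        return true;
}

static inline __must_check bool refcount_inc_not_zero_acquire(refcount_t *r)
{
        /* On success, later loads/stores are ordered after the increment. */
        return refcount_add_not_zero_acquire(1, r);
}

static inline void refcount_set_release(refcount_t *r, int n)
{
        /* Publish the object's initialized fields before the refcount. */
        atomic_set_release(&r->refs, n);
}

With something along these lines, init() in the addendum could drop the
smp_wmb() and call refcount_set_release(&obj->ref, 1), and get_ref()
could drop the smp_rmb() and call refcount_inc_not_zero_acquire(&obj->ref)
before validating obj->key.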