From: Suren Baghdasaryan <surenb@google.com>
To: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
boqun.feng@gmail.com, mark.rutland@arm.com,
Mateusz Guzik <mjguzik@gmail.com>,
akpm@linux-foundation.org, willy@infradead.org,
liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz,
hannes@cmpxchg.org, oliver.sang@intel.com,
mgorman@techsingularity.net, david@redhat.com,
peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net,
paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com,
hdanton@sina.com, hughd@google.com, lokeshgidra@google.com,
minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
souravpanda@google.com, pasha.tatashin@soleen.com,
klarasmodin@gmail.com, richard.weiyang@gmail.com,
corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH] refcount: Strengthen inc_not_zero()
Date: Wed, 5 Feb 2025 19:03:38 -0800
Message-ID: <CAJuCfpGeHaQu_F6mJo70S+cxWD_PvhDCvfBsD-UXWqwMuOTRYA@mail.gmail.com>
In-Reply-To: <CAJuCfpHJedxhZuuoKkKaQXsT-3tPSkhUwHadJRGEdTxSkzRc_w@mail.gmail.com>
On Tue, Jan 28, 2025 at 3:51 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Mon, Jan 27, 2025 at 11:21 AM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Mon, Jan 27, 2025 at 6:09 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Fri, Jan 17, 2025 at 03:41:36PM +0000, Will Deacon wrote:
> > > > On Wed, Jan 15, 2025 at 05:00:11PM +0100, Peter Zijlstra wrote:
> > > > > On Wed, Jan 15, 2025 at 12:13:34PM +0100, Peter Zijlstra wrote:
> > > > >
> > > > > > Notably, it means refcount_t is entirely unsuitable for anything
> > > > > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation
> > > > > > conditions after the refcount succeeds.
> > > > > >
> > > > > > And this is probably fine, but let me ponder this all a little more.
> > > > >
> > > > > Even though SLAB_TYPESAFE_BY_RCU is relatively rare, I think we'd better
> > > > > fix this; these things are hard enough as they are.
> > > > >
> > > > > Will, others, wdyt?
> > > >
> > > > We should also update the Documentation (atomic_t.txt and
> > > > refcount-vs-atomic.rst) if we strengthen this.
> > > >
> > > > > ---
> > > > > Subject: refcount: Strengthen inc_not_zero()
> > > > >
> > > > > For speculative lookups where a successful inc_not_zero() pins the
> > > > > object, but where we still need to double check that the object acquired
> > > > > is indeed the one we set out to acquire, this validation needs to happen
> > > > > *after* the increment.
> > > > >
> > > > > Notably, SLAB_TYPESAFE_BY_RCU is one such example.
> > > > >
> > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > ---
> > > > > include/linux/refcount.h | 15 ++++++++-------
> > > > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> > > > > index 35f039ecb272..340e7ffa445e 100644
> > > > > --- a/include/linux/refcount.h
> > > > > +++ b/include/linux/refcount.h
> > > > > @@ -69,9 +69,10 @@
> > > > > * its the lock acquire, for RCU/lockless data structures its the dependent
> > > > > * load.
> > > > > *
> > > > > - * Do note that inc_not_zero() provides a control dependency which will order
> > > > > - * future stores against the inc, this ensures we'll never modify the object
> > > > > - * if we did not in fact acquire a reference.
> > > > > + * Do note that inc_not_zero() does provide acquire order, which will order
> > > > > + * future loads and stores against the inc; this ensures all subsequent accesses
> > > > > + * are from this object and not anything previously occupying this memory --
> > > > > + * consider SLAB_TYPESAFE_BY_RCU.
> > > > > *
> > > > > * The decrements will provide release order, such that all the prior loads and
> > > > > * stores will be issued before, it also provides a control dependency, which
> > > > > @@ -144,7 +145,7 @@ bool __refcount_add_not_zero(int i, refcount_t *r, int *oldp)
> > > > > do {
> > > > > if (!old)
> > > > > break;
> > > > > - } while (!atomic_try_cmpxchg_relaxed(&r->refs, &old, old + i));
> > > > > + } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
> > > >
> > > > Hmm, do the later memory accesses need to be ordered against the store
> > > > part of the increment or just the read? If it's the former, then I don't
> > > > think that _acquire is sufficient -- accesses can still get in-between
> > > > the read and write parts of the RmW.
> > >
> > > I dug some more into this at the end of last week. For the
> > > SLAB_TYPESAFE_BY_RCU case, where we're racing inc_not_zero() with
> > > dec_and_test(), I think using _acquire() above is correct, as the
> > > later references can only move up into the critical section in the case
> > > that we successfully obtained a reference.
> > >
> > > However, if we're going to make the barriers implicit in the refcount
> > > operations here then I think we also need to do something on the producer
> > > side for when the object is re-initialised after being destroyed and
> > > allocated again. I think that would necessitate release ordering for
> > > refcount_set() so that whatever allows the consumer to validate the
> > > object (e.g. sequence number) is published *before* the refcount.
> >
> > Thanks, Will!
> > I would like to expand on your answer with an example of the race
> > that would happen without release ordering in the producer. To save
> > the reader's time I give the simplified flow and reasoning first;
> > more detailed code for what I consider a typical
> > SLAB_TYPESAFE_BY_RCU refcounting example is added at the end of my
> > reply (see Addendum).
> > The simplified flow looks like this:
> >
> > consumer:
> >         obj = lookup(collection, key);
> >         if (!refcount_inc_not_zero(&obj->ref))
> >                 return;
> >         smp_rmb(); /* Peter's new acquire fence */
> >         if (READ_ONCE(obj->key) != key) {
> >                 put_ref(obj);
> >                 return;
> >         }
> >         use(obj->value);
> >
> > producer:
> >         old_key = obj->key;
> >         remove(collection, old_key);
> >         if (!refcount_dec_and_test(&obj->ref))
> >                 return;
> >         obj->key = KEY_INVALID;
> >         free(obj);
> >         ...
> >         obj = malloc(); /* obj is reused */
> >         obj->key = new_key;
> >         obj->value = new_value;
> >         smp_wmb(); /* Will's proposed release fence */
> >         refcount_set(&obj->ref, 1);
> >         insert(collection, new_key, obj);
> >
> > Let's consider the case when new_key == old_key; we'll call both of
> > them "key". Without Will's proposed fence the following reordering is
> > possible:
> >
> > consumer:
> >         obj = lookup(collection, key);
> >
> > producer:
> >         key = obj->key;
> >         remove(collection, key);
> >         if (!refcount_dec_and_test(&obj->ref))
> >                 return;
> >         obj->key = KEY_INVALID;
> >         free(obj);
> >         obj = malloc(); /* obj is reused */
> >         refcount_set(&obj->ref, 1);
> >         obj->key = key; /* same key */
> >
> > consumer:
> >         if (!refcount_inc_not_zero(&obj->ref))
> >                 return;
> >         smp_rmb();
> >         if (READ_ONCE(obj->key) != key) {
> >                 put_ref(obj);
> >                 return;
> >         }
> >         use(obj->value);
> >
> > producer:
> >         obj->value = new_value; /* reordered store */
> >         insert(collection, key, obj);
> >
> > So, the consumer finds the old object, successfully takes a refcount
> > and validates the key. The validation succeeds because the object is
> > allocated and has the same key, which is fine. However, the consumer
> > then proceeds to use a stale obj->value. Will's proposed release
> > ordering would prevent that.
> >
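> > To make the producer-side requirement concrete, the essence of the
> > race can be reduced to the following LKMM litmus test (an untested
> > sketch; ref stands for obj->ref and val for obj->value):
> >
> > C refcount-set-release
> >
> > {}
> >
> > P0(int *ref, int *val)
> > {
> >         int r0;
> >         int r1;
> >
> >         r0 = cmpxchg_acquire(ref, 1, 2); /* inc_not_zero() */
> >         r1 = READ_ONCE(*val);            /* use(obj->value) */
> > }
> >
> > P1(int *ref, int *val)
> > {
> >         WRITE_ONCE(*val, 1);       /* obj->value = new_value */
> >         smp_store_release(ref, 1); /* release-ordered refcount_set() */
> > }
> >
> > exists (0:r0=1 /\ 0:r1=0)
> >
> > With the release store, herd7 should report the exists clause as
> > forbidden; with a plain WRITE_ONCE(*ref, 1) in its place it becomes
> > reachable, which is exactly the stale-value outcome above.
> >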
> > The example in https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > omits all these ordering issues for SLAB_TYPESAFE_BY_RCU.
> > I think it would be better to introduce two new functions,
> > refcount_add_not_zero_acquire() and refcount_set_release(), and
> > clearly document that they should be used when a freed object can be
> > recycled and reused, as in the SLAB_TYPESAFE_BY_RCU case.
> > refcount_set_release() should also clarify that once it's called, the
> > object can be accessed by consumers even if it has not yet been added
> > to the collection used for object lookup (as in the example above).
> > The SLAB_TYPESAFE_BY_RCU comment at
> > https://elixir.bootlin.com/linux/v6.12.6/source/include/linux/slab.h#L102
> > can then use these new functions explicitly in its example code,
> > further clarifying their purpose and proper use. A sketch of the
> > example code rewritten with these helpers follows the Addendum below.
> > WDYT?
>
> Hi Peter,
> Should I take a stab at preparing a patch to add the two new
> refcounting functions suggested above with updates to the
> documentation and comments?
> If you disagree with that or need more time to decide then I'll wait.
> Please let me know.
Not sure if "--in-reply-to" worked, but I just posted a patch adding
new refcounting APIs for SLAB_TYPESAFE_BY_RCU here:
https://lore.kernel.org/all/20250206025201.979573-1-surenb@google.com/
Since Peter seems to be busy, I discussed these ordering requirements
for SLAB_TYPESAFE_BY_RCU with Paul McKenney, and he was leaning towards
having separate functions with the additional fences for this case.
That's what I provided in my patch.
Another possible option is to add acquire ordering in
__refcount_add_not_zero(), as Peter suggested, and to add a
refcount_set_release() function.
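For reference, here is roughly the shape I have in mind for the two
helpers (a sketch only -- the names and details follow my suggestion
above, not necessarily a final API):

#include <linux/atomic.h>
#include <linux/refcount.h>

/* Publish all prior initializing stores before the refcount itself. */
static inline void refcount_set_release(refcount_t *r, int n)
{
        atomic_set_release(&r->refs, n);
}

/*
 * Like refcount_inc_not_zero(), but with acquire ordering on success,
 * so that subsequent validation loads (e.g. of obj->key) cannot be
 * satisfied by values from the object's previous life.
 */
static inline __must_check bool refcount_inc_not_zero_acquire(refcount_t *r)
{
        int old = refcount_read(r);

        do {
                if (!old)
                        break;
        } while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + 1));

        if (unlikely(old < 0 || old + 1 < 0))
                refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);

        return old;
}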
Thanks,
Suren.
> Thanks,
> Suren.
>
>
> >
> > ADDENDUM.
> > Detailed code for typical use of refcounting with SLAB_TYPESAFE_BY_RCU:
> >
> > struct object {
> >         refcount_t ref;
> >         u64 key;
> >         u64 value;
> > };
> >
> > void init(struct object *obj, u64 key, u64 value)
> > {
> >         obj->key = key;
> >         obj->value = value;
> >         smp_wmb(); /* Will's proposed release fence */
> >         refcount_set(&obj->ref, 1);
> > }
> >
> > bool get_ref(struct object *obj, u64 key)
> > {
> >         if (!refcount_inc_not_zero(&obj->ref))
> >                 return false;
> >         smp_rmb(); /* Peter's new acquire fence */
> >         if (READ_ONCE(obj->key) != key) {
> >                 put_ref(obj);
> >                 return false;
> >         }
> >         return true;
> > }
> >
> > void put_ref(struct object *obj)
> > {
> >         if (!refcount_dec_and_test(&obj->ref))
> >                 return;
> >         obj->key = KEY_INVALID;
> >         free(obj);
> > }
> >
> > consumer:
> >         obj = lookup(collection, key);
> >         if (!get_ref(obj, key))
> >                 return;
> >         use(obj->value);
> >
> > producer:
> >         remove(collection, old_obj->key);
> >         put_ref(old_obj);
> >         new_obj = malloc();
> >         init(new_obj, new_key, new_value);
> >         insert(collection, new_key, new_obj);
> >
> > With SLAB_TYPESAFE_BY_RCU, old_obj in the producer can be reused, so
> > new_obj may end up being equal to old_obj.
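> >
> > With the two functions proposed above, the explicit fences would fold
> > into the refcount operations themselves. A sketch of init() and
> > get_ref() rewritten that way (using my suggested names, with
> > refcount_inc_not_zero_acquire() as the inc_not_zero() flavor of the
> > proposed add_not_zero_acquire(); not a final API):
> >
> > void init(struct object *obj, u64 key, u64 value)
> > {
> >         obj->key = key;
> >         obj->value = value;
> >         /* replaces smp_wmb() + refcount_set() */
> >         refcount_set_release(&obj->ref, 1);
> > }
> >
> > bool get_ref(struct object *obj, u64 key)
> > {
> >         /* acquire variant replaces refcount_inc_not_zero() + smp_rmb() */
> >         if (!refcount_inc_not_zero_acquire(&obj->ref))
> >                 return false;
> >         if (READ_ONCE(obj->key) != key) {
> >                 put_ref(obj);
> >                 return false;
> >         }
> >         return true;
> > }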
> >
> >
> > >
> > > Will