From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
To: Jann Horn <jannh@google.com>, Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
linux-mm@kvack.org, lkmm@vger.kernel.org,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Mark Rutland <mark.rutland@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Kent Overstreet <kent.overstreet@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
maged.michael@gmail.com,
Neeraj Upadhyay <neeraj.upadhyay@amd.com>,
lkmm@lists.linux.dev
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers
Date: Fri, 20 Sep 2024 09:43:52 +0200
Message-ID: <16c2aba7-212d-4612-8cea-50c64626d8f3@huaweicloud.com>
In-Reply-To: <abf4a899-e53e-41ac-91d6-1865ffeff5c6@huaweicloud.com>

On 9/19/2024 at 10:30 PM, Jonas Oberhauser wrote:
>
> On 9/19/2024 at 2:12 AM, Jann Horn wrote:
>> On Tue, Sep 17, 2024 at 4:33 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>>> Hazard pointers [1] provide a way to dynamically distribute refcounting
>>> and can be used to improve the scalability of refcounting without
>>> significant space cost.
>>
>>> +static inline void *__hazptr_tryprotect(hazptr_t *hzp,
>>> + void *const *p,
>>> + unsigned long head_offset)
>>> +{
>>> + void *ptr;
>>> + struct callback_head *head;
>>> +
>>> + ptr = READ_ONCE(*p);
>>> +
>>> + if (ptr == NULL)
>>> + return NULL;
>>> +
>>> + head = (struct callback_head *)(ptr + head_offset);
>>> +
>>> + WRITE_ONCE(*hzp, head);
>>> + smp_mb();
>>> +
>>> + ptr = READ_ONCE(*p); // read again
>>> +
>>> + if (ptr + head_offset != head) { // pointer changed
>>> + WRITE_ONCE(*hzp, NULL); // reset hazard pointer
>>> + return NULL;
>>> + } else
>>> + return ptr;
>>> +}
>>
>> I got nerdsniped by the Plumbers talk. So, about that smp_mb()...
>>
>> I think you should be able to avoid the smp_mb() using relaxed atomics
>> (on architectures that have those), at the cost of something like a
>> cmpxchg-acquire sandwiched between a load-acquire and a relaxed load?
>> I'm not sure how their cost compares to an smp_mb() though.
>
> We have done a similar scheme before, and on some architectures (not
> x86) the RMW is slightly cheaper than the mb.
>
> Your reasoning is a bit simplified because it seems to assume a stronger
> concept of ordering than LKMM has, but even with LKMM's ordering your
> code seems fine.
>
> I feel it can even be simplified a little, the hazard bit does not seem
> necessary.
>
> Assuming atomic operations for everything racy, relaxed unless stated
> otherwise:
>
> (R)eader:
>
> old = read p // I don't think this needs acq, because of address
> dependencies (*)
> haz ||=_acq old
> if (read p != old) retry;

I realized before going to bed that I copied a subtle mistake here from
Jann's code: we need an address dependency from this read (the second
read of p), or the scheme is not ABA safe.

This can be done with the small detour that Boqun used:

    head = read p  // I don't think this needs acq, because of address
                   // dependencies (*)
    haz ||=_acq head
    old = read p   // again
    if (head != old) retry;
    barrier();     // ensure the compiler does not use its knowledge that
                   // head == old to do *head instead!
    *old           // definitely use the pointer from the second read, so
                   // hardware can't speculatively dereference it before
                   // the second read!

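For concreteness, here is a minimal user-space sketch of the reader side
above, with C11 atomics as stand-ins for the kernel primitives. Several
things in it are assumptions and not taken from the patch: the names
hazptr_protect_sketch, struct obj and compiler_barrier are made up,
"haz ||=_acq head" is modeled as a cmpxchg-acquire from NULL, the reset
of haz before a retry is my addition, and atomic_signal_fence() stands in
for barrier(). Note also that C11 does not promise address-dependency
ordering for relaxed loads the way LKMM does, so this only illustrates
the control flow and is not an argument about the ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protected object type, for illustration only. */
struct obj { int value; };

/* Assumed stand-in for the kernel's barrier(): a compiler-only fence. */
#define compiler_barrier() atomic_signal_fence(memory_order_seq_cst)

/*
 * Protect the object currently published in *p.
 * haz is this reader's private hazard slot and must be NULL on entry.
 * Returns the protected pointer, or NULL if *p was NULL.
 */
static struct obj *hazptr_protect_sketch(struct obj *_Atomic *p,
                                         struct obj *_Atomic *haz)
{
    for (;;) {
        /* head = read p (relaxed) */
        struct obj *head = atomic_load_explicit(p, memory_order_relaxed);
        if (head == NULL)
            return NULL;

        /* haz ||=_acq head, modeled (assumption) as cmpxchg-acquire */
        struct obj *expected = NULL;
        bool ok = atomic_compare_exchange_strong_explicit(
            haz, &expected, head,
            memory_order_acquire, memory_order_relaxed);
        assert(ok); /* the slot is owned by this reader and was NULL */
        (void)ok;

        /* old = read p, again */
        struct obj *old = atomic_load_explicit(p, memory_order_relaxed);
        if (head != old) {
            /* pointer changed: clear the slot (assumption) and retry */
            atomic_store_explicit(haz, NULL, memory_order_release);
            continue;
        }

        /* keep the compiler from substituting head for old */
        compiler_barrier();

        /* hand back the second-read pointer; callers dereference this */
        return old;
    }
}
```

A caller would dereference only the returned pointer, and store NULL to
its haz slot once it is done with the object.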
Have a good time,
jonas