Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
To: Jann Horn <jannh@google.com>, Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	linux-mm@kvack.org, lkmm@vger.kernel.org,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	Waiman Long <longman@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	maged.michael@gmail.com,
	Neeraj Upadhyay <neeraj.upadhyay@amd.com>,
	lkmm@lists.linux.dev
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers
Date: Thu, 19 Sep 2024 22:30:34 +0200	[thread overview]
Message-ID: <abf4a899-e53e-41ac-91d6-1865ffeff5c6@huaweicloud.com> (raw)
In-Reply-To: <CAG48ez0VN8oZcqhdzkWQgNv6bwUN=MUu5EacLg5iPvMQL+R-Qg@mail.gmail.com>

Am 9/19/2024 um 2:12 AM schrieb Jann Horn:
> On Tue, Sep 17, 2024 at 4:33 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>> Hazard pointers [1] provide a way to dynamically distribute refcounting
>> and can be used to improve the scalability of refcounting without
>> significant space cost.
> 
>> +static inline void *__hazptr_tryprotect(hazptr_t *hzp,
>> +                                       void *const *p,
>> +                                       unsigned long head_offset)
>> +{
>> +       void *ptr;
>> +       struct callback_head *head;
>> +
>> +       ptr = READ_ONCE(*p);
>> +
>> +       if (ptr == NULL)
>> +               return NULL;
>> +
>> +       head = (struct callback_head *)(ptr + head_offset);
>> +
>> +       WRITE_ONCE(*hzp, head);
>> +       smp_mb();
>> +
>> +       ptr = READ_ONCE(*p); // read again
>> +
>> +       if (ptr + head_offset != head) { // pointer changed
>> +               WRITE_ONCE(*hzp, NULL);  // reset hazard pointer
>> +               return NULL;
>> +       } else
>> +               return ptr;
>> +}
> 
> I got nerdsniped by the Plumbers talk. So, about that smp_mb()...
> 
> I think you should be able to avoid the smp_mb() using relaxed atomics
> (on architectures that have those), at the cost of something like a
> cmpxchg-acquire sandwiched between a load-acquire and a relaxed load?
> I'm not sure how their cost compares to an smp_mb() though.

We have done a similar scheme before, and on some architectures (not 
x86) the RMW is slightly cheaper than the mb.

Your reasoning is a bit simplified because it seems to assume a stronger 
concept of ordering than LKMM has, but even with LKMM's ordering your 
code seems fine.

I feel it can even be simplified a little, the hazard bit does not seem 
necessary.

Assuming atomic operations for everything racy, relaxed unless stated 
otherwise:

(R)eader:

    old = read p // I don't think this needs acq, because of address 
dependencies (*)
    haz ||=_acq old
    if (read p != old) retry;
    *old

(W)riter:

    p =_??? ... // assuming we don't set it to null this needs a rel
    --- mb ---
    haz ||= 0
    while (read_acq haz == old) retry;
    delete old

In order to get a use-after-free, both of the  R:read p  need to read 
before  W:p = ... , so because of the W:mb, they execute before  W:haz||=0 .

Also, for the use-after-free,  W:read_acq haz  needs to read before (the 
write part of)  R:haz||=_acq old .

Then the  W:haz ||= 0  also needs to read before (the write part of) 
R:haz||=_acq old  because of coherence on the same location.

Since both of them are atomic RMW, the  W:haz||= 0  also needs to write 
before (the write part of)  R:haz||=_acq old , and in the absence of 
non-RMW writes (so assuming no thread will just store to the hazard 
pointer), this implies that the latter reads-from an rmw-sequence-later 
store than  W:haz||=0 , which therefore executes before  R:haz||=_acq old .

But because of the acquire barrier,  R:haz||=_acq old  executes before 
the second  R:read p .

This gives a cycle
    (2nd)R:read p  ->xb  W:haz||=0  ->xb  R:haz||=_acq  ->xb  (2nd)R:read p

and therefore is forbidden.

Note that in this argument, the two  R:read p  are not necessarily 
reading from the same store. Because of ABA, it can happen that they 
read from distinct stores and see the same value.

It does require clearing hazard pointers with an RMW like atomic_and(0) 
(**). The combination of the two RMW (for setting & clearing the hazard 
pointer) might in total be slower again than an mb.

(I took the liberty to add an acquire barrier in the writer's while loop 
for the case where we read from the (not shown) release store of the 
reader that clears the hazard pointer. It's arguable whether that acq is 
needed since one could argue that a delete is a kind of store, in which 
case control dependency would handle it.)

have fun,
   jonas

(*
   you talk about sandwiching, and the data dependency does guarantee 
some weaker form of sandwiching than the acq, but I don't think 
sandwiching is required at all. If you happened to be able to set the 
hazard pointer before reading the pointer, that would just extend the 
protected area, wouldn't it?
)

(**
I think if you clear the pointer with a store, the hazard bit does not 
help. You could be overwriting the hazard bit, and the RMWs that set the 
hazard bit might never propagate to your CPU.

Also in your code you probably meant to clear the whole hazard pointer 
in the retry code of the reader, not just the hazard bit.)

next prev parent reply	other threads:[~2024-09-19 20:31 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-17 14:33 [RFC PATCH 0/4] Add hazard pointers to kernel Boqun Feng
2024-09-17 14:33 ` [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers Boqun Feng
2024-09-18  8:27   ` Mathieu Desnoyers
2024-09-18 15:17   ` Alan Huang
2024-09-19  6:56     ` Boqun Feng
2024-09-19 18:07       ` Jonas Oberhauser
2024-09-19  0:12   ` Jann Horn
2024-09-19 20:30     ` Jonas Oberhauser [this message]
2024-09-20  7:43       ` Jonas Oberhauser
2024-09-19  6:39   ` Lai Jiangshan
2024-09-19  7:10     ` Boqun Feng
2024-09-19 12:33       ` Alan Huang
2024-09-19 13:57       ` Alan Huang
2024-09-19 18:58         ` Boqun Feng
2024-09-19 19:53           ` Alan Huang
2024-09-19 16:10       ` Alan Huang
2024-09-19 14:00   ` Jonas Oberhauser
2024-09-20  7:41   ` Jonas Oberhauser
2024-09-25 10:02     ` Boqun Feng
2024-09-25 10:11       ` Jonas Oberhauser
2024-09-25 10:45         ` Boqun Feng
2024-09-25 11:59           ` Mathieu Desnoyers
2024-09-25 12:16             ` Boqun Feng
2024-09-25 12:47               ` Mathieu Desnoyers
2024-09-25 13:10                 ` Mathieu Desnoyers
2024-09-25 13:20                   ` Mathieu Desnoyers
2024-09-26  6:16                     ` Mathieu Desnoyers
2024-09-26 15:53                       ` Jonas Oberhauser
2024-09-26 16:12                         ` Linus Torvalds
2024-09-26 16:40                           ` Jonas Oberhauser
2024-09-26 16:54                             ` Linus Torvalds
2024-09-27  0:01                               ` Boqun Feng
2024-09-27  1:30                                 ` Mathieu Desnoyers
2024-09-27  1:37                                   ` Boqun Feng
2024-09-27  4:28                                     ` Boqun Feng
2024-09-27 10:59                                       ` Mathieu Desnoyers
2024-09-27 14:43                                         ` Mathieu Desnoyers
2024-09-27 15:22                                           ` Mathieu Desnoyers
2024-09-27 16:06                                       ` Alan Huang
2024-09-27 16:44                                     ` Linus Torvalds
2024-09-27 17:15                                       ` Mathieu Desnoyers
2024-09-27 17:23                                         ` Linus Torvalds
2024-09-27 17:51                                           ` Mathieu Desnoyers
2024-09-27 18:13                                             ` Linus Torvalds
2024-09-27 19:12                                               ` Jonas Oberhauser
2024-09-27 19:28                                                 ` Linus Torvalds
2024-09-27 20:24                                                   ` Linus Torvalds
2024-09-27 20:02                                               ` Mathieu Desnoyers
2024-09-27  1:20                           ` Mathieu Desnoyers
2024-09-27  4:38                             ` Boqun Feng
2024-09-27 19:23                               ` Jonas Oberhauser
2024-09-27 20:10                                 ` Mathieu Desnoyers
2024-09-27 22:18                                   ` Jonas Oberhauser
2024-09-28 22:10                                     ` Alan Huang
2024-09-28 23:12                                       ` Alan Huang
2024-09-25 12:19             ` Jonas Oberhauser
2024-09-17 14:34 ` [RFC PATCH 2/4] refscale: Add benchmarks for hazptr Boqun Feng
2024-09-17 14:34 ` [RFC PATCH 3/4] refscale: Add benchmarks for percpu_ref Boqun Feng
2024-09-17 14:34 ` [RFC PATCH 4/4] WIP: hazptr: Add hazptr test sample Boqun Feng
2024-09-18  7:18 ` [RFC PATCH 0/4] Add hazard pointers to kernel Linus Torvalds
2024-09-18 22:44   ` Neeraj Upadhyay
2024-09-19  6:46     ` Linus Torvalds
2024-09-20  5:00       ` Neeraj Upadhyay
2024-09-19 14:30     ` Mateusz Guzik
2024-09-19 14:14   ` Christoph Hellwig
2024-09-19 14:21     ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abf4a899-e53e-41ac-91d6-1865ffeff5c6@huaweicloud.com \
    --to=jonas.oberhauser@huaweicloud.com \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jannh@google.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkmm@lists.linux.dev \
    --cc=lkmm@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=maged.michael@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=neeraj.upadhyay@amd.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qiang.zhang1211@gmail.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox