Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jann Horn <jannh@google.com>
To: Boqun Feng <boqun.feng@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	linux-mm@kvack.org,  lkmm@vger.kernel.org,
	Frederic Weisbecker <frederic@kernel.org>,
	 Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	 Josh Triplett <josh@joshtriplett.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	 Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	 Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
	 Waiman Long <longman@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	 Thomas Gleixner <tglx@linutronix.de>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	maged.michael@gmail.com,
	 Neeraj Upadhyay <neeraj.upadhyay@amd.com>
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers
Date: Thu, 19 Sep 2024 02:12:48 +0200	[thread overview]
Message-ID: <CAG48ez0VN8oZcqhdzkWQgNv6bwUN=MUu5EacLg5iPvMQL+R-Qg@mail.gmail.com> (raw)
In-Reply-To: <20240917143402.930114-2-boqun.feng@gmail.com>

On Tue, Sep 17, 2024 at 4:33 PM Boqun Feng <boqun.feng@gmail.com> wrote:
> Hazard pointers [1] provide a way to dynamically distribute refcounting
> and can be used to improve the scalability of refcounting without
> significant space cost.

> +static inline void *__hazptr_tryprotect(hazptr_t *hzp,
> +                                       void *const *p,
> +                                       unsigned long head_offset)
> +{
> +       void *ptr;
> +       struct callback_head *head;
> +
> +       ptr = READ_ONCE(*p);
> +
> +       if (ptr == NULL)
> +               return NULL;
> +
> +       head = (struct callback_head *)(ptr + head_offset);
> +
> +       WRITE_ONCE(*hzp, head);
> +       smp_mb();
> +
> +       ptr = READ_ONCE(*p); // read again
> +
> +       if (ptr + head_offset != head) { // pointer changed
> +               WRITE_ONCE(*hzp, NULL);  // reset hazard pointer
> +               return NULL;
> +       } else
> +               return ptr;
> +}

I got nerdsniped by the Plumbers talk. So, about that smp_mb()...

I think you should be able to avoid the smp_mb() using relaxed atomics
(on architectures that have those), at the cost of something like a
cmpxchg-acquire sandwiched between a load-acquire and a relaxed load?
I'm not sure how their cost compares to an smp_mb() though.

Something like this, *assuming there can only be one context at a time
waiting for a given hazptr_t*:

typedef struct {
  /* consists of: marker bit in least significant bit, rest is normal pointer */
  atomic_long_t value;
} hazptr_t;

/* assumes that hzp is currently set to NULL (but it may contain a
marker bit) */
static inline void *__hazptr_tryprotect(hazptr_t *hzp, void *const *p) {
  /* note that the loads of these three operations are ordered using
acquire semantics */
  void *ptr = smp_load_acquire(p);
  /* set pointer while leaving marker bit intact */
  unsigned long hazard_scanning =
atomic_long_fetch_or_acquire((unsigned long)ptr, &hzp->value);
  if (unlikely(hazard_scanning)) {
    BUG_ON(hazard_scanning != 1);
    /* slowpath, concurrent hazard pointer waiter */
    smp_mb();
  }
  if (READ_ONCE(*p) == ptr) { /* recheck */
    atomic_long_and(~1UL, &hzp->value);
    return NULL;
  }
  return ptr;
}

/* simplified for illustration, assumes there's only a single hazard
pointer @hzp that could point to @ptr */
static void remove_pointer_and_wait_for_hazard(hazptr_t *hzp, void
*ptr, void *const *p) {
  WRITE_ONCE(*p, NULL);
  smb_mb();
  /* set marker bit */
  atomic_long_or(1UL, &hzp->value);
  while ((void*)(atomic_long_read(&hzp->value) & ~1UL) == ptr))
    wait();
  /* clear marker bit */
  atomic_long_and(~1UL, &hzp->value);
}

The idea would be that the possible orderings when these two functions
race against each other are:

Ordering A: The atomic_long_fetch_or_acquire() in
__hazptr_tryprotect() happens after the atomic_long_or(), two
subcases:
Ordering A1 (slowpath): atomic_long_fetch_or_acquire() is ordered
before the atomic_long_and() in remove_pointer_and_wait_for_hazard(),
so the marker bit is observed set, "hazard_scanning" is true. We go on
the safe slowpath which is like the original patch, so it's safe.
Ordering A2 (recheck fails): atomic_long_fetch_or_acquire() is ordered
after the atomic_long_and() in remove_pointer_and_wait_for_hazard(),
so the subsequent READ_ONCE(*p) is also ordered after the
atomic_long_and(), which is ordered after the WRITE_ONCE(*p, NULL), so
the READ_ONCE(*p) recheck must see a NULL pointer and fail.
Ordering B (hazard pointer visible): The
atomic_long_fetch_or_acquire() in __hazptr_tryprotect() happens before
the atomic_long_or(). In that case, it also happens before the
atomic_long_read() in remove_pointer_and_wait_for_hazard(), so the
hazard pointer will be visible to
remove_pointer_and_wait_for_hazard().

But this seems pretty gnarly/complicated, so even if my 2AM reasoning
ability is correct, actually implementing this might not be a good
idea... and it definitely wouldn't help on X86 at all, since X86
doesn't have such nice relaxed RMW ops.

next prev parent reply	other threads:[~2024-09-19  0:13 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-17 14:33 [RFC PATCH 0/4] Add hazard pointers to kernel Boqun Feng
2024-09-17 14:33 ` [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers Boqun Feng
2024-09-18  8:27   ` Mathieu Desnoyers
2024-09-18 15:17   ` Alan Huang
2024-09-19  6:56     ` Boqun Feng
2024-09-19 18:07       ` Jonas Oberhauser
2024-09-19  0:12   ` Jann Horn [this message]
2024-09-19 20:30     ` Jonas Oberhauser
2024-09-20  7:43       ` Jonas Oberhauser
2024-09-19  6:39   ` Lai Jiangshan
2024-09-19  7:10     ` Boqun Feng
2024-09-19 12:33       ` Alan Huang
2024-09-19 13:57       ` Alan Huang
2024-09-19 18:58         ` Boqun Feng
2024-09-19 19:53           ` Alan Huang
2024-09-19 16:10       ` Alan Huang
2024-09-19 14:00   ` Jonas Oberhauser
2024-09-20  7:41   ` Jonas Oberhauser
2024-09-25 10:02     ` Boqun Feng
2024-09-25 10:11       ` Jonas Oberhauser
2024-09-25 10:45         ` Boqun Feng
2024-09-25 11:59           ` Mathieu Desnoyers
2024-09-25 12:16             ` Boqun Feng
2024-09-25 12:47               ` Mathieu Desnoyers
2024-09-25 13:10                 ` Mathieu Desnoyers
2024-09-25 13:20                   ` Mathieu Desnoyers
2024-09-26  6:16                     ` Mathieu Desnoyers
2024-09-26 15:53                       ` Jonas Oberhauser
2024-09-26 16:12                         ` Linus Torvalds
2024-09-26 16:40                           ` Jonas Oberhauser
2024-09-26 16:54                             ` Linus Torvalds
2024-09-27  0:01                               ` Boqun Feng
2024-09-27  1:30                                 ` Mathieu Desnoyers
2024-09-27  1:37                                   ` Boqun Feng
2024-09-27  4:28                                     ` Boqun Feng
2024-09-27 10:59                                       ` Mathieu Desnoyers
2024-09-27 14:43                                         ` Mathieu Desnoyers
2024-09-27 15:22                                           ` Mathieu Desnoyers
2024-09-27 16:06                                       ` Alan Huang
2024-09-27 16:44                                     ` Linus Torvalds
2024-09-27 17:15                                       ` Mathieu Desnoyers
2024-09-27 17:23                                         ` Linus Torvalds
2024-09-27 17:51                                           ` Mathieu Desnoyers
2024-09-27 18:13                                             ` Linus Torvalds
2024-09-27 19:12                                               ` Jonas Oberhauser
2024-09-27 19:28                                                 ` Linus Torvalds
2024-09-27 20:24                                                   ` Linus Torvalds
2024-09-27 20:02                                               ` Mathieu Desnoyers
2024-09-27  1:20                           ` Mathieu Desnoyers
2024-09-27  4:38                             ` Boqun Feng
2024-09-27 19:23                               ` Jonas Oberhauser
2024-09-27 20:10                                 ` Mathieu Desnoyers
2024-09-27 22:18                                   ` Jonas Oberhauser
2024-09-28 22:10                                     ` Alan Huang
2024-09-28 23:12                                       ` Alan Huang
2024-09-25 12:19             ` Jonas Oberhauser
2024-09-17 14:34 ` [RFC PATCH 2/4] refscale: Add benchmarks for hazptr Boqun Feng
2024-09-17 14:34 ` [RFC PATCH 3/4] refscale: Add benchmarks for percpu_ref Boqun Feng
2024-09-17 14:34 ` [RFC PATCH 4/4] WIP: hazptr: Add hazptr test sample Boqun Feng
2024-09-18  7:18 ` [RFC PATCH 0/4] Add hazard pointers to kernel Linus Torvalds
2024-09-18 22:44   ` Neeraj Upadhyay
2024-09-19  6:46     ` Linus Torvalds
2024-09-20  5:00       ` Neeraj Upadhyay
2024-09-19 14:30     ` Mateusz Guzik
2024-09-19 14:14   ` Christoph Hellwig
2024-09-19 14:21     ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez0VN8oZcqhdzkWQgNv6bwUN=MUu5EacLg5iPvMQL+R-Qg@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=boqun.feng@gmail.com \
    --cc=frederic@kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkmm@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=maged.michael@gmail.com \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=neeraj.upadhyay@amd.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qiang.zhang1211@gmail.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox