From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Alan Stern <stern@rowland.harvard.edu>,
John Stultz <jstultz@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Frederic Weisbecker <frederic@kernel.org>,
Joel Fernandes <joel@joelfernandes.org>,
Josh Triplett <josh@joshtriplett.org>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>,
Mark Rutland <mark.rutland@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>,
maged.michael@gmail.com, Mateusz Guzik <mjguzik@gmail.com>,
rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev
Subject: Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter
Date: Wed, 25 Sep 2024 07:57:43 +0200
Message-ID: <e4721439-8cad-4134-8c07-84b6ecc69573@huaweicloud.com>
In-Reply-To: <20240921164210.256278-1-mathieu.desnoyers@efficios.com>

Hi Mathieu,

Interesting idea. Conceptually it looks good.
There's another approach that uses hazard pointers to optimize shared
reference counting (making it lock-free in the common case):
https://github.com/cmuparlay/concurrent_deferred_rc
It doesn't go as far as what you're doing, but they also use the hazard
pointer to protect the refcount (like you do in the promote function).
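If I read the promote idea correctly, the point is that holding the
hazard pointer pins the object, so taking a reference cannot race with
the final put. Roughly (my own sketch, not the patch's actual
hpref_promote_hp_to_ref()):

        static void promote_sketch(struct hpref_ctx *ctx)
        {
                /* Safe: the HP keeps hpref_synchronize() from completing. */
                urcu_ref_get(&ctx->hp->refcount);
                /* Drop the HP slot now that the refcount pins the node. */
                uatomic_store(&ctx->slot->node, NULL, CMM_RELEASE);
                ctx->type = HPREF_TYPE_REF;
        }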
I haven't read your code in detail, but it seems to me you have an ABA
bug: as I explained elsewhere, you can read the same pointer value after
an ABA transition, but you don't synchronize with the newer store that
gave you node2, leaving you to speculatively read stale values through
*ctx->hp. (I am assuming here that ctx->hp is essentially an out
parameter used to let the caller know which node got protected.)
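Concretely, the interleaving I have in mind is roughly this (a
hypothetical schedule, using the patch's own labels):

        reader:  node = load(node_p)               /* reads A */
        updater: store(node_p, B)
        updater: hpref_synchronize_put(A)          /* no HP set yet -> passes */
        updater: ... A freed, reallocated, reinitialized ...
        updater: hpref_set_pointer(node_p, A)      /* CMM_RELEASE */
        reader:  slot->node = A                    /* Store B */
        reader:  cmm_smp_mb()
        reader:  node2 = load(node_p, CMM_RELAXED) /* Load A: sees A again */

node == node2, so the reader keeps A -- but the relaxed Load A does not
synchronize with the release store that republished A, so later reads
through *ctx->hp can still observe A's old contents.
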
Have fun,
jonas
> +/*
> + * hpref_hp_get: Obtain a reference to a stable object, protected either
> + *               by hazard pointer (fast-path) or using reference
> + *               counter as fall-back.
> + */
> +static inline
> +bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
> +{
> +        int cpu = rseq_current_cpu_raw();
> +        struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
> +        struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
> +        bool use_refcount = false;
> +        struct hpref_node *node, *node2;
> +        unsigned int next_slot;
> +
> +retry:
> +        node = uatomic_load(node_p, CMM_RELAXED);
> +        if (!node)
> +                return false;
> +        /* Use rseq to try setting current slot hp. Store B. */
> +        if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
> +                        (intptr_t *) &slot->node, (intptr_t) NULL,
> +                        (intptr_t) node, cpu)) {
> +                slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
> +                use_refcount = true;
> +                /*
> +                 * This may busy-wait for another reader using the
> +                 * emergency slot to transition to refcount.
> +                 */
> +                caa_cpu_relax();
> +                goto retry;
> +        }
> +        /* Memory ordering: Store B before Load A. */
> +        cmm_smp_mb();
> +        node2 = uatomic_load(node_p, CMM_RELAXED);        /* Load A */
> +        if (node != node2) {
> +                uatomic_store(&slot->node, NULL, CMM_RELAXED);
> +                if (!node2)
> +                        return false;
> +                goto retry;
> +        }
> +        ctx->type = HPREF_TYPE_HP;
> +        ctx->hp = node;
           ^---- here
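
The window closes, I believe, if Load A is an acquire load, so that it
synchronizes with the CMM_RELEASE store in hpref_set_pointer(). An
untested suggestion:

        node2 = uatomic_load(node_p, CMM_ACQUIRE);        /* Load A */
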
> +        ctx->slot = slot;
> +        if (use_refcount) {
> +                hpref_promote_hp_to_ref(ctx);
> +                return true;
> +        }
> +        /*
> +         * Increment current slot (racy increment is OK because it is
> +         * just a position hint). Skip the emergency slot.
> +         */
> +        next_slot = uatomic_load(&cpu_slots->current_slot, CMM_RELAXED) + 1;
> +        if (next_slot >= HPREF_EMERGENCY_SLOT)
> +                next_slot = 0;
> +        uatomic_store(&cpu_slots->current_slot, next_slot, CMM_RELAXED);
> +        return true;
> +}
> +
> +static inline
> +void hpref_put(struct hpref_ctx *ctx)
> +{
> +        if (ctx->type == HPREF_TYPE_REF) {
> +                urcu_ref_put(&ctx->hp->refcount, hpref_release);
> +        } else {
> +                /* Release HP. */
> +                uatomic_store(&ctx->slot->node, NULL, CMM_RELEASE);
> +        }
> +        ctx->hp = NULL;
> +}
> +
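
For context, my mental model of the reader side is roughly this
(hypothetical caller, not from the patch; my_node_ptr and
do_something_with() are made up):

        struct hpref_node *my_node_ptr;  /* the published pointer */
        struct hpref_ctx ctx;

        if (hpref_hp_get(&my_node_ptr, &ctx)) {
                /* Node stays pinned (by HP or refcount) until the put. */
                do_something_with(hpref_ctx_pointer(&ctx));
                hpref_put(&ctx);
        }
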
> +/*
> + * hpref_set_pointer: Store pointer @node to @ptr, with RCU publication
> + *                    guarantees.
> + */
> +static inline
> +void hpref_set_pointer(struct hpref_node **ptr, struct hpref_node *node)
> +{
> +        if (__builtin_constant_p(node) && node == NULL)
> +                uatomic_store(ptr, NULL, CMM_RELAXED);
> +        else
> +                uatomic_store(ptr, node, CMM_RELEASE);
> +}
> +
> +/*
> + * Return the content of the hpref context hazard pointer field.
> + */
> +static inline
> +struct hpref_node *hpref_ctx_pointer(struct hpref_ctx *ctx)
> +{
> +        return ctx->hp;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _URCU_HPREF_H */
> diff --git a/src/Makefile.am b/src/Makefile.am
> index b555c818..7312c9f7 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -19,7 +19,8 @@ RCULFHASH = rculfhash.c rculfhash-mm-order.c rculfhash-mm-chunk.c \
>  lib_LTLIBRARIES = liburcu-common.la \
>          liburcu.la liburcu-qsbr.la \
>          liburcu-mb.la liburcu-bp.la \
> -        liburcu-memb.la liburcu-cds.la
> +        liburcu-memb.la liburcu-cds.la \
> +        liburcu-hpref.la
> 
>  #
>  # liburcu-common contains wait-free queues (needed by call_rcu) as well
> @@ -50,6 +51,9 @@ liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c lfstack.c \
>          workqueue.c workqueue.h $(RCULFHASH) $(COMPAT)
>  liburcu_cds_la_LIBADD = liburcu-common.la
> 
> +liburcu_hpref_la_SOURCES = hpref.c
> +liburcu_hpref_la_LIBADD = -lrseq
> +
>  pkgconfigdir = $(libdir)/pkgconfig
>  pkgconfig_DATA = liburcu-cds.pc liburcu.pc liburcu-bp.pc liburcu-qsbr.pc \
>          liburcu-mb.pc liburcu-memb.pc
> diff --git a/src/hpref.c b/src/hpref.c
> new file mode 100644
> index 00000000..f63530f5
> --- /dev/null
> +++ b/src/hpref.c
> @@ -0,0 +1,78 @@
> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> +//
> +// SPDX-License-Identifier: LGPL-2.1-or-later
> +
> +/*
> + * HPREF: Hazard pointers with reference counters
> + */
> +
> +#define _LGPL_SOURCE
> +#include <urcu/hpref.h>
> +#include <rseq/mempool.h> /* Per-CPU memory */
> +
> +static struct rseq_mempool *mempool;
> +struct hpref_percpu_slots *hpref_percpu_slots;
> +
> +void hpref_release(struct urcu_ref *ref)
> +{
> +        struct hpref_node *node = caa_container_of(ref, struct hpref_node, refcount);
> +
> +        node->release(node);
> +}
> +
> +/*
> + * hpref_synchronize: Wait for any reader possessing a hazard pointer to
> + *                    @node to clear its hazard pointer slot.
> + */
> +void hpref_synchronize(struct hpref_node *node)
> +{
> +        int nr_cpus = rseq_get_max_nr_cpus(), cpu;
> +
> +        /* Memory ordering: Store A before Load B. */
> +        cmm_smp_mb();
> +        /* Scan all CPUs slots. */
> +        for (cpu = 0; cpu < nr_cpus; cpu++) {
> +                struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
> +                struct hpref_slot *slot;
> +                unsigned int i;
> +
> +                for (i = 0; i < HPREF_NR_PERCPU_SLOTS; i++) {
> +                        slot = &cpu_slots->slots[i];
> +                        /* Busy-wait if node is found. */
> +                        while (uatomic_load(&slot->node, CMM_ACQUIRE) == node)        /* Load B */
> +                                caa_cpu_relax();
> +                }
> +        }
> +}
> +
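
(For other readers of the thread: the reader's Store B/Load A and the
updater's Store A/Load B form the usual hazard-pointer handshake --
either the scan observes the reader's slot, or the reader's recheck
observes the updated pointer.)
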
> +/*
> + * hpref_synchronize_put: Wait for any reader possessing a hazard
> + *                        pointer to clear its slot and put reference
> + *                        count.
> + */
> +void hpref_synchronize_put(struct hpref_node *node)
> +{
> +        if (!node)
> +                return;
> +        hpref_synchronize(node);
> +        urcu_ref_put(&node->refcount, hpref_release);
> +}
> +
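
And the updater side would pair hpref_set_pointer() with
hpref_synchronize_put() on the old value, something like this (my
sketch; it assumes the published pointer holds a reference on the node,
as the put above suggests):

        struct hpref_node *old = uatomic_load(&my_node_ptr, CMM_RELAXED);

        hpref_set_pointer(&my_node_ptr, new_node);  /* publish (release) */
        hpref_synchronize_put(old);                 /* wait out HP readers, drop ref */
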
> +static __attribute__((constructor))
> +void hpref_init(void)
> +{
> +        mempool = rseq_mempool_create("hpref", sizeof(struct hpref_percpu_slots), NULL);
> +        if (!mempool)
> +                abort();
> +        hpref_percpu_slots = rseq_mempool_percpu_zmalloc(mempool);
> +        if (!hpref_percpu_slots)
> +                abort();
> +}
> +
> +static __attribute__((destructor))
> +void hpref_exit(void)
> +{
> +        rseq_mempool_percpu_free(hpref_percpu_slots);
> +        if (rseq_mempool_destroy(mempool))
> +                abort();
> +}