From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Will Deacon <will@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Alan Stern <stern@rowland.harvard.edu>,
John Stultz <jstultz@google.com>,
Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Frederic Weisbecker <frederic@kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>,
Mark Rutland <mark.rutland@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vlastimil Babka <vbabka@suse.cz>,
maged.michael@gmail.com, Mateusz Guzik <mjguzik@gmail.com>,
Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>,
rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev
Subject: Re: [RFC PATCH v4 3/4] hazptr: Implement Hazard Pointers
Date: Thu, 18 Dec 2025 12:35:18 -0500
Message-ID: <fcbc99d0-b7c7-45f7-9cf8-f5d3bfad966e@efficios.com>
In-Reply-To: <aUO9KRaP0FXpm_l9@tardis-2.local>

On 2025-12-18 03:36, Boqun Feng wrote:
> On Wed, Dec 17, 2025 at 08:45:30PM -0500, Mathieu Desnoyers wrote:
[...]
>> +static inline
>> +void *hazptr_acquire(struct hazptr_ctx *ctx, void * const * addr_p)
>> +{
>> +	struct hazptr_slot *slot = NULL;
>> +	void *addr, *addr2;
>> +
>> +	/*
>> +	 * Load @addr_p to know which address should be protected.
>> +	 */
>> +	addr = READ_ONCE(*addr_p);
>> +	for (;;) {
>> +		if (!addr)
>> +			return NULL;
>> +		guard(preempt)();
>> +		if (likely(!hazptr_slot_is_backup(ctx, slot))) {
>> +			slot = hazptr_get_free_percpu_slot();
>
> I need to continue to share my concerns about this "allocating slot while
> protecting" pattern. Here realistically, we will go over a few of the
> per-CPU hazard pointer slots *every time* instead of directly using a
> pre-allocated hazard pointer slot.

No, that's not the expected fast path with CONFIG_PREEMPT_HAZPTR=y
(introduced in patch 4/4).

With PREEMPT_HAZPTR, using more than one hazard pointer per CPU will
only happen if there are nested hazard pointer users, which can happen
due to:

- Holding a hazard pointer across function calls, where another hazard
  pointer is used.
- Using hazard pointers from interrupt handlers (note: my current code
  only disables preemption, not irqs; this is something I'd need to
  change if we wish to acquire/release hazard pointers from interrupt
  handlers). Even that should be a rare event.

So on the fast path the CPU starts with no hazard pointers in use, which
means hazptr_acquire() finds a free slot at index 0 on its first probe.
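
For illustration, a minimal sketch of that lookup, assuming a per-CPU
array of 8 slots and a "free means addr == NULL" convention (the struct
and function names below are made up for this sketch, not the patch's
exact definitions); it runs with preemption disabled, as with the
guard(preempt)() above:

#include <linux/percpu.h>
#include <linux/compiler.h>

struct hazptr_slot {
	void *addr;
};

struct hazptr_percpu_slots {
	struct hazptr_slot slot[8];
};

static DEFINE_PER_CPU(struct hazptr_percpu_slots, hazptr_cpu_slots);

/* Sketch: return the first free slot of this CPU, or NULL if all are busy. */
static struct hazptr_slot *hazptr_get_free_percpu_slot_sketch(void)
{
	struct hazptr_percpu_slots *s = this_cpu_ptr(&hazptr_cpu_slots);
	int i;

	for (i = 0; i < 8; i++) {
		if (!READ_ONCE(s->slot[i].addr))
			return &s->slot[i];	/* fast path: i == 0 */
	}
	return NULL;	/* all slots in use: fall back to the backup slot */
}

With no nesting, the loop exits on the first iteration, so the cost of
"scanning a few slots every time" does not materialize on the common
path.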
> Could you utilize this[1] to see a
> comparison of the reader-side performance against RCU/SRCU?

Good point! Let's see.

On an AMD 2x EPYC 9654 96-Core Processor with 192 cores,
hyperthreading disabled,
CONFIG_PREEMPT=y,
CONFIG_PREEMPT_RCU=y,
CONFIG_PREEMPT_HAZPTR=y:
scale_type               ns
---------------------------
hazptr-smp-mb          13.1  <- this implementation
hazptr-barrier         11.5  <- replace smp_mb() on acquire with barrier(), requires IPIs on synchronize.
hazptr-smp-mb-hlist    12.7  <- replace per-task hp context and per-cpu overflow lists by hlist.
rcu                    17.0
srcu                   20.0
srcu-fast               1.5
rcu-tasks               0.0
rcu-trace               1.7
refcnt               1148.0
rwlock               1190.0
rwsem                4199.3
lock                41070.6
lock-irq            46176.3
acqrel                  1.1

So only srcu-fast, rcu-tasks, rcu-trace and a plain acqrel
appear to beat hazptr read-side performance.
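
To make the hazptr-smp-mb vs. hazptr-barrier rows concrete, here is a
minimal sketch of where that barrier sits, reusing the struct
hazptr_slot from the sketch further up (illustrative only: the function
name and the HAZPTR_SYNC_USES_IPI switch are made up, and this is the
generic hazard-pointer publish-then-revalidate step, not the patch's
exact code):

/*
 * Sketch: publish @addr in @slot, then re-check that *addr_p still
 * points to it.  The publishing store must be ordered before the
 * re-load, otherwise the updater's scan could miss a just-published
 * hazard pointer.  The series introduces ptr_eq() (patch 1/4) so this
 * re-validation can preserve the address dependency; a plain
 * comparison is shown here for brevity.
 */
static void *hazptr_try_protect_sketch(struct hazptr_slot *slot,
				       void * const *addr_p, void *addr)
{
	WRITE_ONCE(slot->addr, addr);	/* publish the hazard pointer */
#ifdef HAZPTR_SYNC_USES_IPI
	barrier();	/* compiler barrier only; synchronize side IPIs all CPUs */
#else
	smp_mb();	/* full barrier on every acquire (the 13.1 ns row) */
#endif
	if (READ_ONCE(*addr_p) != addr) {
		/* Publication raced with removal: drop the slot, caller retries. */
		WRITE_ONCE(slot->addr, NULL);
		return NULL;
	}
	return addr;
}

The roughly 1.6 ns gap between the two rows is essentially the cost of
that smp_mb() amortized over the benchmark loop.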
[...]
>> +/*
>> + * Perform piecewise iteration on overflow list waiting until "addr" is
>> + * not present. Raw spinlock is released and taken between each list
>> + * item and busy loop iteration. The overflow list generation is checked
>> + * each time the lock is taken to validate that the list has not changed
>> + * before resuming iteration or busy wait. If the generation has
>> + * changed, retry the entire list traversal.
>> + */
>> +static
>> +void hazptr_synchronize_overflow_list(struct overflow_list *overflow_list, void *addr)
>> +{
>> +	struct hazptr_backup_slot *backup_slot;
>> +	uint64_t snapshot_gen;
>> +
>> +	raw_spin_lock(&overflow_list->lock);
>> +retry:
>> +	snapshot_gen = overflow_list->gen;
>> +	list_for_each_entry(backup_slot, &overflow_list->head, node) {
>> +		/* Busy-wait if node is found. */
>> +		while (smp_load_acquire(&backup_slot->slot.addr) == addr) { /* Load B */
>> +			raw_spin_unlock(&overflow_list->lock);
>> +			cpu_relax();
>
> I think we should prioritize the scan thread solution [2] instead of
> busy waiting hazard pointer updaters, because when we have multiple
> hazard pointer usages we would want to consolidate the scans from
> updater side.

I agree that batching scans with a worker thread is a logical next step.
> If so, the whole ->gen can be avoided.

How would it allow removing the generation trick without causing long
raw spinlock latencies?
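
For reference, the role ->gen plays is roughly the following (a sketch
reconstructed from the comment in the patch hunk above, not the exact
patch code): dropping the raw spinlock while busy-waiting keeps the
lock hold time short, and the generation check detects whether the list
changed while it was unlocked.

static void hazptr_synchronize_overflow_list_sketch(struct overflow_list *ol,
						    void *addr)
{
	struct hazptr_backup_slot *backup_slot;
	uint64_t snapshot_gen;

	raw_spin_lock(&ol->lock);
retry:
	snapshot_gen = ol->gen;
	list_for_each_entry(backup_slot, &ol->head, node) {
		while (smp_load_acquire(&backup_slot->slot.addr) == addr) {
			/* Drop the lock so list mutations are not stalled. */
			raw_spin_unlock(&ol->lock);
			cpu_relax();
			raw_spin_lock(&ol->lock);
			/* The list may have changed while unlocked. */
			if (ol->gen != snapshot_gen)
				goto retry;
		}
	}
	raw_spin_unlock(&ol->lock);
}

Without ->gen (or an equivalent), either the lock would have to be held
across the whole busy wait, or the traversal could dereference list
nodes that were removed while the lock was dropped.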
>
> However this ->gen idea does seem to resolve another issue for me. I'm
> trying to make shazptr critical sections preemptible by using a per-task
> backup slot (if you recall, this is your idea from the hallway
> discussions we had during LPC 2024),

I honestly did not remember. It's been a whole year! ;-)

> and currently I could not make it
> work because of the following sequence:
>
> 1. CPU 0 already has one pointer protected.
>
> 2. CPU 1 begins the updater scan; it scans the list of preempted
> hazard pointer readers and finds no reader.
>
> 3. CPU 0 does a context switch: it stores the current hazard pointer
> value to the current task's ->hazard_slot (let's say the task is task
> A), and adds it to the list of preempted hazard pointer readers.
>
> 4. CPU 0 clears its percpu hazptr_slots for the next task (B).
>
> 5. CPU 1 continues the updater scan, and it scans the percpu slot of
> CPU 0, and finds no reader.
>
> in this situation, the updater will miss a reader. But if we add a
> generation snapshot at step 2 and a generation increment at step 3, I
> think it'll work.
>
> IMO, if we make this work, it's better than the current backup slot
> mechanism, because we only need to acquire the lock if a context
> switch happens.

With PREEMPT_HAZPTR we also only need to acquire the per-CPU overflow
list raw spinlock on context switch (preemption or blocking). The only
other case requiring it is nested hazptr usage (more than 8 active
hazard pointers) in a thread context plus nested irqs.
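
To make the per-task backup slot scheme above concrete, here is a rough
sketch of the scan-side generation check that would close the race in
steps 1-5 (all type, variable and helper names below are hypothetical;
the two helpers are assumed to scan the preempted-readers list and the
per-CPU slots for @addr, bodies omitted):

struct preempted_readers_list {
	raw_spinlock_t lock;
	uint64_t gen;		/* bumped (under lock) by the context-switch path */
	struct list_head head;
};

static struct preempted_readers_list preempted_readers;

static bool preempted_list_contains(void *addr);	/* hypothetical */
static bool percpu_slots_contain(int cpu, void *addr);	/* hypothetical */

/* Sketch: return true only when @addr is protected by no reader. */
static bool hazptr_addr_unused_sketch(void *addr)
{
	uint64_t gen;
	int cpu;

retry:
	raw_spin_lock(&preempted_readers.lock);
	gen = preempted_readers.gen;		/* snapshot (step 2) */
	if (preempted_list_contains(addr)) {
		raw_spin_unlock(&preempted_readers.lock);
		return false;
	}
	raw_spin_unlock(&preempted_readers.lock);

	for_each_possible_cpu(cpu) {		/* per-CPU scan (step 5) */
		if (percpu_slots_contain(cpu, addr))
			return false;
	}

	raw_spin_lock(&preempted_readers.lock);
	if (preempted_readers.gen != gen) {
		/* A context switch (step 3) migrated slots during our scan. */
		raw_spin_unlock(&preempted_readers.lock);
		goto retry;
	}
	raw_spin_unlock(&preempted_readers.lock);
	return true;
}

The key point is that the generation re-check happens after the per-CPU
walk, so a slot that moved from a not-yet-scanned CPU onto the
already-scanned preempted-readers list cannot be missed.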
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com