From: Joel Fernandes <joel@joelfernandes.org>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	linux-kernel@vger.kernel.org, Nicholas Piggin <npiggin@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Alan Stern <stern@rowland.harvard.edu>,
	John Stultz <jstultz@google.com>,
	Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>,
	Ingo Molnar <mingo@redhat.com>, Waiman Long <longman@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	maged.michael@gmail.com, Mateusz Guzik <mjguzik@gmail.com>,
	Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>,
	rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev
Subject: Re: [RFC PATCH v4 3/4] hazptr: Implement Hazard Pointers
Date: Fri, 19 Dec 2025 01:06:16 -0500	[thread overview]
Message-ID: <BEBCE6F9-F76A-4443-AEF7-9E0D80039F41@joelfernandes.org> (raw)
In-Reply-To: <aUSbhiMgluudbVEG@tardis-2.local>

Hi Boqun,

> On Dec 18, 2025, at 9:07 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
> 
> On Thu, Dec 18, 2025 at 06:36:00PM -0500, Mathieu Desnoyers wrote:
>> On 2025-12-18 15:22, Boqun Feng wrote:
>> [...]
>>>>> Could you utilize this[1] to see a
>>>>> comparison of the reader-side performance against RCU/SRCU?
>>>> 
>>>> Good point ! Let's see.
>>>> 
>>>> On an AMD 2x EPYC 9654 96-Core Processor with 192 cores,
>>>> hyperthreading disabled,
>>>> CONFIG_PREEMPT=y,
>>>> CONFIG_PREEMPT_RCU=y,
>>>> CONFIG_PREEMPT_HAZPTR=y.
>>>> 
>>>> scale_type                 ns
>>>> -----------------------
>>>> hazptr-smp-mb             13.1   <- this implementation
>>>> hazptr-barrier            11.5   <- replace smp_mb() on acquire with barrier(), requires IPIs on synchronize.
>>>> hazptr-smp-mb-hlist       12.7   <- replace per-task hp context and per-cpu overflow lists by hlist.
>>>> rcu                       17.0
>>>> srcu                      20.0
>>>> srcu-fast                  1.5
>>>> rcu-tasks                  0.0
>>>> rcu-trace                  1.7
>>>> refcnt                  1148.0
>>>> rwlock                  1190.0
>>>> rwsem                   4199.3
>>>> lock                   41070.6
>>>> lock-irq               46176.3
>>>> acqrel                     1.1
>>>> 
>>>> So only srcu-fast, rcu-tasks, rcu-trace and a plain acqrel
>>>> appear to beat hazptr read-side performance.
>>>> 
>>> 
>>> Could you also see the reader-side performance impact when the percpu
>>> hazard pointer slots are used up? I.e. the worst case.
>> 
>> I've modified the code to populate "(void *)1UL" in the 7 first slots
>> at bootup, here is the result:
>> 
>> hazptr-smp-mb-7-fail    16.3 ns
>> 
>> So we go from 13.1 ns to 16.3 ns when all slots but one are in use.
>> 
>> And if we pre-populate the 8 slots for each cpu, and thus force
>> fallback to overflow list:
>> 
>> hazptr-smp-mb-8-fail    67.1 ns
>> 
> 
> Thank you! So involving locking seems to hurt performance more than
> per-CPU/per-task operations. This may suggest that enabling
> PREEMPT_HAZPTR by default has acceptable performance.

My impression is that we already take other locks on preemption anyway, such as for the blocked-tasks list used by preempted RCU read-side critical sections. So maybe that's okay; as you said, the performance is acceptable.
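
To make sure I am reading the worst-case numbers correctly, the reader-side path I have in mind looks roughly like this (a minimal sketch, not the actual patch code; hazptr_try_acquire(), hazptr_pcpu_slots, HAZPTR_NR_PERCPU_SLOTS and hazptr_acquire_overflow_slot() are made-up names): scan the per-CPU slots first and fall back to the locked overflow list only when all of them are occupied, which would explain the jump from 13.1 ns to 67.1 ns once the slots are exhausted.

static struct hazptr_slot *hazptr_try_acquire(void *addr)
{
	struct hazptr_percpu_slots *pcpu = this_cpu_ptr(&hazptr_pcpu_slots);
	int i;

	for (i = 0; i < HAZPTR_NR_PERCPU_SLOTS; i++) {
		struct hazptr_slot *slot = &pcpu->slots[i];

		if (!READ_ONCE(slot->addr)) {
			WRITE_ONCE(slot->addr, addr);
			/* Order the slot publication before re-checking addr. */
			smp_mb();
			return slot;
		}
	}
	/* All per-CPU slots busy: take the slow, locked overflow path. */
	return hazptr_acquire_overflow_slot(addr);
}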

>>> 
>>>> [...]
>>>> 
>>>>>> +/*
>>>>>> + * Perform piecewise iteration on overflow list waiting until "addr" is
>>>>>> + * not present. Raw spinlock is released and taken between each list
>>>>>> + * item and busy loop iteration. The overflow list generation is checked
>>>>>> + * each time the lock is taken to validate that the list has not changed
>>>>>> + * before resuming iteration or busy wait. If the generation has
>>>>>> + * changed, retry the entire list traversal.
>>>>>> + */
>>>>>> +static
>>>>>> +void hazptr_synchronize_overflow_list(struct overflow_list *overflow_list, void *addr)
>>>>>> +{
>>>>>> +    struct hazptr_backup_slot *backup_slot;
>>>>>> +    uint64_t snapshot_gen;
>>>>>> +
>>>>>> +    raw_spin_lock(&overflow_list->lock);
>>>>>> +retry:
>>>>>> +    snapshot_gen = overflow_list->gen;
>>>>>> +    list_for_each_entry(backup_slot, &overflow_list->head, node) {
>>>>>> +        /* Busy-wait if node is found. */
>>>>>> +        while (smp_load_acquire(&backup_slot->slot.addr) == addr) { /* Load B */
>>>>>> +            raw_spin_unlock(&overflow_list->lock);
>>>>>> +            cpu_relax();
>>>>> 
>>>>> I think we should prioritize the scan thread solution [2] instead of
>>>>> busy-waiting hazard pointer updaters, because when we have multiple
>>>>> hazard pointer usages we would want to consolidate the scans from
>>>>> the updater side.

Yeah, the scan-thread idea also fixes the scan-cost issue with per-task slots if we batch. If we implement a separate hazard pointer flavor along those lines, then a worker thread definitely seems like the right way to go.
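
To make the batching concrete, I am picturing something like the below (just a sketch under a bunch of assumptions: struct hazptr_retired, hazptr_addr_is_protected() and the delayed-work plumbing are all hypothetical, and locking around the retired list is omitted). Retired objects sit on a list and one worker walks the hazard pointer slots per batch, instead of every updater busy-waiting:

struct hazptr_retired {
	struct list_head node;
	void *addr;
	void (*free_fn)(void *addr);
};

static LIST_HEAD(hazptr_retired_list);
static void hazptr_scan_work_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(hazptr_scan_work, hazptr_scan_work_fn);

static void hazptr_scan_work_fn(struct work_struct *work)
{
	struct hazptr_retired *r, *tmp;
	LIST_HEAD(still_protected);

	list_for_each_entry_safe(r, tmp, &hazptr_retired_list, node) {
		if (hazptr_addr_is_protected(r->addr)) {
			/* A reader still holds it: recheck on the next scan. */
			list_move(&r->node, &still_protected);
			continue;
		}
		list_del(&r->node);
		r->free_fn(r->addr);
	}
	list_splice(&still_protected, &hazptr_retired_list);

	/* Anything still protected gets another pass later. */
	if (!list_empty(&hazptr_retired_list))
		queue_delayed_work(system_wq, &hazptr_scan_work, HZ / 10);
}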

>>>> I agree that batching scans with a worker thread is a logical next step.
>>>> 
>>>>> If so, the whole ->gen can be avoided.
>>>> 
>>>> How would it allow removing the generation trick without causing long
>>>> raw spinlock latencies ?
>>>> 
>>> 
>>> Because we won't need to busy-wait for the readers to go away, we can
>>> check whether they are still there in the next scan.
>>> 
>>> so:
>>> 
>>>    list_for_each_entry(backup_slot, &overflow_list->head, node) {
>>>        /* Busy-wait if node is found. */
>>>        if (smp_load_acquire(&backup_slot->slot.addr) == addr) { /* Load B */
>>>            <mark addr as unable to free and move on>
>> 
>> But then you still iterate on a possibly large list of overflow nodes,
>> with a raw spinlock held. That raw spinlock is taken by the scheduler
>> on context switch. This can cause very long scheduler latency.
>> 
> 
> That's fair.

What about combining both approaches?
Can we do the generation trick along with worker-thread scanning? I feel that should be doable.
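
For concreteness, something along these lines is what I have in mind (a sketch only, reusing the overflow_list/->gen fields from the patch; hazptr_overflow_scan() is a hypothetical helper the worker would call per retired address): the worker does the overflow-list walk, never busy-waits, and still drops the raw spinlock between items with the generation check, so the lock hold time stays bounded:

static bool hazptr_overflow_scan(struct overflow_list *overflow_list, void *addr)
{
	struct hazptr_backup_slot *backup_slot;
	uint64_t snapshot_gen;
	bool busy = false;

	raw_spin_lock(&overflow_list->lock);
retry:
	snapshot_gen = overflow_list->gen;
	list_for_each_entry(backup_slot, &overflow_list->head, node) {
		if (smp_load_acquire(&backup_slot->slot.addr) == addr) {
			/* Still in use: no busy-wait, the next scan rechecks. */
			busy = true;
			break;
		}
		/* Bound the raw spinlock hold time between items. */
		raw_spin_unlock(&overflow_list->lock);
		cond_resched();
		raw_spin_lock(&overflow_list->lock);
		if (overflow_list->gen != snapshot_gen)
			goto retry;
	}
	raw_spin_unlock(&overflow_list->lock);
	return busy;
}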

> 
>> So breaking up the iteration into pieces is not just to handle
>> busy-waiting, but also to make sure we don't increase the
>> system latency by holding a raw spinlock (taken with rq lock
>> held) for more than the little time needed to iterate to the next
>> node.
>> 
> 
> I agree that it helps reduce the latency, but I feel like with a scan
> thread in the picture (and we don't need to busy-wait), we should use
> a forward-progress-guaranteed way in the updater side scan, which means
> we may need to explore other solutions for the latency (e.g.
> fine-grained locking hashlist for the overflow list) than the generation
> counter.

Hmm... I guess that only works if there is no interference between the fine-grained list being iterated and the list being overflowed into. Otherwise, I think it might run into the same issue, but I could be missing something about the idea.
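
Just to make sure we are picturing the same thing, the hashlist I have in mind is roughly the following (all hypothetical: the bucket struct, the hash_node member and hazptr_overflow_hash_scan() are not from the patch, and lock/bucket init is omitted). Readers overflow into the bucket chosen by the address they protect, and the updater-side scan only takes that one bucket lock:

#define HAZPTR_OVERFLOW_HASH_BITS	6

struct hazptr_overflow_bucket {
	raw_spinlock_t		lock;
	struct hlist_head	head;
};

static struct hazptr_overflow_bucket hazptr_overflow_hash[1 << HAZPTR_OVERFLOW_HASH_BITS];

static bool hazptr_overflow_hash_scan(void *addr)
{
	struct hazptr_overflow_bucket *bucket =
		&hazptr_overflow_hash[hash_ptr(addr, HAZPTR_OVERFLOW_HASH_BITS)];
	struct hazptr_backup_slot *backup_slot;
	bool busy = false;

	raw_spin_lock(&bucket->lock);
	hlist_for_each_entry(backup_slot, &bucket->head, hash_node) {
		if (smp_load_acquire(&backup_slot->slot.addr) == addr) {
			busy = true;
			break;
		}
	}
	raw_spin_unlock(&bucket->lock);
	return busy;
}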

thanks,

 - Joel


> 
> Regards,
> Boqun
> 
>> Thanks,
>> 
>> Mathieu
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> https://www.efficios.com
> 

