From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 02F2DCFA754 for ; Sat, 5 Oct 2024 16:05:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1E0036B0352; Sat, 5 Oct 2024 12:05:09 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 190806B0353; Sat, 5 Oct 2024 12:05:09 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 033066B0354; Sat, 5 Oct 2024 12:05:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CEFE76B0352 for ; Sat, 5 Oct 2024 12:05:08 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 4E7A8AC3B5 for ; Sat, 5 Oct 2024 16:05:08 +0000 (UTC) X-FDA: 82640022696.17.547E805 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf16.hostedemail.com (Postfix) with ESMTP id D198F180008 for ; Sat, 5 Oct 2024 16:05:05 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=JzVe3MKM; spf=none (imf16.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728144175; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=4KIERTcyxCtPE+r337tQWEVOYM2iePWuMmqGmmdDWTE=; b=MFAY16dkD+t4NwelXToUJI9Wo9shS80F6GLn9XlmyucNYRswLeN7lG7LM4XYh+2uV7kPtf hg5n11svfVNouyLCeAWZRcOUW/7bubrDiidWsdgem4wftAgOHiXjaFhgKKgUuLBwS/ZaAv ulFryKdw7OIMiJge2Ut9KDP8+JCQj8k= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728144175; a=rsa-sha256; cv=none; b=jDRWN6to9kI2qUDX/TV+8Bc51LgaEf6257sl/BVJhu7UGr+n4BjMgJe55kJKi8Ql9ZlonH BGUSpOyEXycuVS/WdjGNUnY404RpePy2R00uNEL5qtOsoIuK2WJQNztUBqedn3Ts6NAQzI eQrnR93Gt0yeSTiS2Xrqtuwq+W2Mltc= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=JzVe3MKM; spf=none (imf16.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org; dmarc=none DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=4KIERTcyxCtPE+r337tQWEVOYM2iePWuMmqGmmdDWTE=; b=JzVe3MKMqi654mbCDMS0KxSxY1 VonDI86HmKykf3DmK76Pe0GQNBehSTti2vjiznCSzTreUrG+0zxNVkM5WJtIB66E34dot/rBU48SQ 5TU/I5DYFgTQpYGueBHjOISd9POuHLUV8179Ok2UZXmBDaLdt/ryqmGsMoXr3jQIjL4lHOJFFN1Nm DHIlTi6rak0xMX+lw8oUzR6QuMx0hf47h4ioXUi2d1OUMhmemWDOjYjPMPdQV9Gzfj0ETuQaG3lwY e7Ys8wxeuuoADEoe/qSY+sDPA3Nif4IlMo1NEEF62ArN/IXXL5mBUuFEwhpxaiapI2mkJcqWK3/+z 6IRCnZXg==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1sx7Gv-0000000DAdH-0Xwy; Sat, 05 Oct 2024 16:04:46 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 1000) id C8509300777; Sat, 5 Oct 2024 18:04:44 +0200 (CEST) Date: Sat, 5 Oct 2024 18:04:44 +0200 From: Peter Zijlstra To: Mathieu Desnoyers Cc: Boqun Feng , linux-kernel@vger.kernel.org, Linus Torvalds , Andrew Morton , Nicholas Piggin , Michael Ellerman , Greg Kroah-Hartman , Sebastian Andrzej Siewior , "Paul E. McKenney" , Will Deacon , Alan Stern , John Stultz , Neeraj Upadhyay , Frederic Weisbecker , Joel Fernandes , Josh Triplett , Uladzislau Rezki , Steven Rostedt , Lai Jiangshan , Zqiang , Ingo Molnar , Waiman Long , Mark Rutland , Thomas Gleixner , Vlastimil Babka , maged.michael@gmail.com, Mateusz Guzik , Jonas Oberhauser , rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev Subject: Re: [RFC PATCH v2 3/4] hp: Implement Hazard Pointers Message-ID: <20241005160444.GA18071@noisy.programming.kicks-ass.net> References: <20241004182734.1761555-1-mathieu.desnoyers@efficios.com> <20241004182734.1761555-4-mathieu.desnoyers@efficios.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20241004182734.1761555-4-mathieu.desnoyers@efficios.com> X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: D198F180008 X-Stat-Signature: tqsx54ns44ify3wszm9gkxamyq7e8o6f X-HE-Tag: 1728144305-335105 X-HE-Meta: U2FsdGVkX1+ZMAOXybOkR83Xda/RKIli5IaemDQ9bueDN2BGd1eMswqGARCQgaQ/piYm+I6IaaTKUfpZRNxa76dajWV6SCqcdvtFfVHcbcw9TgoTA9MWCM77VtLvzbUDE9hB8a328rg08X2lt//upx1SeqDxqo6OIX/8GRIEz0Kjfz3yep+gCeqBYUW1gr2tDabCheWJdxf0qdlSlsW/F7KHgW9SESL2yFJYgMiIWfH7ZwkbAu5hsyy+lRas4sFYWSmwsyGD83mfcNH87ptnSN86f8Jf2GUUfg5KORMXH7zPV0sD6fufkrxr7XAPAUOvXODEbunDdzp0HSuYRql7s2JcKh5f5IckDht/FW2APdvAhJCjneB442/0ao/93AbU3Ay6voYLkFsBG80CjlkXg95Rs9OPd1ud9C2s/eyvEQ5AA1wp6+wtcUu8kuIyHcdFff6bIFZ4vE2zOXMZchjlzg9CmxOK40o10aE1WfV4RT9Lqb6SgMemN37JP1Dl9optF2ozu8jAVgmNCkvocFcXltOxyU2F8tZHLFAbZLewQfwC6yeNiLHyyDpMaLjlR14emC08wr7bMinjaaNYhumMVV8Q/EzoAizvDWVKWIDfIK0qtIxxreVXElwsUn9LxPnXS5ojIZa8ec0fTAI/xN6gYUlycOMJtvyqqs1DwdUoMgNrM8nf+XnA6hBMpodl57BcoBalQZthptRHJQDXJkVz9Xqlxwuq7vmMkfMXO6bjzC0Rj1b7zkociL7oLbSx3GcmsufqbN4RLpDV3OPdhouw/4U7q/0gXnqOvJ9Teo5gdbB32l8IqMxswq64QhEU8MSgmXAoKJpspgJ/KsSCLu87Lzwk4hUApD9rXGWXUpDCy9bYvTM0dN+USyO1m8UTgYJrXYsGscXnwxocMXEn8v/QLHWiaFVXkEMJLl7I1yH+tm9C7J9QeWauVAiK6srYHsCOKNO4lb1Caehruh5Kwpx RZys/0ni KjUk8sOP6Rm/TP1qOxkT0vtw4fYgR3n3iWJOPx/XnpYgX9MG4+4hIuJNntxBoK15nM+sTOeExn+ssnTOFCm1n110UF/XVZ/4X3Q3qewyvyj+5bHms9FqAZJetvs1J7PbOSrPAyiQexrxtqdkjiNI8os3kOVmHhXI70oYYsfntIWKAaLxnWzufG1uRuLymqa5BHDNpp0og0GyI+R0WuZ3XOaof45uD3wtAqGgtyuACpOuttWMSoqmMN64hgqUaip9fhOlurhjkGb13v9sikiD1mF5bHWO5K14cOCvhwjZRuuVTcKjIivwZtJi1MfEhOM9NjVUJ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Oct 04, 2024 at 02:27:33PM -0400, Mathieu Desnoyers wrote: > include/linux/hp.h | 158 +++++++++++++++++++++++++++++++++++++++++++++ > kernel/Makefile | 2 +- > kernel/hp.c | 46 +++++++++++++ > 3 files changed, 205 insertions(+), 1 deletion(-) > create mode 100644 include/linux/hp.h > create mode 100644 kernel/hp.c > > diff --git a/include/linux/hp.h b/include/linux/hp.h > new file mode 100644 > index 000000000000..e85fc4365ea2 > --- /dev/null > +++ b/include/linux/hp.h > @@ -0,0 +1,158 @@ > +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers > +// > +// SPDX-License-Identifier: LGPL-2.1-or-later > + > +#ifndef _LINUX_HP_H > +#define _LINUX_HP_H > + > +/* > + * HP: Hazard Pointers > + * > + * This API provides existence guarantees of objects through hazard > + * pointers. > + * > + * It uses a fixed number of hazard pointer slots (nr_cpus) across the > + * entire system for each HP domain. > + * > + * Its main benefit over RCU is that it allows fast reclaim of > + * HP-protected pointers without needing to wait for a grace period. > + * > + * It also allows the hazard pointer scan to call a user-defined callback > + * to retire a hazard pointer slot immediately if needed. This callback > + * may, for instance, issue an IPI to the relevant CPU. > + * > + * References: > + * > + * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for > + * lock-free objects," in IEEE Transactions on Parallel and > + * Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004 > + */ > + > +#include > + > +/* > + * Hazard pointer slot. > + */ > +struct hp_slot { > + void *addr; > +}; > + > +/* > + * Hazard pointer context, returned by hp_use(). > + */ > +struct hp_ctx { > + struct hp_slot *slot; > + void *addr; > +}; > + > +/* > + * hp_scan: Scan hazard pointer domain for @addr. > + * > + * Scan hazard pointer domain for @addr. > + * If @retire_cb is NULL, wait to observe that each slot contains a value > + * that differs from @addr. > + * If @retire_cb is non-NULL, invoke @callback for each slot containing > + * @addr. > + */ > +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr, > + void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr)); struct hp_domain { struct hp_slot __percpu *slots }; might clarify things a wee little. > + > +/* Get the hazard pointer context address (may be NULL). */ > +static inline > +void *hp_ctx_addr(struct hp_ctx ctx) > +{ > + return ctx.addr; > +} >From where I'm sitting this seems like superfluous fluff, what's wrong with ctx.addr ? > +/* > + * hp_allocate: Allocate a hazard pointer. > + * > + * Allocate a hazard pointer slot for @addr. The object existence should > + * be guaranteed by the caller. Expects to be called from preempt > + * disable context. > + * > + * Returns a hazard pointer context. So you made the WTF'o'meter crack, this here function does not allocate nothing. Naming is bad. At best this is something like try-set-hazard-pointer or somesuch. > + */ > +static inline > +struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr) > +{ > + struct hp_slot *slot; > + struct hp_ctx ctx; > + > + if (!addr) > + goto fail; > + slot = this_cpu_ptr(percpu_slots); > + /* > + * A single hazard pointer slot per CPU is available currently. > + * Other hazard pointer domains can eventually have a different > + * configuration. > + */ > + if (READ_ONCE(slot->addr)) > + goto fail; > + WRITE_ONCE(slot->addr, addr); /* Store B */ > + ctx.slot = slot; > + ctx.addr = addr; > + return ctx; > + > +fail: > + ctx.slot = NULL; > + ctx.addr = NULL; > + return ctx; > +} > + > +/* > + * hp_dereference_allocate: Dereference and allocate a hazard pointer. > + * > + * Returns a hazard pointer context. Expects to be called from preempt > + * disable context. > + */ More terrible naming. Same as above, but additionally, I would expect a 'dereference' to actually dereference the pointer and have a return value of the dereferenced type. This function seems to double check and update the hp_ctx thing. I'm not at all sure yet wtf this is doing -- and the total lack of comments aren't helping. > +static inline > +struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p) > +{ > + void *addr, *addr2; > + struct hp_ctx ctx; > + > + addr = READ_ONCE(*addr_p); > +retry: > + ctx = hp_allocate(percpu_slots, addr); > + if (!hp_ctx_addr(ctx)) > + goto fail; > + /* Memory ordering: Store B before Load A. */ > + smp_mb(); > + /* > + * Use RCU dereference without lockdep checks, because > + * lockdep is not aware of HP guarantees. > + */ > + addr2 = rcu_access_pointer(*addr_p); /* Load A */ > + /* > + * If @addr_p content has changed since the first load, > + * clear the hazard pointer and try again. > + */ > + if (!ptr_eq(addr2, addr)) { > + WRITE_ONCE(ctx.slot->addr, NULL); > + if (!addr2) > + goto fail; > + addr = addr2; > + goto retry; > + } > + /* > + * Use addr2 loaded from rcu_access_pointer() to preserve > + * address dependency ordering. > + */ > + ctx.addr = addr2; > + return ctx; > + > +fail: > + ctx.slot = NULL; > + ctx.addr = NULL; > + return ctx; > +} > + > +/* Retire the hazard pointer in @ctx. */ > +static inline > +void hp_retire(const struct hp_ctx ctx) > +{ > + smp_store_release(&ctx.slot->addr, NULL); > +} > + > +#endif /* _LINUX_HP_H */ > diff --git a/kernel/Makefile b/kernel/Makefile > index 3c13240dfc9f..ec16de96fa80 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -7,7 +7,7 @@ obj-y = fork.o exec_domain.o panic.o \ > cpu.o exit.o softirq.o resource.o \ > sysctl.o capability.o ptrace.o user.o \ > signal.o sys.o umh.o workqueue.o pid.o task_work.o \ > - extable.o params.o \ > + extable.o params.o hp.o \ > kthread.o sys_ni.o nsproxy.o \ > notifier.o ksysfs.o cred.o reboot.o \ > async.o range.o smpboot.o ucount.o regset.o ksyms_common.o > diff --git a/kernel/hp.c b/kernel/hp.c > new file mode 100644 > index 000000000000..b2447bf15300 > --- /dev/null > +++ b/kernel/hp.c > @@ -0,0 +1,46 @@ > +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers > +// > +// SPDX-License-Identifier: LGPL-2.1-or-later > + > +/* > + * HP: Hazard Pointers > + */ > + > +#include > +#include > + > +/* > + * hp_scan: Scan hazard pointer domain for @addr. > + * > + * Scan hazard pointer domain for @addr. > + * If @retire_cb is non-NULL, invoke @callback for each slot containing > + * @addr. > + * Wait to observe that each slot contains a value that differs from > + * @addr before returning. > + */ > +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr, > + void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr)) > +{ > + int cpu; > + > + /* > + * Store A precedes hp_scan(): it unpublishes addr (sets it to > + * NULL or to a different value), and thus hides it from hazard > + * pointer readers. > + */ > + > + if (!addr) > + return; > + /* Memory ordering: Store A before Load B. */ > + smp_mb(); > + /* Scan all CPUs slots. */ > + for_each_possible_cpu(cpu) { > + struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu); > + > + if (retire_cb && smp_load_acquire(&slot->addr) == addr) /* Load B */ > + retire_cb(cpu, slot, addr); Is retirce_cb allowed to cmpxchg the thing? > + /* Busy-wait if node is found. */ > + while ((smp_load_acquire(&slot->addr)) == addr) /* Load B */ > + cpu_relax(); This really should be using smp_cond_load_acquire() > + } > +}