From: Joel Fernandes
Date: Fri, 4 Oct 2024 17:25:16 -0400
Subject: Re: [RFC PATCH v2 3/4] hp: Implement Hazard Pointers
To: Mathieu Desnoyers
Cc: Boqun Feng, linux-kernel@vger.kernel.org, Linus Torvalds,
 Andrew Morton, Peter Zijlstra, Nicholas Piggin, Michael Ellerman,
 Greg Kroah-Hartman, Sebastian Andrzej Siewior, "Paul E. McKenney",
 Will Deacon, Alan Stern, John Stultz, Neeraj Upadhyay,
 Frederic Weisbecker, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
 Lai Jiangshan, Zqiang, Ingo Molnar, Waiman Long, Mark Rutland,
 Thomas Gleixner, Vlastimil Babka, maged.michael@gmail.com,
 Mateusz Guzik, Jonas Oberhauser, rcu@vger.kernel.org,
 linux-mm@kvack.org, lkmm@lists.linux.dev
References: <20241004182734.1761555-1-mathieu.desnoyers@efficios.com>
 <20241004182734.1761555-4-mathieu.desnoyers@efficios.com>
In-Reply-To: <20241004182734.1761555-4-mathieu.desnoyers@efficios.com>

On Fri, Oct 4, 2024 at 2:29 PM Mathieu Desnoyers wrote:
>
> This API provides existence guarantees of objects through Hazard
> Pointers (HP). This minimalist implementation is specific to use
> with preemption disabled, but can be extended further as needed.
>
> Each HP domain defines a fixed number of hazard pointer slots (nr_cpus)
> across the entire system.
>
> Its main benefit over RCU is that it allows fast reclaim of
> HP-protected pointers without needing to wait for a grace period.
>
> It also allows the hazard pointer scan to call a user-defined callback
> to retire a hazard pointer slot immediately if needed. This callback
> may, for instance, issue an IPI to the relevant CPU.
>
> There are a few possible use-cases for this in the Linux kernel:
>
> - Improve performance of mm_count by replacing lazy active mm by HP.
> - Guarantee object existence on pointer dereference to use refcount:
>   - replace locking used for that purpose in some drivers,
>   - replace RCU + inc_not_zero pattern,
> - rtmutex: Improve situations where locks need to be taken in
>   reverse dependency chain order by guaranteeing existence of
>   first and second locks in traversal order, allowing them to be
>   locked in the correct order (which is reverse from traversal
>   order) rather than try-lock+retry on nested lock.
>
> References:
>
> [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
>      lock-free objects," in IEEE Transactions on Parallel and
>      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004

[ ... ]

> ---
> Changes since v0:
> - Remove slot variable from hp_dereference_allocate().
> ---
>  include/linux/hp.h | 158 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/Makefile    |   2 +-
>  kernel/hp.c        |  46 +++++++++++++

Just a housekeeping comment: ISTR Linus looking down on adding bodies of
C code to header files (like hp_dereference_allocate). I understand maybe
the rationale is that the functions included are inlined. But do all of
them have to be inlined? Such headers also hurt code browsing
capabilities in code browsers like clangd. clangd doesn't understand
header files because it can't independently compile them -- it uses the
compiler to generate and extract the AST for superior code
browsing/completion.

Also, have you looked at the benefits of inlining for hp.h?
hp_dereference_allocate() seems large enough that inlining may not
matter much, but I haven't compiled it and looked at the asm myself.

Will continue staring at the code.

thanks,

 - Joel

>  3 files changed, 205 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/hp.h
>  create mode 100644 kernel/hp.c
>
> diff --git a/include/linux/hp.h b/include/linux/hp.h
> new file mode 100644
> index 000000000000..e85fc4365ea2
> --- /dev/null
> +++ b/include/linux/hp.h
> @@ -0,0 +1,158 @@
> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers
> +//
> +// SPDX-License-Identifier: LGPL-2.1-or-later
> +
> +#ifndef _LINUX_HP_H
> +#define _LINUX_HP_H
> +
> +/*
> + * HP: Hazard Pointers
> + *
> + * This API provides existence guarantees of objects through hazard
> + * pointers.
> + *
> + * It uses a fixed number of hazard pointer slots (nr_cpus) across the
> + * entire system for each HP domain.
> + *
> + * Its main benefit over RCU is that it allows fast reclaim of
> + * HP-protected pointers without needing to wait for a grace period.
> + *
> + * It also allows the hazard pointer scan to call a user-defined callback
> + * to retire a hazard pointer slot immediately if needed. This callback
> + * may, for instance, issue an IPI to the relevant CPU.
> + *
> + * References:
> + *
> + * [1]: M. M. Michael, "Hazard pointers: safe memory reclamation for
> + *      lock-free objects," in IEEE Transactions on Parallel and
> + *      Distributed Systems, vol. 15, no. 6, pp. 491-504, June 2004
> + */
> +
> +#include
> +
> +/*
> + * Hazard pointer slot.
> + */
> +struct hp_slot {
> +	void *addr;
> +};
> +
> +/*
> + * Hazard pointer context, returned by hp_use().
> + */
> +struct hp_ctx {
> +	struct hp_slot *slot;
> +	void *addr;
> +};
> +
> +/*
> + * hp_scan: Scan hazard pointer domain for @addr.
> + *
> + * Scan hazard pointer domain for @addr.
> + * If @retire_cb is NULL, wait to observe that each slot contains a value
> + * that differs from @addr.
> + * If @retire_cb is non-NULL, invoke @callback for each slot containing
> + * @addr.
> + */
> +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
> +	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr));
> +
> +/* Get the hazard pointer context address (may be NULL). */
> +static inline
> +void *hp_ctx_addr(struct hp_ctx ctx)
> +{
> +	return ctx.addr;
> +}
> +
> +/*
> + * hp_allocate: Allocate a hazard pointer.
> + *
> + * Allocate a hazard pointer slot for @addr. The object existence should
> + * be guaranteed by the caller. Expects to be called from preempt
> + * disable context.
> + *
> + * Returns a hazard pointer context.
> + */
> +static inline
> +struct hp_ctx hp_allocate(struct hp_slot __percpu *percpu_slots, void *addr)
> +{
> +	struct hp_slot *slot;
> +	struct hp_ctx ctx;
> +
> +	if (!addr)
> +		goto fail;
> +	slot = this_cpu_ptr(percpu_slots);
> +	/*
> +	 * A single hazard pointer slot per CPU is available currently.
> +	 * Other hazard pointer domains can eventually have a different
> +	 * configuration.
> +	 */
> +	if (READ_ONCE(slot->addr))
> +		goto fail;
> +	WRITE_ONCE(slot->addr, addr);	/* Store B */
> +	ctx.slot = slot;
> +	ctx.addr = addr;
> +	return ctx;
> +
> +fail:
> +	ctx.slot = NULL;
> +	ctx.addr = NULL;
> +	return ctx;
> +}
> +
> +/*
> + * hp_dereference_allocate: Dereference and allocate a hazard pointer.
> + *
> + * Returns a hazard pointer context. Expects to be called from preempt
> + * disable context.
> + */
> +static inline
> +struct hp_ctx hp_dereference_allocate(struct hp_slot __percpu *percpu_slots, void * const * addr_p)
> +{
> +	void *addr, *addr2;
> +	struct hp_ctx ctx;
> +
> +	addr = READ_ONCE(*addr_p);
> +retry:
> +	ctx = hp_allocate(percpu_slots, addr);
> +	if (!hp_ctx_addr(ctx))
> +		goto fail;
> +	/* Memory ordering: Store B before Load A. */
> +	smp_mb();
> +	/*
> +	 * Use RCU dereference without lockdep checks, because
> +	 * lockdep is not aware of HP guarantees.
> +	 */
> +	addr2 = rcu_access_pointer(*addr_p);	/* Load A */
> +	/*
> +	 * If @addr_p content has changed since the first load,
> +	 * clear the hazard pointer and try again.
> +	 */
> +	if (!ptr_eq(addr2, addr)) {
> +		WRITE_ONCE(ctx.slot->addr, NULL);
> +		if (!addr2)
> +			goto fail;
> +		addr = addr2;
> +		goto retry;
> +	}
> +	/*
> +	 * Use addr2 loaded from rcu_access_pointer() to preserve
> +	 * address dependency ordering.
> +	 */
> +	ctx.addr = addr2;
> +	return ctx;
> +
> +fail:
> +	ctx.slot = NULL;
> +	ctx.addr = NULL;
> +	return ctx;
> +}
> +
> +/* Retire the hazard pointer in @ctx. */
> +static inline
> +void hp_retire(const struct hp_ctx ctx)
> +{
> +	smp_store_release(&ctx.slot->addr, NULL);
> +}
> +
> +#endif /* _LINUX_HP_H */
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 3c13240dfc9f..ec16de96fa80 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -7,7 +7,7 @@ obj-y = fork.o exec_domain.o panic.o \
>  	    cpu.o exit.o softirq.o resource.o \
>  	    sysctl.o capability.o ptrace.o user.o \
>  	    signal.o sys.o umh.o workqueue.o pid.o task_work.o \
> -	    extable.o params.o \
> +	    extable.o params.o hp.o \
>  	    kthread.o sys_ni.o nsproxy.o \
>  	    notifier.o ksysfs.o cred.o reboot.o \
>  	    async.o range.o smpboot.o ucount.o regset.o ksyms_common.o
> diff --git a/kernel/hp.c b/kernel/hp.c
> new file mode 100644
> index 000000000000..b2447bf15300
> --- /dev/null
> +++ b/kernel/hp.c
> @@ -0,0 +1,46 @@
> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers
> +//
> +// SPDX-License-Identifier: LGPL-2.1-or-later
> +
> +/*
> + * HP: Hazard Pointers
> + */
> +
> +#include
> +#include
> +
> +/*
> + * hp_scan: Scan hazard pointer domain for @addr.
> + *
> + * Scan hazard pointer domain for @addr.
> + * If @retire_cb is non-NULL, invoke @callback for each slot containing
> + * @addr.
> + * Wait to observe that each slot contains a value that differs from
> + * @addr before returning.
> + */
> +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr,
> +	     void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr))
> +{
> +	int cpu;
> +
> +	/*
> +	 * Store A precedes hp_scan(): it unpublishes addr (sets it to
> +	 * NULL or to a different value), and thus hides it from hazard
> +	 * pointer readers.
> +	 */
> +
> +	if (!addr)
> +		return;
> +	/* Memory ordering: Store A before Load B. */
> +	smp_mb();
> +	/* Scan all CPUs slots. */
> +	for_each_possible_cpu(cpu) {
> +		struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu);
> +
> +		if (retire_cb && smp_load_acquire(&slot->addr) == addr) /* Load B */
> +			retire_cb(cpu, slot, addr);
> +		/* Busy-wait if node is found. */
> +		while ((smp_load_acquire(&slot->addr)) == addr) /* Lo
> +			cpu_relax();
> +	}
> +}
> --
> 2.39.2
>
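
For anyone following the Store B / Load A and Store A / Load B pairing
in the quoted patch, a minimal userspace sketch in C11 atomics may help.
It is not the kernel code. The hp_sketch_* names, the fixed
HP_SKETCH_NR_SLOTS array standing in for the per-CPU slots, and the
non-blocking hp_sketch_count_users() are all assumptions made for
illustration; the patch's hp_scan() busy-waits on each slot instead of
returning a count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical: a small fixed array stands in for the per-CPU slots. */
#define HP_SKETCH_NR_SLOTS 4

struct hp_sketch_slot {
	_Atomic(void *) addr;
};

static struct hp_sketch_slot hp_sketch_slots[HP_SKETCH_NR_SLOTS];

/*
 * Reader side: publish the pointer in a slot (Store B), then re-load
 * the source pointer (Load A) and retry if it changed in between.
 * Returns the protected pointer, or NULL if the source pointer is NULL
 * or the slot is already in use.
 */
static void *hp_sketch_acquire(int slot_idx, _Atomic(void *) *addr_p)
{
	struct hp_sketch_slot *slot;
	void *addr, *addr2;

	assert(slot_idx >= 0 && slot_idx < HP_SKETCH_NR_SLOTS);
	slot = &hp_sketch_slots[slot_idx];
	addr = atomic_load_explicit(addr_p, memory_order_relaxed);
	for (;;) {
		if (!addr)
			return NULL;
		if (atomic_load_explicit(&slot->addr, memory_order_relaxed))
			return NULL;	/* slot busy */
		atomic_store_explicit(&slot->addr, addr,
				      memory_order_relaxed);	/* Store B */
		/* Memory ordering: Store B before Load A. */
		atomic_thread_fence(memory_order_seq_cst);
		addr2 = atomic_load_explicit(addr_p,
					     memory_order_acquire); /* Load A */
		if (addr2 == addr)
			return addr2;
		/* Source pointer changed: clear the slot and try again. */
		atomic_store_explicit(&slot->addr, NULL, memory_order_relaxed);
		addr = addr2;
	}
}

/* Reader side: drop the protection. */
static void hp_sketch_release(int slot_idx)
{
	atomic_store_explicit(&hp_sketch_slots[slot_idx].addr, NULL,
			      memory_order_release);
}

/*
 * Reclaimer side: count the slots still holding @addr. The caller must
 * have unpublished @addr first (Store A). Non-blocking variant chosen
 * for this sketch only.
 */
static int hp_sketch_count_users(void *addr)
{
	int i, n = 0;

	if (!addr)
		return 0;
	/* Memory ordering: Store A before Load B. */
	atomic_thread_fence(memory_order_seq_cst);
	for (i = 0; i < HP_SKETCH_NR_SLOTS; i++) {
		if (atomic_load_explicit(&hp_sketch_slots[i].addr,
					 memory_order_acquire) == addr) /* Load B */
			n++;
	}
	return n;
}

/* Single-threaded walk through the protocol; returns 0 on success. */
int hp_sketch_demo(void)
{
	static int object = 42;
	_Atomic(void *) published;

	atomic_init(&published, &object);
	if (hp_sketch_acquire(0, &published) != &object)
		return 1;
	atomic_store(&published, NULL);			/* Store A */
	if (hp_sketch_count_users(&object) != 1)
		return 2;	/* still protected: must not reclaim */
	hp_sketch_release(0);
	if (hp_sketch_count_users(&object) != 0)
		return 3;	/* no users left: reclaim would be safe */
	if (hp_sketch_acquire(0, &published) != NULL)
		return 4;	/* unpublished: new readers get NULL */
	return 0;
}
```

The demo runs in one thread, so it only illustrates the order of steps,
not the concurrency guarantees. The two seq_cst fences are the
userspace stand-ins for the smp_mb() calls in hp_dereference_allocate()
and hp_scan().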