From: Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>
Date: Wed, 25 Sep 2024 07:57:43 +0200
Subject: Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter
To: Mathieu Desnoyers, Boqun Feng, "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, Will Deacon, Peter Zijlstra, Alan Stern,
 John Stultz, Linus Torvalds, Frederic Weisbecker, Joel Fernandes,
 Josh Triplett, Uladzislau Rezki, Steven Rostedt, Lai Jiangshan, Zqiang,
 Ingo Molnar, Waiman Long, Mark Rutland, Thomas Gleixner, Vlastimil Babka,
 maged.michael@gmail.com, Mateusz Guzik, rcu@vger.kernel.org,
 linux-mm@kvack.org, lkmm@lists.linux.dev
In-Reply-To: <20240921164210.256278-1-mathieu.desnoyers@efficios.com>
References: <20240921164210.256278-1-mathieu.desnoyers@efficios.com>

Hi Mathieu,

interesting idea. Conceptually it looks good.

There's another approach that uses hazard pointers to optimize shared
reference counting (making it lock-free in the common case):

  https://github.com/cmuparlay/concurrent_deferred_rc

It doesn't go as far as what you're doing, but they likewise use the
hazard pointer to protect the reference count (as you do in the promote
function).

I haven't read your code in detail, but it seems to me you have an ABA
bug: as I explained elsewhere, you can read the same pointer value after
an ABA, yet you never synchronize with the newer store that produced the
value seen by node2, leaving you to speculatively read stale values
through *ctx->hp. (I am assuming here that ctx->hp is essentially an out
parameter used to let the caller know which node got protected.)
Have fun,
  jonas

> +/*
> + * hpref_hp_get: Obtain a reference to a stable object, protected either
> + *               by hazard pointer (fast-path) or using reference
> + *               counter as fall-back.
> + */
> +static inline
> +bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
> +{
> +	int cpu = rseq_current_cpu_raw();
> +	struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
> +	struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
> +	bool use_refcount = false;
> +	struct hpref_node *node, *node2;
> +	unsigned int next_slot;
> +
> +retry:
> +	node = uatomic_load(node_p, CMM_RELAXED);
> +	if (!node)
> +		return false;
> +	/* Use rseq to try setting current slot hp. Store B. */
> +	if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
> +				(intptr_t *) &slot->node, (intptr_t) NULL,
> +				(intptr_t) node, cpu)) {
> +		slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
> +		use_refcount = true;
> +		/*
> +		 * This may busy-wait for another reader using the
> +		 * emergency slot to transition to refcount.
> +		 */
> +		caa_cpu_relax();
> +		goto retry;
> +	}
> +	/* Memory ordering: Store B before Load A. */
> +	cmm_smp_mb();
> +	node2 = uatomic_load(node_p, CMM_RELAXED);	/* Load A */
> +	if (node != node2) {
> +		uatomic_store(&slot->node, NULL, CMM_RELAXED);
> +		if (!node2)
> +			return false;
> +		goto retry;
> +	}
> +	ctx->type = HPREF_TYPE_HP;
> +	ctx->hp = node;

	        ^---- here

> +	ctx->slot = slot;
> +	if (use_refcount) {
> +		hpref_promote_hp_to_ref(ctx);
> +		return true;
> +	}
> +	/*
> +	 * Increment current slot (racy increment is OK because it is
> +	 * just a position hint). Skip the emergency slot.
> +	 */
> +	next_slot = uatomic_load(&cpu_slots->current_slot, CMM_RELAXED) + 1;
> +	if (next_slot >= HPREF_EMERGENCY_SLOT)
> +		next_slot = 0;
> +	uatomic_store(&cpu_slots->current_slot, next_slot, CMM_RELAXED);
> +	return true;
> +}
> +
> +static inline
> +void hpref_put(struct hpref_ctx *ctx)
> +{
> +	if (ctx->type == HPREF_TYPE_REF) {
> +		urcu_ref_put(&ctx->hp->refcount, hpref_release);
> +	} else {
> +		/* Release HP. */
> +		uatomic_store(&ctx->slot->node, NULL, CMM_RELEASE);
> +	}
> +	ctx->hp = NULL;
> +}
> +
> +/*
> + * hpref_set_pointer: Store pointer @node to @ptr, with RCU publication
> + *                    guarantees.
> + */
> +static inline
> +void hpref_set_pointer(struct hpref_node **ptr, struct hpref_node *node)
> +{
> +	if (__builtin_constant_p(node) && node == NULL)
> +		uatomic_store(ptr, NULL, CMM_RELAXED);
> +	else
> +		uatomic_store(ptr, node, CMM_RELEASE);
> +}
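(Unrelated aside while I am here: the NULL special case above makes
sense to me, since storing NULL publishes no initialized payload, so
relaxed suffices.)

To check my understanding of the intended usage, here is a small
reader/updater sketch; struct my_obj, its payload field and the global
gp are made up for illustration, and I have elided the refcount/release
setup of the embedded node, so please correct me if I misread the API:

	struct my_obj {
		struct hpref_node node;	/* embedded hpref node */
		int payload;
	};

	static struct hpref_node *gp;	/* published pointer */

	static void publish(struct my_obj *obj)
	{
		/* refcount/release setup of obj->node elided */
		obj->payload = 42;
		hpref_set_pointer(&gp, &obj->node);	/* release store */
	}

	static int reader(void)
	{
		struct hpref_ctx ctx;
		int v = -1;

		if (hpref_hp_get(&gp, &ctx)) {
			struct my_obj *obj = caa_container_of(
				hpref_ctx_pointer(&ctx), struct my_obj, node);
			v = obj->payload;
			hpref_put(&ctx);	/* drop HP or ref */
		}
		return v;
	}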
> +
> +/*
> + * Return the content of the hpref context hazard pointer field.
> + */
> +static inline
> +struct hpref_node *hpref_ctx_pointer(struct hpref_ctx *ctx)
> +{
> +	return ctx->hp;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _URCU_HPREF_H */
> diff --git a/src/Makefile.am b/src/Makefile.am
> index b555c818..7312c9f7 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -19,7 +19,8 @@ RCULFHASH = rculfhash.c rculfhash-mm-order.c rculfhash-mm-chunk.c \
>  lib_LTLIBRARIES = liburcu-common.la \
>  	liburcu.la liburcu-qsbr.la \
>  	liburcu-mb.la liburcu-bp.la \
> -	liburcu-memb.la liburcu-cds.la
> +	liburcu-memb.la liburcu-cds.la \
> +	liburcu-hpref.la
> 
>  #
>  # liburcu-common contains wait-free queues (needed by call_rcu) as well
> @@ -50,6 +51,9 @@ liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c lfstack.c \
>  	workqueue.c workqueue.h $(RCULFHASH) $(COMPAT)
>  liburcu_cds_la_LIBADD = liburcu-common.la
> 
> +liburcu_hpref_la_SOURCES = hpref.c
> +liburcu_hpref_la_LIBADD = -lrseq
> +
>  pkgconfigdir = $(libdir)/pkgconfig
>  pkgconfig_DATA = liburcu-cds.pc liburcu.pc liburcu-bp.pc liburcu-qsbr.pc \
>  	liburcu-mb.pc liburcu-memb.pc
> diff --git a/src/hpref.c b/src/hpref.c
> new file mode 100644
> index 00000000..f63530f5
> --- /dev/null
> +++ b/src/hpref.c
> @@ -0,0 +1,78 @@
> +// SPDX-FileCopyrightText: 2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> +//
> +// SPDX-License-Identifier: LGPL-2.1-or-later
> +
> +/*
> + * HPREF: Hazard pointers with reference counters
> + */
> +
> +#define _LGPL_SOURCE
> +#include <urcu/hpref.h>
> +#include <rseq/mempool.h>	/* Per-CPU memory */
> +
> +static struct rseq_mempool *mempool;
> +struct hpref_percpu_slots *hpref_percpu_slots;
> +
> +void hpref_release(struct urcu_ref *ref)
> +{
> +	struct hpref_node *node = caa_container_of(ref, struct hpref_node, refcount);
> +
> +	node->release(node);
> +}
> +
> +/*
> + * hpref_synchronize: Wait for any reader possessing a hazard pointer to
> + *                    @node to clear its hazard pointer slot.
> + */
> +void hpref_synchronize(struct hpref_node *node)
> +{
> +	int nr_cpus = rseq_get_max_nr_cpus(), cpu;
> +
> +	/* Memory ordering: Store A before Load B. */
> +	cmm_smp_mb();
> +	/* Scan all CPUs slots. */
> +	for (cpu = 0; cpu < nr_cpus; cpu++) {
> +		struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
> +		struct hpref_slot *slot;
> +		unsigned int i;
> +
> +		for (i = 0; i < HPREF_NR_PERCPU_SLOTS; i++) {
> +			slot = &cpu_slots->slots[i];
> +			/* Busy-wait if node is found. */
> +			while (uatomic_load(&slot->node, CMM_ACQUIRE) == node)	/* Load B */
> +				caa_cpu_relax();
> +		}
> +	}
> +}
> +
> +/*
> + * hpref_synchronize_put: Wait for any reader possessing a hazard
> + *                        pointer to clear its slot and put reference
> + *                        count.
> + */
> +void hpref_synchronize_put(struct hpref_node *node)
> +{
> +	if (!node)
> +		return;
> +	hpref_synchronize(node);
> +	urcu_ref_put(&node->refcount, hpref_release);
> +}
> +
> +static __attribute__((constructor))
> +void hpref_init(void)
> +{
> +	mempool = rseq_mempool_create("hpref", sizeof(struct hpref_percpu_slots), NULL);
> +	if (!mempool)
> +		abort();
> +	hpref_percpu_slots = rseq_mempool_percpu_zmalloc(mempool);
> +	if (!hpref_percpu_slots)
> +		abort();
> +}
> +
> +static __attribute__((destructor))
> +void hpref_exit(void)
> +{
> +	rseq_mempool_percpu_free(hpref_percpu_slots);
> +	if (rseq_mempool_destroy(mempool))
> +		abort();
> +}
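One more thought on the ABA scenario above: the smallest fix I can see
would be to give Load A acquire semantics, so that a successful recheck
also synchronizes with whichever store (re)published the node, i.e.
something like (untested sketch):

-	node2 = uatomic_load(node_p, CMM_RELAXED);	/* Load A */
+	node2 = uatomic_load(node_p, CMM_ACQUIRE);	/* Load A */

An acquire fence on the success path should work as well, if the
unconditional acquire load turns out to be measurably more expensive on
some architecture.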