From: Jann Horn
Date: Thu, 19 Sep 2024 02:12:48 +0200
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers
To: Boqun Feng, "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org,
 lkmm@vger.kernel.org, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
 Josh Triplett, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
 Lai Jiangshan, Zqiang, Peter Zijlstra, Ingo Molnar, Will Deacon,
 Waiman Long, Mark Rutland, Thomas Gleixner, Kent Overstreet,
 Linus Torvalds, Vlastimil Babka, maged.michael@gmail.com, Neeraj Upadhyay
In-Reply-To: <20240917143402.930114-2-boqun.feng@gmail.com>
References: <20240917143402.930114-1-boqun.feng@gmail.com>
 <20240917143402.930114-2-boqun.feng@gmail.com>

On Tue, Sep 17, 2024 at 4:33 PM Boqun Feng wrote:
> Hazard pointers [1] provide a way to dynamically distribute refcounting
> and can be used to improve the scalability of refcounting without
> significant space cost.

> +static inline void *__hazptr_tryprotect(hazptr_t *hzp,
> +                                        void *const *p,
> +                                        unsigned long head_offset)
> +{
> +        void *ptr;
> +        struct callback_head *head;
> +
> +        ptr = READ_ONCE(*p);
> +
> +        if (ptr == NULL)
> +                return NULL;
> +
> +        head = (struct callback_head *)(ptr + head_offset);
> +
> +        WRITE_ONCE(*hzp, head);
> +        smp_mb();
> +
> +        ptr = READ_ONCE(*p);                     // read again
> +
> +        if (ptr + head_offset != head) {         // pointer changed
> +                WRITE_ONCE(*hzp, NULL);          // reset hazard pointer
> +                return NULL;
> +        } else
> +                return ptr;
> +}

I got nerdsniped by the Plumbers talk. So, about that smp_mb()...

I think you should be able to avoid the smp_mb() using relaxed atomics
(on architectures that have those), at the cost of something like a
cmpxchg-acquire sandwiched between a load-acquire and a relaxed load?
I'm not sure how their cost compares to an smp_mb(), though.

Something like this, *assuming there can only be one context at a time
waiting for a given hazptr_t*:

typedef struct {
        /* consists of: marker bit in least significant bit, rest is normal pointer */
        atomic_long_t value;
} hazptr_t;

/*
 * Assumes that hzp is currently set to NULL (but it may contain a
 * marker bit).
 */
static inline void *__hazptr_tryprotect(hazptr_t *hzp, void *const *p)
{
        /* note that the loads of these three operations are ordered using acquire semantics */
        void *ptr = smp_load_acquire(p);
        /* set the pointer while leaving the marker bit intact */
        unsigned long hazard_scanning =
                atomic_long_fetch_or_acquire((unsigned long)ptr, &hzp->value);

        if (unlikely(hazard_scanning)) {
                BUG_ON(hazard_scanning != 1);
                /* slowpath, concurrent hazard pointer waiter */
                smp_mb();
        }
        if (READ_ONCE(*p) != ptr) { /* recheck */
                /* reset the pointer bits, leave the marker bit to its owner */
                atomic_long_and(1UL, &hzp->value);
                return NULL;
        }
        return ptr;
}

/*
 * Simplified for illustration; assumes there's only a single hazard
 * pointer @hzp that could point to @ptr.
 */
static void remove_pointer_and_wait_for_hazard(hazptr_t *hzp, void *ptr,
                                               void *const *p)
{
        WRITE_ONCE(*p, NULL);
        smp_mb();

        /* set marker bit */
        atomic_long_or(1UL, &hzp->value);
        while ((void *)(atomic_long_read(&hzp->value) & ~1UL) == ptr)
                wait();
        /* clear marker bit */
        atomic_long_and(~1UL, &hzp->value);
}

The idea would be that the possible orderings when these two functions
race against each other are:

Ordering A: The atomic_long_fetch_or_acquire() in __hazptr_tryprotect()
happens after the atomic_long_or(), two subcases:

Ordering A1 (slowpath): The atomic_long_fetch_or_acquire() is ordered
before the atomic_long_and() in remove_pointer_and_wait_for_hazard(),
so the marker bit is observed set and "hazard_scanning" is true. We go
on the safe slowpath, which works like the original patch, so it's safe.
Ordering A2 (recheck fails): The atomic_long_fetch_or_acquire() is
ordered after the atomic_long_and() in
remove_pointer_and_wait_for_hazard(), so the subsequent READ_ONCE(*p)
is also ordered after the atomic_long_and(), which is ordered after the
WRITE_ONCE(*p, NULL), so the READ_ONCE(*p) recheck must see a NULL
pointer and fail.

Ordering B (hazard pointer visible): The atomic_long_fetch_or_acquire()
in __hazptr_tryprotect() happens before the atomic_long_or(). In that
case, it also happens before the atomic_long_read() in
remove_pointer_and_wait_for_hazard(), so the hazard pointer will be
visible to remove_pointer_and_wait_for_hazard().

But this seems pretty gnarly/complicated, so even if my 2AM reasoning
ability is correct, actually implementing this might not be a good
idea... and it definitely wouldn't help on x86 at all, since x86
doesn't have such nice relaxed RMW ops.
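
In case anyone wants to poke at the protocol outside the kernel, here
is a rough, hypothetical userspace model of the marker-bit scheme built
on C11 <stdatomic.h>. Everything in it is made up for illustration: the
names (hazptr_slot, try_protect, release_protection, retire_and_wait),
the seq_cst fences standing in for smp_mb(), and the spin loop standing
in for a proper wait(). It only sketches the shape of the idea, not the
kernel implementation or its exact orderings.

/*
 * Hypothetical userspace model of the marker-bit hazard pointer idea.
 * Protected objects must be at least 2-byte aligned so that bit 0 of
 * the slot can serve as the "retirer is waiting" marker.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
        /* bit 0: marker ("retirer waiting"), remaining bits: pointer */
        atomic_uintptr_t value;
} hazptr_slot;

static void *try_protect(hazptr_slot *hzp, _Atomic(void *) *p)
{
        void *ptr = atomic_load_explicit(p, memory_order_acquire);
        uintptr_t old;

        if (!ptr)
                return NULL;

        /* publish the pointer, leaving a possibly-set marker bit intact */
        old = atomic_fetch_or_explicit(&hzp->value, (uintptr_t)ptr,
                                       memory_order_acquire);
        if (old & 1) {
                /* slowpath: a retirer is scanning, fall back to a full fence */
                atomic_thread_fence(memory_order_seq_cst);
        }
        if (atomic_load_explicit(p, memory_order_relaxed) != ptr) {
                /* recheck failed: drop the pointer, keep the marker bit */
                atomic_fetch_and_explicit(&hzp->value, (uintptr_t)1,
                                          memory_order_release);
                return NULL;
        }
        return ptr;
}

static void release_protection(hazptr_slot *hzp)
{
        /* reader is done: drop the pointer, keep a possibly-set marker bit */
        atomic_fetch_and_explicit(&hzp->value, (uintptr_t)1,
                                  memory_order_release);
}

static void retire_and_wait(hazptr_slot *hzp, void *ptr, _Atomic(void *) *p)
{
        atomic_store_explicit(p, NULL, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);

        /* set the marker bit so that late readers take their slowpath */
        atomic_fetch_or_explicit(&hzp->value, (uintptr_t)1,
                                 memory_order_relaxed);
        for (;;) {
                uintptr_t v = atomic_load_explicit(&hzp->value,
                                                   memory_order_acquire);
                if ((void *)(v & ~(uintptr_t)1) != ptr)
                        break;
                /* spin; a real implementation would block here */
        }
        /* clear the marker bit again */
        atomic_fetch_and_explicit(&hzp->value, ~(uintptr_t)1,
                                  memory_order_relaxed);
}

int main(void)
{
        static int obj = 42;              /* 4-byte aligned, bit 0 is free */
        _Atomic(void *) shared = &obj;
        hazptr_slot slot = { 0 };

        void *r = try_protect(&slot, &shared);
        assert(r == &obj);
        printf("protected value: %d\n", *(int *)r);
        release_protection(&slot);

        /* the slot no longer holds &obj, so this returns immediately */
        retire_and_wait(&slot, &obj, &shared);
        return 0;
}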