From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B839CCFB43F for ; Mon, 7 Oct 2024 14:52:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 494956B00A3; Mon, 7 Oct 2024 10:52:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 444F66B00A5; Mon, 7 Oct 2024 10:52:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2E53A6B00A8; Mon, 7 Oct 2024 10:52:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 0D9716B00A3 for ; Mon, 7 Oct 2024 10:52:49 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id AA04616060D for ; Mon, 7 Oct 2024 14:52:48 +0000 (UTC) X-FDA: 82647098016.23.F76A175 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) by imf09.hostedemail.com (Postfix) with ESMTP id DB25E140012 for ; Mon, 7 Oct 2024 14:52:45 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=efficios.com header.s=smtpout1 header.b=nK9FgC2o; spf=pass (imf09.hostedemail.com: domain of mathieu.desnoyers@efficios.com designates 167.114.26.122 as permitted sender) smtp.mailfrom=mathieu.desnoyers@efficios.com; dmarc=pass (policy=none) header.from=efficios.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728312588; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NawXspKaXRul+L7M39N/Dk/m0cgybQ1z0tmnmK7Xucg=; b=l3gYPRUr14auTn513peukJa9iED2vDRFH052e23hDgQUwFXX/BEYmc9SCvXAsLAYSeJ6QW yHWWgBeVu7V65Ij0/ppTkfEl6IPmtTvC6g2bzlin2yok2A2jPzDlbfcEOn6ADLLvMIYa0a WbYIAoLt2CQ+BN1qirovqIiAEJAfeE8= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=efficios.com header.s=smtpout1 header.b=nK9FgC2o; spf=pass (imf09.hostedemail.com: domain of mathieu.desnoyers@efficios.com designates 167.114.26.122 as permitted sender) smtp.mailfrom=mathieu.desnoyers@efficios.com; dmarc=pass (policy=none) header.from=efficios.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728312588; a=rsa-sha256; cv=none; b=P04EOjupVUT7qIRmjQ3QHS1lQUGEHLDApBQio1px//dzSrCgJLxAG2gxXB3VfOFlXz6Cz0 Q4yE4Sm3VdAAZEPqLhR1yF1dC/qjX8Uo2lWixeAxcsnxpPkrGYrIDfJIWJJ/bGDleSY+sN czxnDIqjquuQX/ULCYch5UiNVAeo04c= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1728312764; bh=by3HtsyG3tgBDfTrQEa3j4QGF/KBOlt4hTjDqg6f5pM=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=nK9FgC2oUZVOf+o+gXPkdUNlgPqbSY9okuEqGMpj7boUHKmf1Xm/AJNdhKXIk55Sq iH7IfFH7TpD1m8PaDLmM/yHn9O60vk9jsiMlosJ9hd+4TkFFKmjFN+giYrZkorkdPf LIx8VTUQ8DpnGbSQdKNoayA6Cf3cSVcdVdkaJi96ABxnaAmiGGvKiOTdCnmg0A/ucg aXpVxtoEnMbS8hjHSAETwSzsUrmPuwCRxmUFRcVsAAjl0T1/GGsr3YaTXAVzCHto+p ufZ+GSXblmS6uMG7RecAU/LWkNPQELuiLzV4HwakzQjGzOY3ulKLfx+wr4FU1DA7cJ Kue9uMLcRc2Yg== Received: from [172.16.0.134] (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XMhvv71C7z3DS; Mon, 7 Oct 2024 10:52:43 -0400 (EDT) Message-ID: <1de5d6ac-7f44-48d6-a5f3-9839d436891a@efficios.com> Date: Mon, 7 Oct 2024 10:50:46 -0400 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v2 3/4] hp: Implement Hazard Pointers To: Peter Zijlstra Cc: Boqun Feng , linux-kernel@vger.kernel.org, Linus Torvalds , Andrew Morton , Nicholas Piggin , Michael Ellerman , Greg Kroah-Hartman , Sebastian Andrzej Siewior , "Paul E. McKenney" , Will Deacon , Alan Stern , John Stultz , Neeraj Upadhyay , Frederic Weisbecker , Joel Fernandes , Josh Triplett , Uladzislau Rezki , Steven Rostedt , Lai Jiangshan , Zqiang , Ingo Molnar , Waiman Long , Mark Rutland , Thomas Gleixner , Vlastimil Babka , maged.michael@gmail.com, Mateusz Guzik , Jonas Oberhauser , rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev References: <20241004182734.1761555-1-mathieu.desnoyers@efficios.com> <20241004182734.1761555-4-mathieu.desnoyers@efficios.com> <20241005160444.GA18071@noisy.programming.kicks-ass.net> <20241007104030.GB4879@noisy.programming.kicks-ass.net> From: Mathieu Desnoyers Content-Language: en-US In-Reply-To: <20241007104030.GB4879@noisy.programming.kicks-ass.net> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: DB25E140012 X-Stat-Signature: xh4e93r95pg7rk3sj6yshzpzt8ehzanm X-Rspam-User: X-HE-Tag: 1728312765-752887 X-HE-Meta: U2FsdGVkX1+WKWIIkKnd8Owh78jHd9kj4CrkBlMQnBAPdIhIxAtIadP1IRm64THaSYFTCR2z35vHRfZ61vyqeZMDe7rs3FZUae91TNsCnGJpFH3dlUpii+LQNZh+Xj7bCPHFr0j/z1NfTVlS2FMBa/G42p0VnrMbj0s5VLh112RylM5Nk9ttzgOL9dxmIuK2fGduIbAJcwZO4+fTpDbEY08bG6EIPIVSY/L3BrNZrrkl9DQbK/QWH3YMwMPhSmeCxksTrfdO+oTBde9PU/5FerK/2lnKU50eyX+y8WJQApqcjsGozbWr1nFerT5pcP9YHAYQBD6RGjhBJLLvXJQNnVJ8phjTivuuNR1BOWZCM/iEaLEEzPZuls3d4KBqPO6HEFx31RGYMghCwbp0lzewEODBSiWsvSHqckwa+7M9Xo+GZI9r2cAx8paCkd0ooTwhKmh9CAGnRQeSFlu6NJ7okJuWmZWxNllPg7gMC9xJGtlVRu9ql8Pkq0JQIpI1tpFrOnmBQ5u9aQgXEFWeBhIqjLdEvrYUNp6P8Vi22seJbO7l3gF7RpZmXcPtOHykKoU2MeEzcT5bVYwnf3jLbC/h0tVUIDegbfCI5LXbzqfTWPDSZM/pIoLAXk7g7cLL4ZkieLEqw3QSZ3WhPqbVfUcTgqDTQY22bpash+wy3o770oEBDk+gFCSBhHQfhPwMX9z0agYOygAlJA40e3F+ny0tB9LuNIkIE+b6fhvq18awWO3FQValM1aB6LA40b5c4d9PmFDvn50P2OiJmQHVJGzBJ7v/Jme2GmyJEEFIqvRGx3pAMtXxkSeQXSX6W+FFkV/DJT++9ydn+Hzv+QXUfWaVAWPlDReKAtr0YOqSMisoFPBKp3olu/Q+7xKN/SrXDSV+uD38/eyW0yS22EUEWD7E4tIArbIXBI8z+bszbG0LGNFAzQNBx5lWxoNxoBl0Fl4z05ZiSDehtDkoLDr/ies T5DdXJHb +ghHJ2sqzbQ60S5RVNRFe7rKFmfxMoGWANL3X2tkcE++4mlKRVS0LINsuA0fuq6C2MMoCK92MR1yiik2qpjHYq5VJ9zziQlCe0jz+sf+fMrtFU4fH3JG30Jy9pVOyt6RaRZvjwJKlYq1EncmbpivT0c6z9jUqTp4Bg9jbefa/xAF8UzdPZqM3lnTqdig4qoNGYWTimwvaAljQ+PzFGKierIyc4bl9CEIZ85+COzyijEQx0tLBEM1O9ksDXew8QM0OgqBSc3NfFZ5DJSfNhumoPlzjuHWGJo5FyFhojEEstVq2hODlwsUyX+uf3LeE478U0YNx77VLU5VSsd+BXzD3CvjPQihEJDYkasm9fI2P+NeLjkVPQdVuaRxd6bqNT47vqyMj4finE5IV/4S/rFW/9tnY3ngqPs/MeFBQc/p3UYg0YrE= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024-10-07 12:40, Peter Zijlstra wrote: > On Sat, Oct 05, 2024 at 02:50:17PM -0400, Mathieu Desnoyers wrote: >> On 2024-10-05 18:04, Peter Zijlstra wrote: > > >>>> +/* >>>> + * hp_allocate: Allocate a hazard pointer. >>>> + * >>>> + * Allocate a hazard pointer slot for @addr. The object existence should >>>> + * be guaranteed by the caller. Expects to be called from preempt >>>> + * disable context. >>>> + * >>>> + * Returns a hazard pointer context. >>> >>> So you made the WTF'o'meter crack, this here function does not allocate >>> nothing. Naming is bad. At best this is something like >>> try-set-hazard-pointer or somesuch. >> >> I went with the naming from the 2004 paper from Maged Michael, but I >> agree it could be clearer. >> >> I'm tempted to go for "hp_try_post()" and "hp_remove()", basically >> "posting" the intent to use a pointer (as in on a metaphorical billboard), >> and removing it when it's done. > > For RCU we've taken to using the word: 'publish', no? I'm so glad you suggest this, because it turns out that from all the possible words you could choose from, 'publish' is probably the most actively confusing. I'll explain. Let me first do a 10'000 feet comparison of RCU vs Hazard Pointers through a simple example: [ Note: I've renamed the HP dereference try_post to HP load try_post based on further discussion below. ] *** RCU *** * Dereference RCU-protected pointer: rcu_read_lock(); // [ Begin read transaction ] l_p = rcu_dereference(p); // [ Load p: @addr or NULL ] if (l_p) [ use *l_p ...] rcu_read_unlock(); // [ End read transaction ] * Publish @addr: addr = kmalloc(); init(addr); rcu_assign_pointer(p, addr); * Reclaim @addr: rcu_assign_pointer(p, NULL); // [ Unpublish @addr ] synchronize_rcu(); // Wait for all pre-existing // read transactions to complete. kfree(addr); *** Hazard Pointers *** * Load and post a HP-protected pointer: l_p = hp_load_try_post(domain, &p, &slot); if (l_p) { [ use *l_p ...] hp_remove(&slot, l_p); } * Publish @addr: addr = kmalloc(); init(addr); rcu_assign_pointer(p, addr); * Reclaim @addr: rcu_assign_pointer(p, NULL); // [ Unpublish @addr ] hp_scan(domain, addr, NULL); kfree(addr); Both HP and RCU have publication guarantees, which can in fact be implemented in the same way (e.g. rcu_assign_pointer paired with something that respects address dependencies ordering). A stronger implementation of this would be pairing a store-release with a load-acquire: it works, but it would add needless overhead on weakly-ordered CPUs. How the two mechanisms differ is in how they track when it is safe to reclaim @addr. RCU tracks reader "transactions" begin/end, and makes sure that all pre-existing transactions are gone before synchronize_rcu() is allowed to complete. HP does this by tracking "posted" pointer slots with a HP domain. As long as hp_scan observes that HP readers are showing interest in @addr, it will wait. One notable difference between RCU and HP is that HP knows exactly which pointer is blocking progress, and from which CPU (at least with my per-CPU HP domain implementation). Therefore, it is possible for HP to issue an IPI and make sure the HP user either completes its use of the pointer quickly, or stops using it right away (e.g. making the active mm use idle mm instead). One strength of RCU is that it can track use of a whole set of RCU pointers just by tracking reader transaction begin/end, but this is also one of its weaknesses: a long reader transaction can postpone completion of grace period for a long time and increase the memory footprint. In comparison, HP can immediately complete as soon as the pointer it is scanning for is gone. Even better, it can send an IPI to the belate CPU and abort use of the pointer using a callback. > > >>>> +/* >>>> + * hp_dereference_allocate: Dereference and allocate a hazard pointer. >>>> + * >>>> + * Returns a hazard pointer context. Expects to be called from preempt >>>> + * disable context. >>>> + */ >>> >>> More terrible naming. Same as above, but additionally, I would expect a >>> 'dereference' to actually dereference the pointer and have a return >>> value of the dereferenced type. >> >> hp_dereference_try_post() ? >> >>> >>> This function seems to double check and update the hp_ctx thing. I'm not >>> at all sure yet wtf this is doing -- and the total lack of comments >>> aren't helping. >> >> The hp_ctx contains the outputs. >> >> The function loads *addr_p to then try_post it into a HP slot. On success, >> it re-reads the *addr_p (with address dependency) and if it still matches, >> use that as output address pointer. >> >> I'm planning to remove hp_ctx, and just have: >> >> /* >> * hp_try_post: Try to post a hazard pointer. >> * >> * Post a hazard pointer slot for @addr. The object existence should >> * be guaranteed by the caller. Expects to be called from preempt >> * disable context. >> * >> * Returns true if post succeeds, false otherwise. >> */ >> static inline >> bool hp_try_post(struct hp_domain *hp_domain, void *addr, struct hp_slot **_slot) >> [...] >> >> /* >> * hp_dereference_try_post: Dereference and try to post a hazard pointer. >> * >> * Returns a hazard pointer context. Expects to be called from preempt >> * disable context. >> */ >> static inline >> void *__hp_dereference_try_post(struct hp_domain *hp_domain, >> void * const * addr_p, struct hp_slot **_slot) >> [...] >> >> #define hp_dereference_try_post(domain, p, slot_p) \ >> ((__typeof__(*(p))) __hp_dereference_try_post(domain, (void * const *) p, slot_p)) > > This will compile, but do the wrong thing when p is a regular pointer, no? Right, at least in some cases the compiler may not complain, and people used to rcu_dereference() will expect that "p" is the pointer to load rather than the address of that pointer. This would be unexpected. I must admit that passing the address holding the pointer to load rather than the pointer to load itself makes it much less troublesome in terms of macro layers. But perhaps this is another example where we should wander away from the beaten path and use a word different from "dereference" here. E.g.: /* * Use a comma expression within typeof: __typeof__((void)**(addr_p), *(addr_p)) * to generate a compile error if addr_p is not a pointer to a pointer. */ #define hp_load_try_post(domain, addr_p, slot_p) \ ((__typeof__((void)**(addr_p), *(addr_p))) __hp_load_try_post(domain, (void * const *) (addr_p), slot_p)) > >> >> /* Clear the hazard pointer in @slot. */ >> static inline >> void hp_remove(struct hp_slot *slot) >> [...] > > Differently weird, but better I suppose :-) If you find a better word than "remove" to pair with "post", I'm all in :) > > >>>> +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr, >>>> + void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr)) >>>> +{ >>>> + int cpu; >>>> + >>>> + /* >>>> + * Store A precedes hp_scan(): it unpublishes addr (sets it to >>>> + * NULL or to a different value), and thus hides it from hazard >>>> + * pointer readers. >>>> + */ >>>> + >>>> + if (!addr) >>>> + return; >>>> + /* Memory ordering: Store A before Load B. */ >>>> + smp_mb(); >>>> + /* Scan all CPUs slots. */ >>>> + for_each_possible_cpu(cpu) { >>>> + struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu); >>>> + >>>> + if (retire_cb && smp_load_acquire(&slot->addr) == addr) /* Load B */ >>>> + retire_cb(cpu, slot, addr); >>> >>> Is retirce_cb allowed to cmpxchg the thing? >> >> It could, but we'd need to make sure the slot is not re-used by another >> hp_try_post() before the current user removes its own post. It would >> need to synchronize with the current HP user (e.g. with IPI). >> >> I've actually renamed retire_cb to "on_match_cb". > > Hmm, I think I see. Would it make sense to pass the expected addr to > hp_remove() and double check we don't NULL out something unexpected? -- > maybe just for a DEBUG option. > > I'm always seeing the NOHZ_FULL guys hating on this :-) That's a fair point. Sure, we can do this as an extra safety net. For now I will just make the check always present, we can always move it to a debug option later. And now I notice that hp_remove is also used for CPU hotplug (grep matches for cpuhp_remove_state()). I wonder if we should go for something more grep-friendly than "hp_", e.g. "hazptr_" and rename hp.h to hazptr.h ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. https://www.efficios.com