From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9D3FCFB453 for ; Mon, 7 Oct 2024 19:06:53 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7E5876B0085; Mon, 7 Oct 2024 15:06:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 794FD6B0088; Mon, 7 Oct 2024 15:06:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 636196B0089; Mon, 7 Oct 2024 15:06:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 427F16B0085 for ; Mon, 7 Oct 2024 15:06:53 -0400 (EDT) Received: from smtpin26.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id BB888160E6B for ; Mon, 7 Oct 2024 19:06:52 +0000 (UTC) X-FDA: 82647738264.26.916F36C Received: from nyc.source.kernel.org (nyc.source.kernel.org [147.75.193.91]) by imf22.hostedemail.com (Postfix) with ESMTP id 0701DC000B for ; Mon, 7 Oct 2024 19:06:50 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=izGuzbKl; spf=pass (imf22.hostedemail.com: domain of "SRS0=eCVx=RD=paulmck-ThinkPad-P17-Gen-1.home=paulmck@kernel.org" designates 147.75.193.91 as permitted sender) smtp.mailfrom="SRS0=eCVx=RD=paulmck-ThinkPad-P17-Gen-1.home=paulmck@kernel.org"; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1728327942; a=rsa-sha256; cv=none; b=YGue/aqGD8l0de1Lupyvyq/Gq3kubLvWdMwlVP/gc5x/FjISYQCFS9zrUFK3v9fphmwCIz P4NWZ1L9g2avwdfeiOmwkhH21SerXqk7BhxlrckMPrrG5fqlx+DDIPxUW2Ttcd03DEIKo/ VVXvUWlUx7gJmvm89XzRmRUm4UcAOqo= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=izGuzbKl; spf=pass (imf22.hostedemail.com: domain of "SRS0=eCVx=RD=paulmck-ThinkPad-P17-Gen-1.home=paulmck@kernel.org" designates 147.75.193.91 as permitted sender) smtp.mailfrom="SRS0=eCVx=RD=paulmck-ThinkPad-P17-Gen-1.home=paulmck@kernel.org"; dmarc=pass (policy=quarantine) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1728327942; h=from:from:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Bfw+fzt94QKEyEQeXUiRVYQCeiGjoJJvJ0LwjHt3rvI=; b=mq5pexJyVCUIUq+l9aUJqSwMfPakUX3LIpEzmYLulnIniphuoWjE5KsFs0vW+hXD2c7Mre LgEjys/SDbKhwQPXZnwxUXvw4vsK9z56Gsd3NgvWHmdCtBg+Q+xd6KNnL7haRCiFXeHBLJ aTdfaLWcJmqLm1Us+SJLP5mLGRlBFiA= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by nyc.source.kernel.org (Postfix) with ESMTP id 1C7E1A4213B; Mon, 7 Oct 2024 19:06:41 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4CC0DC4CEC6; Mon, 7 Oct 2024 19:06:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1728328009; bh=7WEw/7kbFYdfUNdJz3Sie5fTiFK1weYIIf46W8ELYuw=; h=Date:From:To:Cc:Subject:Reply-To:References:In-Reply-To:From; b=izGuzbKl/6GmG/X90OcT0sFMS73U5BXDTradCzYRt+igyh1qk1xLJiW7iNF5JoODx ZYLw3Dk4ItXjrcdtIfeODMuAVm1ULsDzdVePRI2Yhkd3jHpaPkM6kD/4DdELK6J1f/ dz/z46SoH69jzAAhcrsG/khwY/1Vu70ZO3ivgazMZCIeUEVv9MAEirL1/SrOAQuY6y RGhmGsocFkeGai36BD7ZHd9qt1o5ChiUpnxw9A0HOuegbwjnjoLj2GFqYBcaMivsZo j33Nb1trYoqmbYCrrANMNl2yCtVyyBh/nKbw14nCGydHcu3UHz2XullpWFV5nJDmRR FeyDgWQ+xO1FA== Received: by paulmck-ThinkPad-P17-Gen-1.home (Postfix, from userid 1000) id E7A95CE1594; Mon, 7 Oct 2024 12:06:48 -0700 (PDT) Date: Mon, 7 Oct 2024 12:06:48 -0700 From: "Paul E. McKenney" To: Mathieu Desnoyers Cc: Peter Zijlstra , Boqun Feng , linux-kernel@vger.kernel.org, Linus Torvalds , Andrew Morton , Nicholas Piggin , Michael Ellerman , Greg Kroah-Hartman , Sebastian Andrzej Siewior , Will Deacon , Alan Stern , John Stultz , Neeraj Upadhyay , Frederic Weisbecker , Joel Fernandes , Josh Triplett , Uladzislau Rezki , Steven Rostedt , Lai Jiangshan , Zqiang , Ingo Molnar , Waiman Long , Mark Rutland , Thomas Gleixner , Vlastimil Babka , maged.michael@gmail.com, Mateusz Guzik , Jonas Oberhauser , rcu@vger.kernel.org, linux-mm@kvack.org, lkmm@lists.linux.dev Subject: Re: [RFC PATCH v2 3/4] hp: Implement Hazard Pointers Message-ID: <139df2b9-0871-4919-bb96-fc5cadddb752@paulmck-laptop> Reply-To: paulmck@kernel.org References: <20241004182734.1761555-1-mathieu.desnoyers@efficios.com> <20241004182734.1761555-4-mathieu.desnoyers@efficios.com> <20241005160444.GA18071@noisy.programming.kicks-ass.net> <20241007104030.GB4879@noisy.programming.kicks-ass.net> <1de5d6ac-7f44-48d6-a5f3-9839d436891a@efficios.com> <5d695bdd-afd2-4e9f-ba92-376a4b302566@paulmck-laptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5d695bdd-afd2-4e9f-ba92-376a4b302566@paulmck-laptop> X-Stat-Signature: gknzxn678pic6homuwxw56pap58rzxds X-Rspamd-Queue-Id: 0701DC000B X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1728328010-976552 X-HE-Meta: U2FsdGVkX1+N4yJa+HzvrHq/viHsJyHba12y608/G3JgOQeiUJX9mbrQVYmN9EAMFO6t4KulVWGhD9IQCbz4HWcJHhA1mwN7P3SlU7pRV0u1BRR9fAlKsH3o7VmiR26b+tLqJqXBFH7Oo2b59KnunE2yONSj+WPAiz4Nhvx68FpzBow6zSV0ldUVM3NjnE4xhWF8h1m1ZR3FuuRr4jBlfql9H0bCQ/JWdeH2O6uBp31Os0xoKfHa6y6pjBoox+O2rNoneYJ8ISa1ugW4oqr7qaweWpF5Zb9q0UVhmjrfSeyVLjhyNfp8/JwFIQtWrP5V4S32vbj1zp6V8egrwDsPD1Gf1D/N2kEHl2fD4QVvDDeO1ak03awKKoxtj0SQKdYJhaXd3evzt+h8TarRe7lkTLPcyfIEq4rloOqKp7ws5zlkiFHw8RZNxk6RV7NnvlbmV2RZhfIlijSGBKklymuJ2w4FRpBUu9K2pSQdvikW5qtJHjlhwJax/C+470VkhOY1FpaqDMT3Rm7HOinDabxqSPtKhJJsELRcjFriW4aLX1dZfidbmcfy4xKhz2avgJSUEM+5mhh2UPKMQlkGW5FVnyRJ6Bzka4X6RkdkDpfLkqh+M8syq0JJNEgnsy2P1v1WHxD0T4NSL9EvQk/pOWl9o7/s0Lf959ZC6sw1uBEV16EJdCfFP9Vo1EfT1PI0byx8HH6GcTwJ6QV/YNWwGRZ3Caa5ssnY3pTI8Ho/levGcAR+62TASEzYeNbGizdd/0ccACzh7t8ujyx6IGsiX3Srqy2NpOt280+uUA3SXbenMCL4df1K9ykPIgu7Yr9pJUKoEHzpKhGae5mcfQZVI03ntE0DcBpbnuWIqSebE8v/zNK+Dc+nx3BrXfL/7viQhc1q7x+982FFgUTpdJWoDoNBbNH5TsLz5fQ+k2TkFPKPE5zpz6Bpa8FpShWJFuIsku0RNE+nsTz5YTqV3XyH/3r sN9BnKwe +lFKBK+wLJaT9LACvz0AlbM+pJ6g2IRGPGB0qEiOtKGQWkbuy+k91axz6twyZiOqqsWx3Io/SEBovV2I1fe9OyzkJ9v7Xxn+zgvTnbXdOfG4XrrvMQxs3UkOF2LmFNHWRfoT5sGTxgu7+TAZ/2C1BBql6IqXpvb+4MDVf6c0zHE0fcdV7KfKe4Um1P56Zb0sPHN5TY1kuqwPlShjTAFYdn+dqnWPLt7KlULavFNCrwGycGEaQ+H7IYyVGloG9+uZna8/0W18+uG1mZj+nKtMcaRxwLaRV0AharD9l6LHY6l1bGlzgcyuOSuz8a5x73PmzAKOT/5Nvr7gB3z0ljV0HjSNCkp+WnIXtX7YzTY8iQxOpaLqopHMMHXzVyW1VkvJnU8y2BRDxKRYa5s09LXPtM1UagmP2ewnIe9qTFG9rgJsgUn+QUJKyAr0c54B7YsVNdOHakCAd3etX/xKf+yofD2UxDdChnK+vUAWaPMRkKdL6dOjy+B5n5tJURUXzKImy1VkR5LxMWiEknMlhq+wG9lYJWDvMF9/TcC15rT/H5+gv+hzxouiwlB58FNtO4kZK56M+2Qnb3qLEkJ6QPVqJiBkzrwUndaMFlpsf8FdI6D02LLg4kGtvMZM7oc6IWNOwGeQ9Qd2gP/MbOOslTYhHP5PUJHhcBfsGDwHRLIblxtKkQOsTnAzWzE4qkzQwd51AluMCygVoSzM6EcriwXZcCHz/T0FER6Qrdrou X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Oct 07, 2024 at 11:18:14AM -0700, Paul E. McKenney wrote: > On Mon, Oct 07, 2024 at 10:50:46AM -0400, Mathieu Desnoyers wrote: > > On 2024-10-07 12:40, Peter Zijlstra wrote: > > > On Sat, Oct 05, 2024 at 02:50:17PM -0400, Mathieu Desnoyers wrote: > > > > On 2024-10-05 18:04, Peter Zijlstra wrote: > > > > > > > > > > > > +/* > > > > > > + * hp_allocate: Allocate a hazard pointer. > > > > > > + * > > > > > > + * Allocate a hazard pointer slot for @addr. The object existence should > > > > > > + * be guaranteed by the caller. Expects to be called from preempt > > > > > > + * disable context. > > > > > > + * > > > > > > + * Returns a hazard pointer context. > > > > > > > > > > So you made the WTF'o'meter crack, this here function does not allocate > > > > > nothing. Naming is bad. At best this is something like > > > > > try-set-hazard-pointer or somesuch. > > > > > > > > I went with the naming from the 2004 paper from Maged Michael, but I > > > > agree it could be clearer. > > > > > > > > I'm tempted to go for "hp_try_post()" and "hp_remove()", basically > > > > "posting" the intent to use a pointer (as in on a metaphorical billboard), > > > > and removing it when it's done. > > > > > > For RCU we've taken to using the word: 'publish', no? > > > > I'm so glad you suggest this, because it turns out that from all > > the possible words you could choose from, 'publish' is probably the > > most actively confusing. I'll explain. > > > > Let me first do a 10'000 feet comparison of RCU vs Hazard Pointers > > through a simple example: > > > > [ Note: I've renamed the HP dereference try_post to HP load try_post > > based on further discussion below. ] > > > > *** RCU *** > > > > * Dereference RCU-protected pointer: > > rcu_read_lock(); // [ Begin read transaction ] > > l_p = rcu_dereference(p); // [ Load p: @addr or NULL ] > > if (l_p) > > [ use *l_p ...] > > rcu_read_unlock(); // [ End read transaction ] > > > > * Publish @addr: addr = kmalloc(); > > init(addr); > > rcu_assign_pointer(p, addr); > > > > * Reclaim @addr: rcu_assign_pointer(p, NULL); // [ Unpublish @addr ] > > synchronize_rcu(); // Wait for all pre-existing > > // read transactions to complete. > > kfree(addr); > > > > > > *** Hazard Pointers *** > > > > * Load and post a HP-protected pointer: > > l_p = hp_load_try_post(domain, &p, &slot); > > if (l_p) { > > [ use *l_p ...] > > hp_remove(&slot, l_p); > > } > > > > * Publish @addr: addr = kmalloc(); > > init(addr); > > rcu_assign_pointer(p, addr); > > > > * Reclaim @addr: rcu_assign_pointer(p, NULL); // [ Unpublish @addr ] > > hp_scan(domain, addr, NULL); > > kfree(addr); > > > > Both HP and RCU have publication guarantees, which can in fact be > > implemented in the same way (e.g. rcu_assign_pointer paired with > > something that respects address dependencies ordering). A stronger > > implementation of this would be pairing a store-release with a > > load-acquire: it works, but it would add needless overhead on > > weakly-ordered CPUs. > > > > How the two mechanisms differ is in how they track when it is > > safe to reclaim @addr. RCU tracks reader "transactions" begin/end, > > and makes sure that all pre-existing transactions are gone before > > synchronize_rcu() is allowed to complete. HP does this by tracking > > "posted" pointer slots with a HP domain. As long as hp_scan observes > > that HP readers are showing interest in @addr, it will wait. > > > > One notable difference between RCU and HP is that HP knows exactly > > which pointer is blocking progress, and from which CPU (at least > > with my per-CPU HP domain implementation). Therefore, it is possible > > for HP to issue an IPI and make sure the HP user either completes its > > use of the pointer quickly, or stops using it right away (e.g. making > > the active mm use idle mm instead). > > > > One strength of RCU is that it can track use of a whole set of RCU > > pointers just by tracking reader transaction begin/end, but this is > > also one of its weaknesses: a long reader transaction can postpone > > completion of grace period for a long time and increase the memory > > footprint. In comparison, HP can immediately complete as soon as the > > pointer it is scanning for is gone. Even better, it can send an IPI > > to the belate CPU and abort use of the pointer using a callback. > > Plus, in contrast to hazard pointers, rcu_dereference() cannot say "no". > > This all sounds like arguments *for* use of the term "publish" for > hazard pointers rather than against it. What am I missing here? OK, one thing that I was missing is that this was not about the counterpart to rcu_assign_pointer(), for which I believe "publish" makes a lot of sense, but rather about the setting of a hazard pointer. Here, "protect" is the traditional term of power, which has served users well for some years. Thanx, Paul > > > > > > +/* > > > > > > + * hp_dereference_allocate: Dereference and allocate a hazard pointer. > > > > > > + * > > > > > > + * Returns a hazard pointer context. Expects to be called from preempt > > > > > > + * disable context. > > > > > > + */ > > > > > > > > > > More terrible naming. Same as above, but additionally, I would expect a > > > > > 'dereference' to actually dereference the pointer and have a return > > > > > value of the dereferenced type. > > > > > > > > hp_dereference_try_post() ? > > > > > > > > > > > > > > This function seems to double check and update the hp_ctx thing. I'm not > > > > > at all sure yet wtf this is doing -- and the total lack of comments > > > > > aren't helping. > > > > > > > > The hp_ctx contains the outputs. > > > > > > > > The function loads *addr_p to then try_post it into a HP slot. On success, > > > > it re-reads the *addr_p (with address dependency) and if it still matches, > > > > use that as output address pointer. > > > > > > > > I'm planning to remove hp_ctx, and just have: > > > > > > > > /* > > > > * hp_try_post: Try to post a hazard pointer. > > > > * > > > > * Post a hazard pointer slot for @addr. The object existence should > > > > * be guaranteed by the caller. Expects to be called from preempt > > > > * disable context. > > > > * > > > > * Returns true if post succeeds, false otherwise. > > > > */ > > > > static inline > > > > bool hp_try_post(struct hp_domain *hp_domain, void *addr, struct hp_slot **_slot) > > > > [...] > > > > > > > > /* > > > > * hp_dereference_try_post: Dereference and try to post a hazard pointer. > > > > * > > > > * Returns a hazard pointer context. Expects to be called from preempt > > > > * disable context. > > > > */ > > > > static inline > > > > void *__hp_dereference_try_post(struct hp_domain *hp_domain, > > > > void * const * addr_p, struct hp_slot **_slot) > > > > [...] > > > > > > > > #define hp_dereference_try_post(domain, p, slot_p) \ > > > > ((__typeof__(*(p))) __hp_dereference_try_post(domain, (void * const *) p, slot_p)) > > > > > > This will compile, but do the wrong thing when p is a regular pointer, no? > > > > Right, at least in some cases the compiler may not complain, and people used to > > rcu_dereference() will expect that "p" is the pointer to load rather than the > > address of that pointer. This would be unexpected. > > > > I must admit that passing the address holding the pointer to load rather than > > the pointer to load itself makes it much less troublesome in terms of macro > > layers. But perhaps this is another example where we should wander away from the > > beaten path and use a word different from "dereference" here. E.g.: > > > > /* > > * Use a comma expression within typeof: __typeof__((void)**(addr_p), *(addr_p)) > > * to generate a compile error if addr_p is not a pointer to a pointer. > > */ > > #define hp_load_try_post(domain, addr_p, slot_p) \ > > ((__typeof__((void)**(addr_p), *(addr_p))) __hp_load_try_post(domain, (void * const *) (addr_p), slot_p)) > > > > > > > > > > > > > /* Clear the hazard pointer in @slot. */ > > > > static inline > > > > void hp_remove(struct hp_slot *slot) > > > > [...] > > > > > > Differently weird, but better I suppose :-) > > > > If you find a better word than "remove" to pair with "post", I'm all in :) > > > > > > > > > > > > > > +void hp_scan(struct hp_slot __percpu *percpu_slots, void *addr, > > > > > > + void (*retire_cb)(int cpu, struct hp_slot *slot, void *addr)) > > > > > > +{ > > > > > > + int cpu; > > > > > > + > > > > > > + /* > > > > > > + * Store A precedes hp_scan(): it unpublishes addr (sets it to > > > > > > + * NULL or to a different value), and thus hides it from hazard > > > > > > + * pointer readers. > > > > > > + */ > > > > > > + > > > > > > + if (!addr) > > > > > > + return; > > > > > > + /* Memory ordering: Store A before Load B. */ > > > > > > + smp_mb(); > > > > > > + /* Scan all CPUs slots. */ > > > > > > + for_each_possible_cpu(cpu) { > > > > > > + struct hp_slot *slot = per_cpu_ptr(percpu_slots, cpu); > > > > > > + > > > > > > + if (retire_cb && smp_load_acquire(&slot->addr) == addr) /* Load B */ > > > > > > + retire_cb(cpu, slot, addr); > > > > > > > > > > Is retirce_cb allowed to cmpxchg the thing? > > > > > > > > It could, but we'd need to make sure the slot is not re-used by another > > > > hp_try_post() before the current user removes its own post. It would > > > > need to synchronize with the current HP user (e.g. with IPI). > > > > > > > > I've actually renamed retire_cb to "on_match_cb". > > > > > > Hmm, I think I see. Would it make sense to pass the expected addr to > > > hp_remove() and double check we don't NULL out something unexpected? -- > > > maybe just for a DEBUG option. > > > > > > I'm always seeing the NOHZ_FULL guys hating on this :-) > > > > That's a fair point. Sure, we can do this as an extra safety net. For now I > > will just make the check always present, we can always move it to a debug > > option later. > > > > And now I notice that hp_remove is also used for CPU hotplug (grep > > matches for cpuhp_remove_state()). I wonder if we should go for something > > more grep-friendly than "hp_", e.g. "hazptr_" and rename hp.h to hazptr.h ? > > > > Thanks, > > > > Mathieu > > > > > > -- > > Mathieu Desnoyers > > EfficiOS Inc. > > https://www.efficios.com > >