From: Jonas Oberhauser
Date: Thu, 19 Sep 2024 22:30:34 +0200
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard pointers
To: Jann Horn, Boqun Feng, "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org,
    lkmm@vger.kernel.org, Frederic Weisbecker, Neeraj Upadhyay,
    Joel Fernandes, Josh Triplett, Uladzislau Rezki, Steven Rostedt,
    Mathieu Desnoyers, Lai Jiangshan, Zqiang, Peter Zijlstra, Ingo Molnar,
    Will Deacon, Waiman Long, Mark Rutland, Thomas Gleixner,
    Kent Overstreet, Linus Torvalds, Vlastimil Babka,
    maged.michael@gmail.com, Neeraj Upadhyay, lkmm@lists.linux.dev
References: <20240917143402.930114-1-boqun.feng@gmail.com>
    <20240917143402.930114-2-boqun.feng@gmail.com>

On 9/19/2024 at 2:12 AM, Jann Horn wrote:
> On Tue, Sep 17, 2024 at 4:33 PM Boqun Feng wrote:
>> Hazard pointers [1] provide a way to dynamically distribute refcounting
>> and can be used to improve the scalability of refcounting without
>> significant space cost.
>
>> +static inline void *__hazptr_tryprotect(hazptr_t *hzp,
>> +                                        void *const *p,
>> +                                        unsigned long head_offset)
>> +{
>> +        void *ptr;
>> +        struct callback_head *head;
>> +
>> +        ptr = READ_ONCE(*p);
>> +
>> +        if (ptr == NULL)
>> +                return NULL;
>> +
>> +        head = (struct callback_head *)(ptr + head_offset);
>> +
>> +        WRITE_ONCE(*hzp, head);
>> +        smp_mb();
>> +
>> +        ptr = READ_ONCE(*p); // read again
>> +
>> +        if (ptr + head_offset != head) { // pointer changed
>> +                WRITE_ONCE(*hzp, NULL);  // reset hazard pointer
>> +                return NULL;
>> +        } else
>> +                return ptr;
>> +}
>
> I got nerdsniped by the Plumbers talk. So, about that smp_mb()...
>
> I think you should be able to avoid the smp_mb() using relaxed atomics
> (on architectures that have those), at the cost of something like a
> cmpxchg-acquire sandwiched between a load-acquire and a relaxed load?
> I'm not sure how their cost compares to an smp_mb() though.

We have done a similar scheme before, and on some architectures (not
x86) the RMW is slightly cheaper than the mb.

Your reasoning is a bit simplified because it seems to assume a stronger
concept of ordering than LKMM has, but even with LKMM's ordering your
code seems fine.

I feel it can even be simplified a little, the hazard bit does not seem
necessary.

Assuming atomic operations for everything racy, relaxed unless stated
otherwise:

(R)eader:

    old = read p // I don't think this needs acq, because of address dependencies (*)
    haz ||=_acq old
    if (read p != old) retry;
    *old

(W)riter:

    p =_??? ... // assuming we don't set it to null this needs a rel
    --- mb ---
    haz ||= 0
    while (read_acq haz == old) retry;
    delete old

In order to get a use-after-free, both of the R:read p need to read
before W:p = ... , so because of the W:mb, they execute before
W:haz||=0 .

Also, for the use-after-free, W:read_acq haz needs to read before (the
write part of) R:haz||=_acq old .
Then the W:haz ||= 0 also needs to read before (the write part of)
R:haz||=_acq old because of coherence on the same location.

Since both of them are atomic RMW, the W:haz||= 0 also needs to write
before (the write part of) R:haz||=_acq old , and in the absence of
non-RMW writes (so assuming no thread will just store to the hazard
pointer), this implies that the latter reads-from an rmw-sequence-later
store than W:haz||=0 , which therefore executes before R:haz||=_acq old .

But because of the acquire barrier, R:haz||=_acq old executes before the
second R:read p .

This gives a cycle

    (2nd)R:read p ->xb W:haz||=0 ->xb R:haz||=_acq ->xb (2nd)R:read p

and therefore is forbidden.

Note that in this argument, the two R:read p are not necessarily reading
from the same store. Because of ABA, it can happen that they read from
distinct stores and see the same value.

It does require clearing hazard pointers with an RMW like atomic_and(0)
(**). The combination of the two RMW (for setting & clearing the hazard
pointer) might in total be slower again than an mb.

(I took the liberty to add an acquire barrier in the writer's while loop
for the case where we read from the (not shown) release store of the
reader that clears the hazard pointer. It's arguable whether that acq is
needed since one could argue that a delete is a kind of store, in which
case control dependency would handle it.)

have fun,
jonas

(* you talk about sandwiching, and the data dependency does guarantee
some weaker form of sandwiching than the acq, but I don't think
sandwiching is required at all. If you happened to be able to set the
hazard pointer before reading the pointer, that would just extend the
protected area, wouldn't it?)

(** I think if you clear the pointer with a store, the hazard bit does
not help. You could be overwriting the hazard bit, and the RMWs that set
the hazard bit might never propagate to your CPU.
Also in your code you probably meant to clear the whole hazard pointer in the retry code of the reader, not just the hazard bit.)
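
For illustration only, the reader/writer pseudocode above could be sketched in userspace C11 roughly as follows. This is a minimal sketch under stated assumptions, not the kernel patch: C11 atomics stand in for the kernel primitives (and the C11 memory model is not LKMM, so the ordering argument does not carry over verbatim), there is exactly one hazard slot that is 0 whenever the reader starts, and the names shared_p, haz, reader_tryprotect, reader_release, writer_publish and writer_can_free are all made up for this example. The writer uses an exchange with release ordering instead of a plain release store only so that it can return the old value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical globals: one published pointer and one hazard slot (one reader). */
static _Atomic uintptr_t shared_p;
static _Atomic uintptr_t haz;

/* Reader: clear the slot with an RMW (like atomic_and(0)), never a plain store. */
static void reader_release(void)
{
    atomic_fetch_and_explicit(&haz, (uintptr_t)0, memory_order_release);
}

/* Reader: returns the protected pointer value, or 0 if the caller should retry. */
static uintptr_t reader_tryprotect(void)
{
    /* old = read p (relaxed) */
    uintptr_t old = atomic_load_explicit(&shared_p, memory_order_relaxed);
    if (old == 0)
        return 0;

    /* haz ||=_acq old; assumption: the slot held 0 before this RMW. */
    uintptr_t prev = atomic_fetch_or_explicit(&haz, old, memory_order_acquire);
    assert(prev == 0);
    (void)prev;

    /* if (read p != old) retry; clear the whole slot before giving up. */
    if (atomic_load_explicit(&shared_p, memory_order_relaxed) != old) {
        reader_release();
        return 0;
    }
    return old;
}

/* Writer: publish the new value, then "mb", then haz ||= 0. Returns the old value. */
static uintptr_t writer_publish(uintptr_t next)
{
    uintptr_t old = atomic_exchange_explicit(&shared_p, next, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);                          /* --- mb --- */
    atomic_fetch_or_explicit(&haz, (uintptr_t)0, memory_order_relaxed); /* haz ||= 0 */
    return old;
}

/* Writer: one poll of "while (read_acq haz == old) retry". */
static int writer_can_free(uintptr_t old)
{
    return atomic_load_explicit(&haz, memory_order_acquire) != old;
}
```

A writer would call writer_publish() and then poll writer_can_free() until it returns nonzero before deleting the old object; the poll is split out here only so the sketch can be exercised from a single thread.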