From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9883EC02180 for ; Wed, 15 Jan 2025 15:35:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B0E376B007B; Wed, 15 Jan 2025 10:35:24 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A97436B0082; Wed, 15 Jan 2025 10:35:24 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 937B06B0083; Wed, 15 Jan 2025 10:35:24 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 74A786B007B for ; Wed, 15 Jan 2025 10:35:24 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id BD2D7A097C for ; Wed, 15 Jan 2025 15:35:23 +0000 (UTC) X-FDA: 83010085326.29.4CF6D86 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf05.hostedemail.com (Postfix) with ESMTP id 5111210000B for ; Wed, 15 Jan 2025 15:35:21 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=AXm0BFtd; spf=none (imf05.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1736955322; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=3HTKz80HDkCdBoGCl80Fbou++jlhoRe3M27zjaUp9J0=; b=i2zlHilOPXUnKaU0dSHt8DOABLP2Rm0lh5Qy2JJyk0JGpvLAISILCL/un11Esx1DWrRCTA rY+DFkdppP7JdQQtiPvNG1qHpejNOMtaw06FiMPv7hMgr3dxAxos7cliKKuCSJSeY6vv2k h9k2fJK4VkMayG8cXvtb+YW9Gx+IAQE= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=AXm0BFtd; spf=none (imf05.hostedemail.com: domain of peterz@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=peterz@infradead.org; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1736955322; a=rsa-sha256; cv=none; b=1NS4z3awCulAYTQTgol/tWE1C3UM3vqxfJ8nP7hvubBHXQ1x7814UlimPXHY5E81wdVGWe ZknpbmzXqQ3OLZlkpTy+Jyw6rGJbYERTvcCNg9F/7vcmoJW7vcvH2ud6Mg6Iu7H+8judlz VPkTXM57s+Ic+VabGC9uGwl+hwE1eSQ= DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Transfer-Encoding: Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date: Sender:Reply-To:Content-ID:Content-Description; bh=3HTKz80HDkCdBoGCl80Fbou++jlhoRe3M27zjaUp9J0=; b=AXm0BFtdykEtYDgyEGlMUERUiW mXKI4ISR2rIRogZm7NcvNnfSX4sHGjbTTj/NFWPyjVXp5lTtLEPp+pjaxl9ZRrEiBWF+tuqGLLnnS WayZ12ByZ8h94dWCCBKVVApWzsX6Sfq9YXWbBBeAiWQulhn7MjZK2J+Bu2yygoVl868zHuOK8qsxs WP3lQU/JEYqnYoAUFmINaLy5fv+KsDGQeIDktQRQODaETrRkLUKSLseyrTDI0/1oNrq6pbm989Ba/ kAGjPltgb47owRWidjnUA2nqkfFvGWkaG0jR6iTryrogOVkNmvOWCq9Ni04h1XqXVNkI7y0wInkMA iY0Mn/ww==; Received: from 77-249-17-89.cable.dynamic.v4.ziggo.nl ([77.249.17.89] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98 #2 (Red Hat Linux)) id 1tY5QC-0000000FTzx-2PCY; Wed, 15 Jan 2025 15:35:08 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 1000) id CAD233004DE; Wed, 15 Jan 2025 16:35:07 +0100 (CET) Date: Wed, 15 Jan 2025 16:35:07 +0100 From: Peter Zijlstra To: Suren Baghdasaryan Cc: Mateusz Guzik , akpm@linux-foundation.org, willy@infradead.org, liam.howlett@oracle.com, lorenzo.stoakes@oracle.com, david.laight.linux@gmail.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, oliver.sang@intel.com, mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com, hughd@google.com, lokeshgidra@google.com, minchan@google.com, jannh@google.com, shakeel.butt@linux.dev, souravpanda@google.com, pasha.tatashin@soleen.com, klarasmodin@gmail.com, richard.weiyang@gmail.com, corbet@lwn.net, linux-doc@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Subject: Re: [PATCH v9 11/17] mm: replace vm_lock and detached flag with a reference count Message-ID: <20250115153507.GF8362@noisy.programming.kicks-ass.net> References: <20250111042604.3230628-1-surenb@google.com> <20250111042604.3230628-12-surenb@google.com> <20250115104841.GX5388@noisy.programming.kicks-ass.net> <20250115111334.GE8385@noisy.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Rspamd-Queue-Id: 5111210000B X-Stat-Signature: 9bn7odrk3isfw1f8xo5bdg4wb8yk49fd X-Rspam-User: X-Rspamd-Server: rspam11 X-HE-Tag: 1736955321-684723 X-HE-Meta: U2FsdGVkX18sBolJJDy5oi+8TTY5aR/M2uW5KhuxDIRNfLXkn2wDwXQIvA7UsOOBursDUHZktJHmHhWqt5pzQBg4KIRi6jvIH919Z7HYdTy4HuP4E0gaDmRDFBX8aw+yDtYsjX0UaBR0qZ5uDvhd7acOW2R93G/5xzek9kj7RGmFtyy5Kzn7VqdJhwj8FAV5Dedb44HZhWhSA4zYxTpfV+psQWfz9aA/3FbmdXO9UywqeC+tagNBGCkuhg7h468k+pKlvzl/cqZFdyajAcg6xDvx2xsq65gqcssYHBqH/iBONqbmmTJHiMLD+rZOE3EY3ycrOw0ak8G0IiKzJGYnTyXc7/E+x/USulfE/Q1LmDnPQmiYB2ALF1aW5nlpK+FblMAYMa1TBPzfFnznHo3bMB0RVJfDnMcBdAUZ5LF0a1ue51OLCdr3eGj/+LIgebvHGFc+mrFBGNhMwD6Gva5Rwjxc8UGwiAYUOXTSl7omLjhiNEu16xT+fFoZsHYaf88lItJ8ueJE3QdsGWYLqvl9Ziso7lxjqFYdiAFZdLy0NNfTI346diQ7lYu7/IympuqUL1yf/bAyqOiTwfx/qRBmVG8FGhVpFZ9zyIAhWJkzFp75IJAIaoqVT6qwOP8cFOpSKNpRrUNVPr5YIO/CbzSWR1BA82I8i3O82xjmy6s2xsBzVOCevdJCkTBfxjuREm7yRegyogaLDWeOQLc5cx8CseXrnIz2FWPqEASmZ6E/ARl+dfJOb8lnLjz5K0Ua/Ao2dTP5TfkC7gasfh3c1p30EyhgvK7Dn84k45MDkJDS0GbduAYyLNYg8guTZxS6QfLgTrWhtVlqtngl0AN94kv3DsNdl+cPGBoVCG5uQ6sK9NycFLItRIckLXBKmJVpJJL6ZZqx63liNHl+TuIiDPVTkaeLtGsOJ+RTNRjNHJ3Hyb505xyTlafz8nasR4zgcD1fLcCKNnsQTL6n5qauahm TQw9PrXK v/HiXjcGlqGOUKRvOzCPscsU3UWG7QkkkKxnEQxZK3fJoTUD1q4Z8v8d0STXjhKw7RrYDDFtJwfGlqKaB83EOgL3IVxQWkwVntaDnL8J2ycjzV+3C+3kGZ8kpfdcSIh8ivKStgMzzIIzZNGDoQwpPOSRunJ1QfuVP5tvPBPT9Ocqz+Car4FGYZu97OOM58a06ZYqawWDlrvhvSwiovPm2cZ7wc82Icf8EAzhiFSG5om4KOxq4lgFjsG8OlRsNQgeYJ6EBzUXJJhQV1PUL9+CaHuOHQg5NTw6L35XBwCm2dzLe32JhfuEkIx2fSa3EO9M4PV4lfkmwczIMYGnsroyF0K0w+QgGef/rRUVakv6XWS+heiHrytKegqJ9/M5eaqEbuSZorkKyUMocLBo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 15, 2025 at 07:00:37AM -0800, Suren Baghdasaryan wrote: > On Wed, Jan 15, 2025 at 3:13 AM Peter Zijlstra wrote: > > > > On Wed, Jan 15, 2025 at 11:48:41AM +0100, Peter Zijlstra wrote: > > > On Sat, Jan 11, 2025 at 12:14:47PM -0800, Suren Baghdasaryan wrote: > > > > > > > > Replacing down_read_trylock() with the new routine loses an acquire > > > > > fence. That alone is not a problem, but see below. > > > > > > > > Hmm. I think this acquire fence is actually necessary. We don't want > > > > the later vm_lock_seq check to be reordered and happen before we take > > > > the refcount. Otherwise this might happen: > > > > > > > > reader writer > > > > if (vm_lock_seq == mm_lock_seq) // check got reordered > > > > return false; > > > > vm_refcnt += VMA_LOCK_OFFSET > > > > vm_lock_seq == mm_lock_seq > > > > vm_refcnt -= VMA_LOCK_OFFSET > > > > if (!__refcount_inc_not_zero_limited()) > > > > return false; > > > > > > > > Both reader's checks will pass and the reader would read-lock a vma > > > > that was write-locked. > > > > > > Hmm, you're right. That acquire does matter here. > > > > Notably, it means refcount_t is entirely unsuitable for anything > > SLAB_TYPESAFE_BY_RCU, since they all will need secondary validation > > conditions after the refcount succeeds. > > Thanks for reviewing, Peter! > Yes, I'm changing the code to use atomic_t instead of refcount_t and > it comes out quite nicely I think. I had to add two small helper > functions: > vm_refcount_inc() - similar to refcount_add_not_zero() but with an > acquired fence. > vm_refcnt_sub() - similar to refcount_sub_and_test(). I could use > atomic_sub_and_test() but that would add unnecessary acquire fence in > the pagefault path, so I'm using refcount_sub_and_test() logic > instead. Right. > For SLAB_TYPESAFE_BY_RCU I think we are ok with the > __vma_enter_locked()/__vma_exit_locked() transition in the > vma_mark_detached() before freeing the vma and would not need > secondary validation. In __vma_enter_locked(), vm_refcount gets > VMA_LOCK_OFFSET set, which prevents readers from taking the refcount. > In __vma_exit_locked() vm_refcnt transitions to 0, so again that > prevents readers from taking the refcount. IOW, the readers won't get > to the secondary validation and will fail early on > __refcount_inc_not_zero_limited(). I think this transition correctly > serves the purpose of waiting for current temporary readers to exit > and preventing new readers from read-locking and using the vma. Consider: CPU0 CPU1 rcu_read_lock(); vma = vma_lookup(mm, vaddr); ... cpu goes sleep for a *long time* ... __vma_exit_locked(); vma_area_free() .. vma = vma_area_alloc(); vma_mark_attached(); ... comes back once vma is re-used ... vma_start_read() vm_refcount_inc(); // success!! At which point we need to validate vma is for mm and covers vaddr, which is what patch 15 does, no? Also, I seem to have forgotten some braces back in 2008 :-) --- diff --git a/include/linux/slab.h b/include/linux/slab.h index 10a971c2bde3..c1356b52f8ea 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -115,9 +115,10 @@ enum _slab_flag_bits { * rcu_read_lock(); * obj = lockless_lookup(key); * if (obj) { - * if (!try_get_ref(obj)) // might fail for free objects + * if (!try_get_ref(obj)) { // might fail for free objects * rcu_read_unlock(); * goto begin; + * } * * if (obj->key != key) { // not the object we expected * put_ref(obj);