Date: Wed, 23 Oct 2024 22:10:31 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Andrii Nakryiko
Cc: linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org, oleg@redhat.com,
	rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org,
	willy@infradead.org, surenb@google.com, akpm@linux-foundation.org,
	mjguzik@gmail.com, brauner@kernel.org, jannh@google.com,
	mhocko@kernel.org, vbabka@suse.cz, shakeel.butt@linux.dev,
	hannes@cmpxchg.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com
Subject: Re: [PATCH v3 tip/perf/core 1/4] mm: introduce mmap_lock_speculation_{start|end}
Message-ID: <20241023201031.GF11151@noisy.programming.kicks-ass.net>
References: <20241010205644.3831427-1-andrii@kernel.org>
	<20241010205644.3831427-2-andrii@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20241010205644.3831427-2-andrii@kernel.org>
On Thu, Oct 10, 2024 at 01:56:41PM -0700, Andrii Nakryiko wrote:
> From: Suren Baghdasaryan
> 
> Add helper functions to speculatively perform operations without
> read-locking mmap_lock, expecting that mmap_lock will not be
> write-locked and mm is not modified from under us.
> 
> Suggested-by: Peter Zijlstra
> Signed-off-by: Suren Baghdasaryan
> Signed-off-by: Andrii Nakryiko
> Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
> ---
>  include/linux/mm_types.h  |  3 ++
>  include/linux/mmap_lock.h | 72 ++++++++++++++++++++++++++++++++-------
>  kernel/fork.c             |  3 --
>  3 files changed, 63 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6e3bdf8e38bc..5d8cdebd42bc 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -887,6 +887,9 @@ struct mm_struct {
>  	 * Roughly speaking, incrementing the sequence number is
>  	 * equivalent to releasing locks on VMAs; reading the sequence
>  	 * number can be part of taking a read lock on a VMA.
> +	 * Incremented every time mmap_lock is write-locked/unlocked.
> +	 * Initialized to 0, therefore odd values indicate mmap_lock
> +	 * is write-locked and even values that it's released.
>  	 *
>  	 * Can be modified under write mmap_lock using RELEASE
>  	 * semantics.
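As an aside, the odd/even scheme described in that comment is the classic
sequence-count pattern. A minimal userspace sketch of the reader side
(C11 atomics standing in for the kernel's smp_load_acquire()/smp_rmb();
the struct and function names here are illustrative, not the patch's API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch only: mm_lock_seq starts at 0; an odd value means the writer
 * (mmap_lock) is held, so speculation must not start. An unchanged even
 * value at the end means no writer intervened in between. */
struct mm_sketch { atomic_int mm_lock_seq; };

static bool speculation_start_sketch(struct mm_sketch *mm, int *seq)
{
	/* acquire-load ~ smp_load_acquire() */
	*seq = atomic_load_explicit(&mm->mm_lock_seq, memory_order_acquire);
	return (*seq & 1) == 0;		/* even: not write-locked */
}

static bool speculation_end_sketch(struct mm_sketch *mm, int seq)
{
	/* acquire fence ~ smp_rmb(): order the speculative loads above
	 * before the re-read of the sequence count */
	atomic_thread_fence(memory_order_acquire);
	return seq == atomic_load_explicit(&mm->mm_lock_seq,
					   memory_order_relaxed);
}
```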
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index de9dc20b01ba..9d23635bc701 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -71,39 +71,84 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
>  }
> 
>  #ifdef CONFIG_PER_VMA_LOCK
> +static inline void init_mm_lock_seq(struct mm_struct *mm)
> +{
> +	mm->mm_lock_seq = 0;
> +}
> +
>  /*
> - * Drop all currently-held per-VMA locks.
> - * This is called from the mmap_lock implementation directly before releasing
> - * a write-locked mmap_lock (or downgrading it to read-locked).
> - * This should normally NOT be called manually from other places.
> - * If you want to call this manually anyway, keep in mind that this will release
> - * *all* VMA write locks, including ones from further up the stack.
> + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
> + * or write-unlocked (RELEASE semantics).
>  */
> -static inline void vma_end_write_all(struct mm_struct *mm)
> +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
>  {
>  	mmap_assert_write_locked(mm);
>  	/*
>  	 * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
>  	 * mmap_lock being held.
> -	 * We need RELEASE semantics here to ensure that preceding stores into
> -	 * the VMA take effect before we unlock it with this store.
> -	 * Pairs with ACQUIRE semantics in vma_start_read().
>  	 */
> -	smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +
> +	if (acquire) {
> +		WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +		/*
> +		 * For ACQUIRE semantics we should ensure no following stores are
> +		 * reordered to appear before the mm->mm_lock_seq modification.
> +		 */
> +		smp_wmb();

Strictly speaking this isn't ACQUIRE, nor do we care about ACQUIRE
here. This really is about subsequent stores; loads are irrelevant.
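That is, what the write-lock side actually needs is only store-store
ordering: make the now-odd count visible before any later stores to the
protected data. In userspace C11 terms that is roughly (a sketch with a
release fence standing in for smp_wmb(); names are hypothetical):

```c
#include <stdatomic.h>

/* Sketch of the write-side "begin": bump the count to odd first, then
 * fence so later stores to the protected data cannot be reordered
 * before it. No load ordering (i.e. no ACQUIRE) is involved. */
struct seq_sketch { atomic_int seq; };

static void write_begin_sketch(struct seq_sketch *s)
{
	/* the writer is exclusive, so a plain relaxed increment is fine */
	atomic_store_explicit(&s->seq,
		atomic_load_explicit(&s->seq, memory_order_relaxed) + 1,
		memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
}
```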
> +	} else {
> +		/*
> +		 * We need RELEASE semantics here to ensure that preceding stores
> +		 * into the VMA take effect before we unlock it with this store.
> +		 * Pairs with ACQUIRE semantics in vma_start_read().
> +		 */

Again, not strictly true. We don't care about loads. Using RELEASE here
is fine and probably cheaper on a few platforms, but we don't strictly
need/care about RELEASE.

> +		smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +	}
> +}

Also, it might be saner to stick closer to the seqcount naming of things
and use two different functions for these two different things:

/* straight up copy of do_raw_write_seqcount_begin() */
static inline void mm_write_seqcount_begin(struct mm_struct *mm)
{
	kcsan_nestable_atomic_begin();
	mm->mm_lock_seq++;
	smp_wmb();
}

/* straight up copy of do_raw_write_seqcount_end() */
static inline void mm_write_seqcount_end(struct mm_struct *mm)
{
	smp_wmb();
	mm->mm_lock_seq++;
	kcsan_nestable_atomic_end();
}

Or better yet, just use seqcount...

> +
> +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
> +{
> +	/* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> +	*seq = smp_load_acquire(&mm->mm_lock_seq);
> +	/* Allow speculation if mmap_lock is not write-locked */
> +	return (*seq & 1) == 0;
> +}
> +
> +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> +{
> +	/* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */
> +	smp_rmb();
> +	return seq == READ_ONCE(mm->mm_lock_seq);
> +}

Because there's nothing better than well-known functions with a randomly
different name and interface, I suppose...

Anyway, the actual code proposed is not wrong. I'm just a bit annoyed
it's a random NIH of seqcount.
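For completeness, the begin/end pairing mirrored above can be exercised
end to end in a userspace sketch (C11 fences standing in for
smp_wmb()/smp_rmb(); this is a portable illustration of the seqcount
pattern, not the kernel's seqlock.h API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Seqcount sketch: count is odd while a writer is inside, even
 * otherwise; a reader retries if the count was odd or changed. */
struct sc { atomic_uint seq; int data; };

static void sc_write_begin(struct sc *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
}

static void sc_write_end(struct sc *s)
{
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
}

static unsigned sc_read_begin(struct sc *s)
{
	return atomic_load_explicit(&s->seq, memory_order_acquire);
}

static bool sc_read_retry(struct sc *s, unsigned seq)
{
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	return (seq & 1) ||
	       seq != atomic_load_explicit(&s->seq, memory_order_relaxed);
}
```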