From: Suren Baghdasaryan <surenb@google.com>
Date: Wed, 23 Oct 2024 15:17:01 -0700
Subject: Re: [PATCH v3 tip/perf/core 1/4] mm: introduce mmap_lock_speculation_{start|end}
To: Peter Zijlstra
Cc: Andrii Nakryiko, linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org,
	oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org,
	paulmck@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
	mjguzik@gmail.com, brauner@kernel.org, jannh@google.com,
	mhocko@kernel.org, vbabka@suse.cz, shakeel.butt@linux.dev,
	hannes@cmpxchg.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com
In-Reply-To: <20241023201031.GF11151@noisy.programming.kicks-ass.net>
References: <20241010205644.3831427-1-andrii@kernel.org>
	<20241010205644.3831427-2-andrii@kernel.org>
	<20241023201031.GF11151@noisy.programming.kicks-ass.net>
On Wed, Oct 23, 2024 at 1:10 PM Peter Zijlstra wrote:
>
> On Thu, Oct 10, 2024 at 01:56:41PM -0700, Andrii Nakryiko wrote:
> > From: Suren Baghdasaryan
> >
> > Add helper functions to speculatively perform operations without
> > read-locking mmap_lock, expecting that mmap_lock will not be
> > write-locked and mm is not modified from under us.
> >
> > Suggested-by: Peter Zijlstra
> > Signed-off-by: Suren Baghdasaryan
> > Signed-off-by: Andrii Nakryiko
> > Link: https://lore.kernel.org/bpf/20240912210222.186542-1-surenb@google.com
> > ---
> >  include/linux/mm_types.h  |  3 ++
> >  include/linux/mmap_lock.h | 72 ++++++++++++++++++++++++++++++++-------
> >  kernel/fork.c             |  3 --
> >  3 files changed, 63 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 6e3bdf8e38bc..5d8cdebd42bc 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -887,6 +887,9 @@ struct mm_struct {
> >                * Roughly speaking, incrementing the sequence number is
> >                * equivalent to releasing locks on VMAs; reading the sequence
> >                * number can be part of taking a read lock on a VMA.
> > +              * Incremented every time mmap_lock is write-locked/unlocked.
> > +              * Initialized to 0, therefore odd values indicate mmap_lock
> > +              * is write-locked and even values that it's released.
> >                *
> >                * Can be modified under write mmap_lock using RELEASE
> >                * semantics.
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index de9dc20b01ba..9d23635bc701 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -71,39 +71,84 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
> >  }
> >
> >  #ifdef CONFIG_PER_VMA_LOCK
> > +static inline void init_mm_lock_seq(struct mm_struct *mm)
> > +{
> > +     mm->mm_lock_seq = 0;
> > +}
> > +
> >  /*
> > - * Drop all currently-held per-VMA locks.
> > - * This is called from the mmap_lock implementation directly before releasing
> > - * a write-locked mmap_lock (or downgrading it to read-locked).
> > - * This should normally NOT be called manually from other places.
> > - * If you want to call this manually anyway, keep in mind that this will release
> > - * *all* VMA write locks, including ones from further up the stack.
> > + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
> > + * or write-unlocked (RELEASE semantics).
> >   */
> > -static inline void vma_end_write_all(struct mm_struct *mm)
> > +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
> >  {
> >       mmap_assert_write_locked(mm);
> >       /*
> >        * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
> >        * mmap_lock being held.
> > -      * We need RELEASE semantics here to ensure that preceding stores into
> > -      * the VMA take effect before we unlock it with this store.
> > -      * Pairs with ACQUIRE semantics in vma_start_read().
> >        */
> > -     smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> > +
> > +     if (acquire) {
> > +             WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
> > +             /*
> > +              * For ACQUIRE semantics we should ensure no following stores are
> > +              * reordered to appear before the mm->mm_lock_seq modification.
> > +              */
> > +             smp_wmb();
>
> Strictly speaking this isn't ACQUIRE, nor do we care about ACQUIRE here.
> This really is about subsequent stores, loads are irrelevant.
>
> > +     } else {
> > +             /*
> > +              * We need RELEASE semantics here to ensure that preceding stores
> > +              * into the VMA take effect before we unlock it with this store.
> > +              * Pairs with ACQUIRE semantics in vma_start_read().
> > +              */
>
> Again, not strictly true. We don't care about loads. Using RELEASE here
> is fine and probably cheaper on a few platforms, but we don't strictly
> need/care about RELEASE.
>
> > +             smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> > +     }
> > +}
>
> Also, it might be saner to stick closer to the seqcount naming of
> things and use two different functions for these two different things.
>
> /* straight up copy of do_raw_write_seqcount_begin() */
> static inline void mm_write_seqlock_begin(struct mm_struct *mm)
> {
>	kcsan_nestable_atomic_begin();
>	mm->mm_lock_seq++;
>	smp_wmb();
> }
>
> /* straight up copy of do_raw_write_seqcount_end() */
> static inline void mm_write_seqcount_end(struct mm_struct *mm)
> {
>	smp_wmb();
>	mm->mm_lock_seq++;
>	kcsan_nestable_atomic_end();
> }
>
> Or better yet, just use seqcount...

Yeah, with these changes it does look a lot like seqcount now... I can
take another stab at rewriting this using seqcount_t, but one issue
that Jann was concerned about is the counter being int vs long.
seqcount_t uses unsigned, so I'm not sure how to address that if I
were to use seqcount_t. Any suggestions on how to address that before
I move forward with a rewrite?

> > +
> > +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
> > +{
> > +     /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> > +     *seq = smp_load_acquire(&mm->mm_lock_seq);
> > +     /* Allow speculation if mmap_lock is not write-locked */
> > +     return (*seq & 1) == 0;
> > +}
> > +
> > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> > +{
> > +     /* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */
> > +     smp_rmb();
> > +     return seq == READ_ONCE(mm->mm_lock_seq);
> > +}
>
> Because there's nothing better than well-known functions with a randomly
> different name and interface I suppose...
>
> Anyway, all the actual code proposed is not wrong. I'm just a bit
> annoyed it's a random NIH of seqcount.

Ack. Let's decide what we do about the u32 vs u64 issue and I'll
rewrite this.