From: Jann Horn <jannh@google.com>
Date: Fri, 13 Sep 2024 00:52:39 +0200
Subject: Re: [PATCH v2 1/1] mm: introduce mmap_lock_speculation_{start|end}
To: Suren Baghdasaryan
Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org, mjguzik@gmail.com, brauner@kernel.org, andrii@kernel.org
In-Reply-To: <20240912210222.186542-1-surenb@google.com>
On Thu, Sep 12, 2024 at 11:02 PM Suren Baghdasaryan wrote:
> Add helper functions to speculatively perform operations without
> read-locking mmap_lock, expecting that mmap_lock will not be
> write-locked and mm is not modified from under us.

I think this is okay now, except for some comments that should be
fixed up. (Plus my gripe about the sequence count being 32-bit.)

> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6e3bdf8e38bc..5d8cdebd42bc 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -887,6 +887,9 @@ struct mm_struct {
>                  * Roughly speaking, incrementing the sequence number is
>                  * equivalent to releasing locks on VMAs; reading the sequence
>                  * number can be part of taking a read lock on a VMA.
> +                * Incremented every time mmap_lock is write-locked/unlocked.
> +                * Initialized to 0, therefore odd values indicate mmap_lock
> +                * is write-locked and even values that it's released.

FWIW, I would still feel happier if this was a 64-bit number, though I
guess at least with uprobes the attack surface is not that large even
if you can wrap that counter... 2^31 counter increments are not all
that much, especially if someone introduces a kernel path in the
future that lets you repeatedly take the mmap_lock for writing within
a single syscall without doing much work, or maybe on some machine
where syscalls are really fast. I really don't like hinging memory
safety on how fast or slow some piece of code can run, unless we can
make strong arguments about it based on how many memory writes a CPU
core is capable of doing per second or stuff like that.

> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index de9dc20b01ba..a281519d0c12 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -71,39 +71,86 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
>  }
>
>  #ifdef CONFIG_PER_VMA_LOCK
> +static inline void init_mm_lock_seq(struct mm_struct *mm)
> +{
> +        mm->mm_lock_seq = 0;
> +}
> +
>  /*
> - * Drop all currently-held per-VMA locks.
> - * This is called from the mmap_lock implementation directly before releasing
> - * a write-locked mmap_lock (or downgrading it to read-locked).
> - * This should normally NOT be called manually from other places.
> - * If you want to call this manually anyway, keep in mind that this will release
> - * *all* VMA write locks, including ones from further up the stack.
> + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE semantics)
> + * or write-unlocked (RELEASE semantics).
>   */
> -static inline void vma_end_write_all(struct mm_struct *mm)
> +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)
>  {
>          mmap_assert_write_locked(mm);

Not a memory barriers thing, but maybe you could throw in some kind of
VM_WARN_ON() in the branches below that checks that the sequence
number is odd/even as expected, just to make extra sure...
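Something along these lines is what I have in mind (untested sketch on
top of the quoted patch, not code from this series; VM_WARN_ON_ONCE()
instead of VM_WARN_ON() would also be fine):

        if (acquire) {
                /* write-locking: the count must currently be even (unlocked) */
                VM_WARN_ON_ONCE(mm->mm_lock_seq & 1);
                WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
                smp_wmb();
        } else {
                /* write-unlocking: the count must currently be odd (locked) */
                VM_WARN_ON_ONCE(!(mm->mm_lock_seq & 1));
                smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
        }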
>          /*
>           * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
>           * mmap_lock being held.
> -         * We need RELEASE semantics here to ensure that preceding stores into
> -         * the VMA take effect before we unlock it with this store.
> -         * Pairs with ACQUIRE semantics in vma_start_read().
>           */
> -        smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +
> +        if (acquire) {
> +                WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +                /*
> +                 * For ACQUIRE semantics we should ensure no following stores are
> +                 * reordered to appear before the mm->mm_lock_seq modification.
> +                 */
> +                smp_wmb();

This is not really a full ACQUIRE; smp_wmb() only orders *stores*, not
loads, while a real ACQUIRE also prevents reads from being reordered
up above the atomic access. Please reword the comment to make it clear
that we don't have a full ACQUIRE here.

We can still have subsequent loads reordered up before the
mm->mm_lock_seq increment. But I guess that's probably fine as long as
nobody does anything exceedingly weird that involves lockless users
*writing* data that we have to read consistently, which wouldn't
really make sense... So yeah, I guess this is probably fine, and it
matches what do_raw_write_seqcount_begin() is doing.

> +        } else {
> +                /*
> +                 * We need RELEASE semantics here to ensure that preceding stores
> +                 * into the VMA take effect before we unlock it with this store.
> +                 * Pairs with ACQUIRE semantics in vma_start_read().
> +                 */
> +                smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
> +        }
> +}
> +
> +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq)
> +{
> +        /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */
> +        *seq = smp_load_acquire(&mm->mm_lock_seq);
> +        /* Allow speculation if mmap_lock is not write-locked */
> +        return (*seq & 1) == 0;
> +}
> +
> +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq)
> +{
> +        /* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */

(see above, it's not actually a full ACQUIRE)

> +        smp_rmb();
> +        return seq == READ_ONCE(mm->mm_lock_seq);
>  }
> +
>  #else
> -static inline void vma_end_write_all(struct mm_struct *mm) {}
> +static inline void init_mm_lock_seq(struct mm_struct *mm) {}
> +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire) {}
> +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, int *seq) { return false; }
> +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int seq) { return false; }
>  #endif
>
> +/*
> + * Drop all currently-held per-VMA locks.
> + * This is called from the mmap_lock implementation directly before releasing
> + * a write-locked mmap_lock (or downgrading it to read-locked).
> + * This should normally NOT be called manually from other places.
> + * If you want to call this manually anyway, keep in mind that this will release
> + * *all* VMA write locks, including ones from further up the stack.

Outdated comment - now you are absolutely not allowed to call
vma_end_write_all() manually anymore, it would mess up the odd/even
state of the counter.

> + */
> +static inline void vma_end_write_all(struct mm_struct *mm)
> +{
> +        inc_mm_lock_seq(mm, false);
> +}
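For the record, the kind of usage pattern I'm assuming these helpers
exist for, so it's clear why the ordering on both ends matters - this
is a made-up caller just to illustrate the pairing, not code from this
series:

        /* Hypothetical lockless reader (sketch, untested). */
        static bool read_total_vm_speculatively(struct mm_struct *mm,
                                                unsigned long *out)
        {
                int seq;

                if (!mmap_lock_speculation_start(mm, &seq))
                        return false;   /* mmap_lock is currently write-locked */

                /*
                 * Lockless reads go here; they may race with a writer, so
                 * the result is only usable if the recheck below passes.
                 */
                *out = READ_ONCE(mm->total_vm);

                /* Fails if any write-lock/unlock happened since _start(). */
                return mmap_lock_speculation_end(mm, seq);
        }

On failure the caller has to fall back to mmap_read_lock() and redo
the work; the smp_load_acquire() in _start() and the smp_rmb() in
_end() are what keep the lockless reads inside that window.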