From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D141AEEE26C for ; Thu, 12 Sep 2024 22:20:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 637736B0082; Thu, 12 Sep 2024 18:20:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5E65F6B0083; Thu, 12 Sep 2024 18:20:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 486906B0089; Thu, 12 Sep 2024 18:20:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 285066B0082 for ; Thu, 12 Sep 2024 18:20:10 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id AFEC81C1464 for ; Thu, 12 Sep 2024 22:20:09 +0000 (UTC) X-FDA: 82557505338.22.52A281D Received: from mail-pg1-f172.google.com (mail-pg1-f172.google.com [209.85.215.172]) by imf10.hostedemail.com (Postfix) with ESMTP id D3746C0003 for ; Thu, 12 Sep 2024 22:20:07 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Khtr841+; spf=pass (imf10.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1726179555; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=SagmcUfMRj7CswGgvyMGzL+KR4HzY8oA3rfSHnW0rGw=; b=iOKfFqZqXgiK7vqBjoJZ+6LMv65/7J27y2aRi4IzX1LJWQ8w9hwwgrxeHMP53FuV/noEw7 QErDCY/1PGXIAMNeRA/nvvkhMsq0570SodcM9MykocbxDx5wWGhpHsl0DtFk9qjZbKhCIo BOU/ox73pq2wc0DghwXu1O4FD8acQIc= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Khtr841+; spf=pass (imf10.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.215.172 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1726179555; a=rsa-sha256; cv=none; b=Y8/ATVrhZlpg9Vg3u8jfL6jq26oGpYViPTTj5jbpByqFRUYVSqX30Sr+Rl4/C2HdzuxFBy GC1q6B4bC+Ep3kTSRJNR6jVJnA9ZV42HIcZisWYTPp9R4JG40sq3S1/NbPtqBP/2IDq2GV vtMFCyLsfgO+fJknSF3T3q4QNEVt37w= Received: by mail-pg1-f172.google.com with SMTP id 41be03b00d2f7-7db238d07b3so1094100a12.2 for ; Thu, 12 Sep 2024 15:20:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1726179606; x=1726784406; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=SagmcUfMRj7CswGgvyMGzL+KR4HzY8oA3rfSHnW0rGw=; b=Khtr841+hYS4lkSPM4DVhvvkbW5vkHdZdLN9T8+048IXnyrfJGdWS0ICBH1/3XAdI7 80a4J6sXnPKTVyzEN5B5h0/DddG5h9AuZyoCAtTP2/jW94SuOzdlYwTwMmeIdZUaBw/L 58HasosYcLNQS9ha1ep58WgJWv9eyESZPJv1ZIXUYE/ffe64SK0RJkltV3iXrzXbtnud 1+T/m3TA3ONG2EpDxJeja+nEg9E5L4K8y/ucAxZZbq6xojCUOEs2FxDhmoRLw6o6hopN 0+8Hy9vCMNe7vLgC3MZn05RnpsmvxXeH6gLJHT74skAkPKNT9+CL9eolo0KKfgON6OLc 53pg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726179606; x=1726784406; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SagmcUfMRj7CswGgvyMGzL+KR4HzY8oA3rfSHnW0rGw=; b=HOMOwa3JARHPdQyc3J21UjQE049qBP/53jfuoLwdPAwPhE5TgiWospTQ3tolFbAAxp ons/7Xn81tksHwrn7Hbed9uLa0d9pnHEyh2dZT0xuAKY2w9zj7HiPP/sGdQFPFxv98QR ilfTzwusr6vxQyG+yPIAFCoidwwtAvrwOHFOvF5FNtKGs8bDG8s09Ux3Vx20ReUV6zKc QQRGxE0sqbf6wz91c10Vl4/PDv7xZYoGiy32cORcVT2fcc4lTDlYpgDBzUJXNXdvr/V0 MR9LT08vJlarZvpv7xYh9Lfh3ioiWBw2YPrqy7laHhQQQHlrP1X/vBUc30ga7WBwJFb8 hKwA== X-Forwarded-Encrypted: i=1; AJvYcCWXiPEcN4hMcmRnpTrFiuU6thWnH/r9yQ7otKW9vq3FKxRAhhf9A+bMMbFTSwVX4mqXQEGe6nEoTA==@kvack.org X-Gm-Message-State: AOJu0Yz8O5690DRUU0saptEGISKF08liL+MdnRDcKUnHYXxuFNHC1aCU qhwhW2/0UGJQHu6QlwsGI9wHJVXg3sV9dqvddm/7EbvcGFucBB7VE+19ZdyacjLQ6Hw3yUeVE2H 60SYr+J+U+ke9rSGDZRt7NCrkcmz/sQ== X-Google-Smtp-Source: AGHT+IGfqDytp8ykySDhYO5cmlto0zg26xLHQ2nm/xbuMBqLsy5H3krZS+pfz6TUXijAno29TpeKY7wnK6r3A8gBe+4= X-Received: by 2002:a17:90a:ab0b:b0:2d8:97b7:449f with SMTP id 98e67ed59e1d1-2dba0065e20mr5259536a91.38.1726179606298; Thu, 12 Sep 2024 15:20:06 -0700 (PDT) MIME-Version: 1.0 References: <20240912210222.186542-1-surenb@google.com> In-Reply-To: From: Andrii Nakryiko Date: Thu, 12 Sep 2024 15:19:54 -0700 Message-ID: Subject: Re: [PATCH v2 1/1] mm: introduce mmap_lock_speculation_{start|end} To: Suren Baghdasaryan Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org, mjguzik@gmail.com, brauner@kernel.org, jannh@google.com, andrii@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam03 X-Rspam-User: X-Rspamd-Queue-Id: D3746C0003 X-Stat-Signature: zsdrcb3z4rtua86h6gtimag7e3f67hob X-HE-Tag: 1726179607-359031 X-HE-Meta: U2FsdGVkX19MEFkJU8z5/HWBIrjHcQVoUw6RvCB5znWtgMQmJPz8i1viY7ipeCqxRDMPuItG2SiHE/9M05Py1jUUJ3bCY/jkMkVot4RyFotwG+UuYuhjbAvr8Rwx0FW6tstUw08mF0167CBWyHJ9pvpBzaxM12Ota/mwIROJIueMfslHV561Wmk5wlVsbVmxczKXQGHzevMKmwB7bG3sY7fKLipTP5XRHGS6HnCRvtX//91F2v23Xx20x9lrJuE7QOSx7KaO6HQ4ZI11esvYROdGOhq5cPS5Iblupz32DLELamlljjQreLxAYO9YvUrEmmK1RpjxT/j6kq9CPxgf3O8R6zXGSQetXVPBAlyqP2EeeKI1g2jkcLgAyfPrEa7oZ+Y9A1YoEiAqfFbmnbGXuNuEor4hUZiWaj5OawDDW8Z8eR0wM/QTklOpIIKkSNQCHgmaUeDMaK3cf+xmNt9o4rA+ndoC+QQEkpdKf6JLchxBFaNbXBxWkEECffoa7vHnqa6vbDWUCjnulBynGALgC+ir9s5XlzKcQjNtIEj9gzNqZtzjBdtvi1rkhbF8I4sF2ZSQGAyPqN5qvzoWPRZGLHU4mUNea+GvOQx7mL2eoFIz6KEswE7IfQO0672M1y0psJiGItjTfWF+dG68DcIZb2Aao2QSVxO6rA/tLN0y9hrEz4kEb6Trmd3JYcw14hxzIynYM+WTD8iHnZ+RW1bDWFjd2uHaUSxSWScXJBBfqx2ojG7TDFNXD3imOPQhOGtA1A6nHDsaL++2GQkh/QPBAAClC1MQ1mt060x+I2X5fNVQjW3dCcUfjz4DsgZm+AAO3A7q+qQhovTjfFkUioGWxku2pVyWqOrb8NGssUs4eqD0B/qQjW+ABSf424sh4oE+YB5Crw7EIqClv/6Z5OfOR/SGsM78O0elihXLVCz8Y+Y6qCuHGLbDCW7qe7p3SDF933J4SSbpvUc7l2867gE MmjeNiW/ gQpjSSSHmbut1FWJPcDSF2Tl3LBeCMcDPgIzkVFE3yXVmo/cXEriAi0xBR9tSLG6G+abnNxqisyFLXjfWCvl8W+hgLimFUla+jLw+8haoMuhrxViXtfelcEckdI7d9fTlYqdZT7LV7ZJbwdsukuMQeFToQ+zMRPC5N7aRy8UnRk4HgMvwjIX2zTq/qIrQQyvZVMmxMMgAKikjGGgtOHJJAIoiUnwGnF6Q95MhDnpjPmGMU6B7XE7aKH9X/u6hffTLebPXEuhJ4VX6YzuaHMjVtCOJaM4oZAy28yBdIr2nADNnTcDR4r0jmgigrv9KI5IDlx0ezJeRQvBTwhX+CBQBqqp3lKOfhkCL5aGh5S1H0F13jm1HegGchKVrj6qCQfzS1VyJZ8PyiZ0kVCo624RMbx51L4OejY/JjTKLG2/sqteuyBgqwyF5wp9+Pz/x6ARLMYKk0n3MveHc9wtg9VYkInWD3nldhVsCm7yQKLeB1VIJWe63uhxwbUigdIoGgR+bkEJg X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Sep 12, 2024 at 2:04=E2=80=AFPM Suren Baghdasaryan wrote: > > On Thu, Sep 12, 2024 at 2:02=E2=80=AFPM Suren Baghdasaryan wrote: > > > > Add helper functions to speculatively perform operations without > > read-locking mmap_lock, expecting that mmap_lock will not be > > write-locked and mm is not modified from under us. > > Here you go. I hope I got the ordering right this time around, but I > would feel much better if Jann reviewed it before it's included in > your next patchset :) Thanks a lot! And yes, I'll give it a bit of time for reviews before sending a new revision. Did you by any chance get any new ideas for possible benchmarks (beyond what I did with will-it-scale)? > Thanks, > Suren. > > > > > Suggested-by: Peter Zijlstra > > Signed-off-by: Suren Baghdasaryan > > Signed-off-by: Andrii Nakryiko > > --- > > Changes since v1 [1]: > > - Made memory barriers in inc_mm_lock_seq and mmap_lock_speculation_end > > more strict, per Jann Horn > > > > [1] https://lore.kernel.org/all/20240906051205.530219-2-andrii@kernel.o= rg/ > > > > include/linux/mm_types.h | 3 ++ > > include/linux/mmap_lock.h | 74 ++++++++++++++++++++++++++++++++------- > > kernel/fork.c | 3 -- > > 3 files changed, 65 insertions(+), 15 deletions(-) > > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > > index 6e3bdf8e38bc..5d8cdebd42bc 100644 > > --- a/include/linux/mm_types.h > > +++ b/include/linux/mm_types.h > > @@ -887,6 +887,9 @@ struct mm_struct { > > * Roughly speaking, incrementing the sequence number i= s > > * equivalent to releasing locks on VMAs; reading the s= equence > > * number can be part of taking a read lock on a VMA. > > + * Incremented every time mmap_lock is write-locked/unl= ocked. > > + * Initialized to 0, therefore odd values indicate mmap= _lock > > + * is write-locked and even values that it's released. > > * > > * Can be modified under write mmap_lock using RELEASE > > * semantics. > > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h > > index de9dc20b01ba..a281519d0c12 100644 > > --- a/include/linux/mmap_lock.h > > +++ b/include/linux/mmap_lock.h > > @@ -71,39 +71,86 @@ static inline void mmap_assert_write_locked(const s= truct mm_struct *mm) > > } > > > > #ifdef CONFIG_PER_VMA_LOCK > > +static inline void init_mm_lock_seq(struct mm_struct *mm) > > +{ > > + mm->mm_lock_seq =3D 0; > > +} > > + > > /* > > - * Drop all currently-held per-VMA locks. > > - * This is called from the mmap_lock implementation directly before re= leasing > > - * a write-locked mmap_lock (or downgrading it to read-locked). > > - * This should normally NOT be called manually from other places. > > - * If you want to call this manually anyway, keep in mind that this wi= ll release > > - * *all* VMA write locks, including ones from further up the stack. > > + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE s= emantics) > > + * or write-unlocked (RELEASE semantics). > > */ > > -static inline void vma_end_write_all(struct mm_struct *mm) > > +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire) > > { > > mmap_assert_write_locked(mm); > > /* > > * Nobody can concurrently modify mm->mm_lock_seq due to exclus= ive > > * mmap_lock being held. > > - * We need RELEASE semantics here to ensure that preceding stor= es into > > - * the VMA take effect before we unlock it with this store. > > - * Pairs with ACQUIRE semantics in vma_start_read(). > > */ > > - smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1); > > + > > + if (acquire) { > > + WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1); > > + /* > > + * For ACQUIRE semantics we should ensure no following = stores are > > + * reordered to appear before the mm->mm_lock_seq modif= ication. > > + */ > > + smp_wmb(); > > + } else { > > + /* > > + * We need RELEASE semantics here to ensure that preced= ing stores > > + * into the VMA take effect before we unlock it with th= is store. > > + * Pairs with ACQUIRE semantics in vma_start_read(). > > + */ > > + smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1= ); > > + } > > +} > > + > > +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, i= nt *seq) > > +{ > > + /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */ > > + *seq =3D smp_load_acquire(&mm->mm_lock_seq); > > + /* Allow speculation if mmap_lock is not write-locked */ > > + return (*seq & 1) =3D=3D 0; > > +} > > + > > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int= seq) > > +{ > > + /* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */ > > + smp_rmb(); > > + return seq =3D=3D READ_ONCE(mm->mm_lock_seq); > > } > > + > > #else > > -static inline void vma_end_write_all(struct mm_struct *mm) {} > > +static inline void init_mm_lock_seq(struct mm_struct *mm) {} > > +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquire)= {} > > +static inline bool mmap_lock_speculation_start(struct mm_struct *mm, i= nt *seq) { return false; } > > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, int= seq) { return false; } > > #endif > > > > +/* > > + * Drop all currently-held per-VMA locks. > > + * This is called from the mmap_lock implementation directly before re= leasing > > + * a write-locked mmap_lock (or downgrading it to read-locked). > > + * This should normally NOT be called manually from other places. > > + * If you want to call this manually anyway, keep in mind that this wi= ll release > > + * *all* VMA write locks, including ones from further up the stack. > > + */ > > +static inline void vma_end_write_all(struct mm_struct *mm) > > +{ > > + inc_mm_lock_seq(mm, false); > > +} > > + > > static inline void mmap_init_lock(struct mm_struct *mm) > > { > > init_rwsem(&mm->mmap_lock); > > + init_mm_lock_seq(mm); > > } > > > > static inline void mmap_write_lock(struct mm_struct *mm) > > { > > __mmap_lock_trace_start_locking(mm, true); > > down_write(&mm->mmap_lock); > > + inc_mm_lock_seq(mm, true); > > __mmap_lock_trace_acquire_returned(mm, true, true); > > } > > > > @@ -111,6 +158,7 @@ static inline void mmap_write_lock_nested(struct mm= _struct *mm, int subclass) > > { > > __mmap_lock_trace_start_locking(mm, true); > > down_write_nested(&mm->mmap_lock, subclass); > > + inc_mm_lock_seq(mm, true); > > __mmap_lock_trace_acquire_returned(mm, true, true); > > } > > > > @@ -120,6 +168,8 @@ static inline int mmap_write_lock_killable(struct m= m_struct *mm) > > > > __mmap_lock_trace_start_locking(mm, true); > > ret =3D down_write_killable(&mm->mmap_lock); > > + if (!ret) > > + inc_mm_lock_seq(mm, true); > > __mmap_lock_trace_acquire_returned(mm, true, ret =3D=3D 0); > > return ret; > > } > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 61070248a7d3..c86e87ed172b 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -1259,9 +1259,6 @@ static struct mm_struct *mm_init(struct mm_struct= *mm, struct task_struct *p, > > seqcount_init(&mm->write_protect_seq); > > mmap_init_lock(mm); > > INIT_LIST_HEAD(&mm->mmlist); > > -#ifdef CONFIG_PER_VMA_LOCK > > - mm->mm_lock_seq =3D 0; > > -#endif > > mm_pgtables_bytes_init(mm); > > mm->map_count =3D 0; > > mm->locked_vm =3D 0; > > > > base-commit: 015bdfcb183759674ba1bd732c3393014e35708b > > -- > > 2.46.0.662.g92d0881bb0-goog > >