From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31BBEEEE26E for ; Thu, 12 Sep 2024 22:25:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 661596B0089; Thu, 12 Sep 2024 18:25:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 611AB6B008A; Thu, 12 Sep 2024 18:25:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4FFA06B008C; Thu, 12 Sep 2024 18:25:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 323B46B0089 for ; Thu, 12 Sep 2024 18:25:07 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id C1B99160B03 for ; Thu, 12 Sep 2024 22:25:06 +0000 (UTC) X-FDA: 82557517812.10.488F8EC Received: from mail-ed1-f53.google.com (mail-ed1-f53.google.com [209.85.208.53]) by imf04.hostedemail.com (Postfix) with ESMTP id E54FD4000B for ; Thu, 12 Sep 2024 22:25:04 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=c5kotxV+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf04.hostedemail.com: domain of surenb@google.com designates 209.85.208.53 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1726179819; a=rsa-sha256; cv=none; b=5qfs37+ilTejRLjFzMqBNZZ3DsK79UnkUFroaWUen0TDjNDwr0QNHlDrCtMQetpSVmHzq8 sZ4xYl/uKjxWCEbI3AgYyH3ut+dNWeGXuBCkCPsD4+6fEuY3hYpEb+LsCGRZ9jGuIFL6O7 X/VTTYXQ+hb4xSELzlbCW7/qe5SxhPA= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=c5kotxV+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf04.hostedemail.com: domain of surenb@google.com designates 209.85.208.53 as permitted sender) smtp.mailfrom=surenb@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1726179819; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=r08X9SW48hwdcFyWCetupv6po9aUVBR2VI/xglM7xt0=; b=qOKVxlOMCFvdo1AE7HOvMjZpmwKi4+s/f4zEaZhDJbUEUlPr1M6vr6Mgk12PcSHiESVwuK zF9ZjphEQQk+1fNdBdrm4w6WsQZgMyCfBK7DDcP3w6v4cXSqXeqNmXVxZXnnkFbWzwhaxE +41c08E0NSgUTRpY6pBLNUtbB2Bhlu4= Received: by mail-ed1-f53.google.com with SMTP id 4fb4d7f45d1cf-5c24374d8b6so5365a12.1 for ; Thu, 12 Sep 2024 15:25:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1726179903; x=1726784703; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=r08X9SW48hwdcFyWCetupv6po9aUVBR2VI/xglM7xt0=; b=c5kotxV+bsz6it7TTznpau0kstGR3NOAFmB1jnHNc7ME/DoojUBESbGGwYRa4mg6Q0 S+QSJAihxTkM4MdAiaaE8MnPTCRUzt+LWhR20DurWmsA7ZcL+We1+b/g7eTYvltfj5ks Y50CN4BELnxCBcf3Wqm8q7TS5bL1BZmvzEwJ1eDy/yqIfXqjUvE1RXnd2Meh+R7AZqoX ftdJ5qY+LvNmrWLDXdv+vX+JdFwjU5Ykyyjz/ZuYRZ5EMuRmErovwrUL6FAyG6zF4R3o fYXe9dKbVstA5oXz1MoUl0hkVPAHu7WMvxjKOY9ZpvmrSEa52drk7iCNnXbWMtZY7sTq AuTw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1726179903; x=1726784703; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r08X9SW48hwdcFyWCetupv6po9aUVBR2VI/xglM7xt0=; b=xKE5xycabgSVKv41bCZWSfM2/76fSPPZJ/p24bxemc9UI8/nyWFdswBfJMofyk87tC HUaMQB5SZa2NRvw5QZCyZZkw5fKUJ22Q/Vv8UT6EN6AQN/B/uerczqeJZGMIpUwdYcUP eQrmwrWAp8Txi72j9iGSACCwNm6sOYaFhicTmyDzy0ce8zmILycVGR2uZyZxRMWwYBg/ t0+0mZueV9paDsubil5AhGKTbMzb7gkvqGgcawrBvScMpJj27XdGWVaYrKxdOOfzRwMf NSIgb6BqDeCKQKpv14gA7ZM1r2Oybcs5bI6PmV7C18NVc5fpprudFi6mFMlCS4obzBJ5 +p/g== X-Forwarded-Encrypted: i=1; AJvYcCUeWRrLZRVKUUNAfZ7GvIps2i9IZNhxUfAICQGlCFURE4JDZQ53HZoBZe05bO/70GdQqWQUxfeFtw==@kvack.org X-Gm-Message-State: AOJu0Yyta/97otzsrIZY+nU1W5wROqgS+jmwBxHrbbUulVm9zNpoKI8c VCxXpAWqbN8iWSB0M1cOVqASrOUWAPg/B1Mlw/r9OL4jjW7caLNcXh8B2Dk/btCz0AcydUqd/o0 En/VbQkiHNuHjb9KtUT3zm8AHU4uQ/c/yyEr9 X-Google-Smtp-Source: AGHT+IGbXWMb8f+8TB0Fh3tbR8FGyTaM8DKVaS1fc+Rx1XvEkBhuc+66BgyT2RB36WIwNd5YLWgX7TxErR5Lw/uLj4E= X-Received: by 2002:a05:6402:2688:b0:5c2:5629:5fbb with SMTP id 4fb4d7f45d1cf-5c415db646fmr471514a12.1.1726179902448; Thu, 12 Sep 2024 15:25:02 -0700 (PDT) MIME-Version: 1.0 References: <20240912210222.186542-1-surenb@google.com> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 12 Sep 2024 15:24:50 -0700 Message-ID: Subject: Re: [PATCH v2 1/1] mm: introduce mmap_lock_speculation_{start|end} To: Andrii Nakryiko Cc: linux-trace-kernel@vger.kernel.org, peterz@infradead.org, oleg@redhat.com, rostedt@goodmis.org, mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org, akpm@linux-foundation.org, linux-mm@kvack.org, mjguzik@gmail.com, brauner@kernel.org, jannh@google.com, andrii@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: E54FD4000B X-Stat-Signature: fr5nqny4nnyihokk8py7arymsuidwe96 X-Rspam-User: X-HE-Tag: 1726179904-923977 X-HE-Meta: U2FsdGVkX1/ng1fk52hy93ExmcQV++cAq3Wm/s+aKnLDTLBT3tiM8rOD6ojNLV/IlRBUTJ0uOZPMJZcHE0UBr48cSV02iI7Jg6s8i4TE9tGTKeFN0mRXy111EDjjN3XwgGK74/8Jglqn63Kx6J/VsHRWfxtj8Rc5YTyDIp8HWODuM8ihdmYQWshi4xf2SzQZnJfuprl9uLPKFCQVQW9lVPW6aE6lfV3D558377mhYdRBNc7VmbKK2kX+vPVReGcY5yUqsD+Kp8xGB5E2W3SZPUiXucVSdkBkPh8rigpAHaFsxfKz4evNzcC/1fNVfc3aJsnjrE0Qdj+mp6g9ea1HJMwbxIq1q0FotNqlND+fMSYxLtj+OLZE29alIezYS2yvaGFAVsEDA2eUvGk2fN776dF0ab65LK675WcI4D02AdK0hvdlW20QrYRIPn7dxREKEVGJa1kmaIBwaHgyR55KtzXTNqQq848HDgNUHP+6s1qIve8bOeUBZT5Aj3KRjJjLYvt5KurdVoN1FV+SQCHPap/hd00Z5aDOm+G3PJOJ/ECT1gZLg32j9usESy7lSTNXZAQ6PvoefU43phR2+1sPtNfyDburn1c9Fz4luaLDCXheghCnW/PBMXk7Y4Hf4sgINpusTrgGScck9LjXYjftFZrefaFwqXDohJq2htlAWfuI5iXcD5yjIwoWX/3/vMMvPkdlZakWbzipg4U/5RGOIT5hXXNdE3z+lDtdDQpOtTiiL5lIUlnAP6jCcvYtcz29gYp5keNZNIFYT7pP9NxN4P7mb9yQNGObm++FhHE9d49uSrscBo6jdoUN/BMthPfzW6WlOsKHpAmIZc7SJi0GPSonajq3e6IRHmE125q8wGFVOn9ShRTrX8Z/AqfBUDFyi074HkQKCrPQx3vKkmT+3pkMxaysi/Zno3sgPCfbRCrCk5jTZpG7WWg6BDmr2GAeNmNLYrmt09/sa2stM40 3slsmzZk /x9k1CeEDHBPH4XWpgrMvBMxdUM78iGFPqlxLLPsh7rm1Ve+EVg2H9uHZbWjNHRYlcsORdC0M4Ti/h2eONQNtl4Rnk1QD0RdGkDFz2Lz42awo9lIMIdSUQxEhy6SBWD0v9ZkU/6HzTSx0CIZz5RA9o0fZGvHLRaG69HHlfdq3VXjAmCFRRUEfFzjVVRJCDmCXrhFSXfeRyht+fATAh1K0L+UZWE/lYybnphgfnnJoaAk6U/qay8njRLG4rgfg/PlEMbOGuM7kPYigNqJ5pnLCaaWC4hw1bcDvKIlgXImpZYCD4Vl0iKthtQhK2OsZ+/BudeeQms07Tp0XyJNaUP88iboy4il5zTEGAS7yD3Z1rATfy2YbT307wmz/onJZlFdzaVtiX1/pPkZp1s/j9KwkwxmOqlCtKXSnPRjFkaYpT+wSprE42A+IDsrQItOZEWdMojIQ X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Sep 12, 2024 at 3:20=E2=80=AFPM Andrii Nakryiko wrote: > > On Thu, Sep 12, 2024 at 2:04=E2=80=AFPM Suren Baghdasaryan wrote: > > > > On Thu, Sep 12, 2024 at 2:02=E2=80=AFPM Suren Baghdasaryan wrote: > > > > > > Add helper functions to speculatively perform operations without > > > read-locking mmap_lock, expecting that mmap_lock will not be > > > write-locked and mm is not modified from under us. > > > > Here you go. I hope I got the ordering right this time around, but I > > would feel much better if Jann reviewed it before it's included in > > your next patchset :) > > Thanks a lot! And yes, I'll give it a bit of time for reviews before > sending a new revision. > > Did you by any chance get any new ideas for possible benchmarks > (beyond what I did with will-it-scale)? Hmm. You could use Mel Gorman's mmtests suite I guess. > > > > Thanks, > > Suren. > > > > > > > > Suggested-by: Peter Zijlstra > > > Signed-off-by: Suren Baghdasaryan > > > Signed-off-by: Andrii Nakryiko > > > --- > > > Changes since v1 [1]: > > > - Made memory barriers in inc_mm_lock_seq and mmap_lock_speculation_e= nd > > > more strict, per Jann Horn > > > > > > [1] https://lore.kernel.org/all/20240906051205.530219-2-andrii@kernel= .org/ > > > > > > include/linux/mm_types.h | 3 ++ > > > include/linux/mmap_lock.h | 74 ++++++++++++++++++++++++++++++++-----= -- > > > kernel/fork.c | 3 -- > > > 3 files changed, 65 insertions(+), 15 deletions(-) > > > > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > > > index 6e3bdf8e38bc..5d8cdebd42bc 100644 > > > --- a/include/linux/mm_types.h > > > +++ b/include/linux/mm_types.h > > > @@ -887,6 +887,9 @@ struct mm_struct { > > > * Roughly speaking, incrementing the sequence number= is > > > * equivalent to releasing locks on VMAs; reading the= sequence > > > * number can be part of taking a read lock on a VMA. > > > + * Incremented every time mmap_lock is write-locked/u= nlocked. > > > + * Initialized to 0, therefore odd values indicate mm= ap_lock > > > + * is write-locked and even values that it's released= . > > > * > > > * Can be modified under write mmap_lock using RELEAS= E > > > * semantics. > > > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h > > > index de9dc20b01ba..a281519d0c12 100644 > > > --- a/include/linux/mmap_lock.h > > > +++ b/include/linux/mmap_lock.h > > > @@ -71,39 +71,86 @@ static inline void mmap_assert_write_locked(const= struct mm_struct *mm) > > > } > > > > > > #ifdef CONFIG_PER_VMA_LOCK > > > +static inline void init_mm_lock_seq(struct mm_struct *mm) > > > +{ > > > + mm->mm_lock_seq =3D 0; > > > +} > > > + > > > /* > > > - * Drop all currently-held per-VMA locks. > > > - * This is called from the mmap_lock implementation directly before = releasing > > > - * a write-locked mmap_lock (or downgrading it to read-locked). > > > - * This should normally NOT be called manually from other places. > > > - * If you want to call this manually anyway, keep in mind that this = will release > > > - * *all* VMA write locks, including ones from further up the stack. > > > + * Increment mm->mm_lock_seq when mmap_lock is write-locked (ACQUIRE= semantics) > > > + * or write-unlocked (RELEASE semantics). > > > */ > > > -static inline void vma_end_write_all(struct mm_struct *mm) > > > +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquir= e) > > > { > > > mmap_assert_write_locked(mm); > > > /* > > > * Nobody can concurrently modify mm->mm_lock_seq due to excl= usive > > > * mmap_lock being held. > > > - * We need RELEASE semantics here to ensure that preceding st= ores into > > > - * the VMA take effect before we unlock it with this store. > > > - * Pairs with ACQUIRE semantics in vma_start_read(). > > > */ > > > - smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1); > > > + > > > + if (acquire) { > > > + WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1); > > > + /* > > > + * For ACQUIRE semantics we should ensure no followin= g stores are > > > + * reordered to appear before the mm->mm_lock_seq mod= ification. > > > + */ > > > + smp_wmb(); > > > + } else { > > > + /* > > > + * We need RELEASE semantics here to ensure that prec= eding stores > > > + * into the VMA take effect before we unlock it with = this store. > > > + * Pairs with ACQUIRE semantics in vma_start_read(). > > > + */ > > > + smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq += 1); > > > + } > > > +} > > > + > > > +static inline bool mmap_lock_speculation_start(struct mm_struct *mm,= int *seq) > > > +{ > > > + /* Pairs with RELEASE semantics in inc_mm_lock_seq(). */ > > > + *seq =3D smp_load_acquire(&mm->mm_lock_seq); > > > + /* Allow speculation if mmap_lock is not write-locked */ > > > + return (*seq & 1) =3D=3D 0; > > > +} > > > + > > > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, i= nt seq) > > > +{ > > > + /* Pairs with ACQUIRE semantics in inc_mm_lock_seq(). */ > > > + smp_rmb(); > > > + return seq =3D=3D READ_ONCE(mm->mm_lock_seq); > > > } > > > + > > > #else > > > -static inline void vma_end_write_all(struct mm_struct *mm) {} > > > +static inline void init_mm_lock_seq(struct mm_struct *mm) {} > > > +static inline void inc_mm_lock_seq(struct mm_struct *mm, bool acquir= e) {} > > > +static inline bool mmap_lock_speculation_start(struct mm_struct *mm,= int *seq) { return false; } > > > +static inline bool mmap_lock_speculation_end(struct mm_struct *mm, i= nt seq) { return false; } > > > #endif > > > > > > +/* > > > + * Drop all currently-held per-VMA locks. > > > + * This is called from the mmap_lock implementation directly before = releasing > > > + * a write-locked mmap_lock (or downgrading it to read-locked). > > > + * This should normally NOT be called manually from other places. > > > + * If you want to call this manually anyway, keep in mind that this = will release > > > + * *all* VMA write locks, including ones from further up the stack. > > > + */ > > > +static inline void vma_end_write_all(struct mm_struct *mm) > > > +{ > > > + inc_mm_lock_seq(mm, false); > > > +} > > > + > > > static inline void mmap_init_lock(struct mm_struct *mm) > > > { > > > init_rwsem(&mm->mmap_lock); > > > + init_mm_lock_seq(mm); > > > } > > > > > > static inline void mmap_write_lock(struct mm_struct *mm) > > > { > > > __mmap_lock_trace_start_locking(mm, true); > > > down_write(&mm->mmap_lock); > > > + inc_mm_lock_seq(mm, true); > > > __mmap_lock_trace_acquire_returned(mm, true, true); > > > } > > > > > > @@ -111,6 +158,7 @@ static inline void mmap_write_lock_nested(struct = mm_struct *mm, int subclass) > > > { > > > __mmap_lock_trace_start_locking(mm, true); > > > down_write_nested(&mm->mmap_lock, subclass); > > > + inc_mm_lock_seq(mm, true); > > > __mmap_lock_trace_acquire_returned(mm, true, true); > > > } > > > > > > @@ -120,6 +168,8 @@ static inline int mmap_write_lock_killable(struct= mm_struct *mm) > > > > > > __mmap_lock_trace_start_locking(mm, true); > > > ret =3D down_write_killable(&mm->mmap_lock); > > > + if (!ret) > > > + inc_mm_lock_seq(mm, true); > > > __mmap_lock_trace_acquire_returned(mm, true, ret =3D=3D 0); > > > return ret; > > > } > > > diff --git a/kernel/fork.c b/kernel/fork.c > > > index 61070248a7d3..c86e87ed172b 100644 > > > --- a/kernel/fork.c > > > +++ b/kernel/fork.c > > > @@ -1259,9 +1259,6 @@ static struct mm_struct *mm_init(struct mm_stru= ct *mm, struct task_struct *p, > > > seqcount_init(&mm->write_protect_seq); > > > mmap_init_lock(mm); > > > INIT_LIST_HEAD(&mm->mmlist); > > > -#ifdef CONFIG_PER_VMA_LOCK > > > - mm->mm_lock_seq =3D 0; > > > -#endif > > > mm_pgtables_bytes_init(mm); > > > mm->map_count =3D 0; > > > mm->locked_vm =3D 0; > > > > > > base-commit: 015bdfcb183759674ba1bd732c3393014e35708b > > > -- > > > 2.46.0.662.g92d0881bb0-goog > > >