From: Suren Baghdasaryan <surenb@google.com>
Date: Fri, 28 Mar 2025 17:56:54 -0700
Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
To: Mateusz Guzik
Cc: linux-mm <linux-mm@kvack.org>, "Liam R. Howlett", Lorenzo Stoakes, Matthew Wilcox
Content-Type: text/plain; charset="UTF-8"

Hi Mateusz,

On Wed, Mar 26, 2025 at 10:46 PM Mateusz Guzik wrote:
>
> Hi Suren,
>
> I brought this up a long time ago; you agreed the idea works and said
> you would sort it out, but it apparently fell to the wayside (no
> hard feelings though :>)

Sorry about that. Yeah, that was a hectic time and I completely forgot
to look into the single-user case.

>
> Here is the original:
> https://lore.kernel.org/all/20230804214620.btgwhsszsd7rh6nf@f/
>
> The thread is weirdly long and I recommend not opening it without a
> good reason; I link it for reference if needed.

I had to re-read it to remember what it was all about :) To bring
others up to speed, the suggested change would look something like
this:

@@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	unsigned long charge = 0;
 	LIST_HEAD(uf);
 	VMA_ITERATOR(vmi, mm, 0);
+	bool only_user;

 	if (mmap_write_lock_killable(oldmm))
 		return -EINTR;
+	only_user = atomic_read(&oldmm->mm_users) == 1;
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
 	/*
@@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mt_clear_in_rcu(vmi.mas.tree);
 	for_each_vma(vmi, mpnt) {
 		struct file *file;

-		vma_start_write(mpnt);
+		if (!only_user)
+			vma_start_write(mpnt);
 		if (mpnt->vm_flags & VM_DONTCOPY) {
 			retval = vma_iter_clear_gfp(&vmi, mpnt->vm_start,
 						    mpnt->vm_end, GFP_KERNEL);

>
> At the time there were woes concerning the stability of the new locking
> scheme, resulting in the CC list being rather excessive. As such I'm
> starting a new thread with a modest list; not sure who to add, though.
>
> So dup_mmap() holds the caller's mmap_sem for writing and calls
> vma_start_write() to protect against fault handling in other threads
> using the same mm.
>
> If this is the only thread with this ->mm in place, there are no
> sibling threads to worry about, and this can be checked with
> mm_users == 1.
>
> AFAICS all remote accesses require the mmap_sem to also be held, which
> provides exclusion against dup_mmap(), meaning they don't pose a
> problem either.

Yes, I believe the optimization you proposed would be safe.
>
> The patch to merely skip locking is a few-liner and I would officially
> submit it myself, but for my taste an assert is needed in fault
> handling to runtime-test the above invariant.

Hmm. Adding an assert in the page fault path that checks whether the
fault is triggered during dup_mmap() && mm_users == 1 would not be
trivial. We would need to indicate that we expect no page faults while
in that section of dup_mmap() (maybe a flag inside mm_struct) and then
assert that this flag is not set inside the page fault handling path.
I'm not sure it's worth the complexity... As was discussed in that
thread, the only other task that might fault a page would be external
and therefore would have to go through
access_remote_vm()->mmap_read_lock_killable(), and since dup_mmap()
already holds mmap_write_lock the new user would have to wait.

> So happens I really
> can't be bothered to figure out how to sort it out and was hoping you
> would step in. ;) Alternatively, if you guys don't think the assert is
> warranted, that's your business.

I don't think it's worth it, but I'll CC Matthew and Lorenzo (you
already CC'ed Liam) to get their opinion.

>
> As for whether this can save any locking -- yes:

Yeah, I'm sure it will make a difference in performance. While forking
we currently lock each VMA separately, so skipping that would be nice.

Folks, WDYT? Do we need a separate assertion that a page fault can't
happen if mm_users == 1 and we are holding mmap_write_lock
(access_remote_vm() will block)?

Thanks,
Suren.

>
> I added a probe (below for reference) with two args: whether we are
> single-threaded and whether vma_start_write() took the
> down/up cycle, and ran make -j 20 in the kernel dir.
>
> The lock was taken for every single VMA (377913 in total), while all
> forking processes were single-threaded. Or to put it differently, all
> of these were skippable.
>
> the probe (total hack):
> bpftrace -e 'kprobe:dup_probe { @[arg0, arg1] = count(); }'
>
> probe diff:
> diff --git a/fs/namei.c b/fs/namei.c
> index ecb7b95c2ca3..d6cde76eda81 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -5459,3 +5459,7 @@ const struct inode_operations page_symlink_inode_operations = {
>  	.get_link	= page_get_link,
>  };
>  EXPORT_SYMBOL(page_symlink_inode_operations);
> +
> +void dup_probe(int, int);
> +void dup_probe(int, int) { }
> +EXPORT_SYMBOL(dup_probe);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1f80baddacc5..f7b1f0a02f2e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -760,12 +760,12 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
>   * Exclude concurrent readers under the per-VMA lock until the currently
>   * write-locked mmap_lock is dropped or downgraded.
>   */
> -static inline void vma_start_write(struct vm_area_struct *vma)
> +static inline bool vma_start_write(struct vm_area_struct *vma)
>  {
>  	unsigned int mm_lock_seq;
>
>  	if (__is_vma_write_locked(vma, &mm_lock_seq))
> -		return;
> +		return false;
>
>  	down_write(&vma->vm_lock->lock);
>  	/*
> @@ -776,6 +776,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
>  	 */
>  	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>  	up_write(&vma->vm_lock->lock);
> +	return true;
>  }
>
>  static inline void vma_assert_write_locked(struct vm_area_struct *vma)
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 735405a9c5f3..0cc56255a339 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -629,6 +629,8 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
>  		pr_warn_once("exe_file_deny_write_access() failed in %s\n", __func__);
>  }
>
> +void dup_probe(int, int);
> +
>  #ifdef CONFIG_MMU
>  static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  		struct mm_struct *oldmm)
> @@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  	unsigned long charge = 0;
>  	LIST_HEAD(uf);
>  	VMA_ITERATOR(vmi, mm, 0);
> +	bool only_user;
>
>  	if (mmap_write_lock_killable(oldmm))
>  		return -EINTR;
> +	only_user = atomic_read(&oldmm->mm_users) == 1;
>  	flush_cache_dup_mm(oldmm);
>  	uprobe_dup_mmap(oldmm, mm);
>  	/*
> @@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
>  	mt_clear_in_rcu(vmi.mas.tree);
>  	for_each_vma(vmi, mpnt) {
>  		struct file *file;
> +		bool locked;
> +
> +		locked = vma_start_write(mpnt);
> +		dup_probe(only_user ? 1 : 0, locked ? 1 : 0);
>
> -		vma_start_write(mpnt);
>  		if (mpnt->vm_flags & VM_DONTCOPY) {
>  			retval = vma_iter_clear_gfp(&vmi, mpnt->vm_start,
>  						    mpnt->vm_end, GFP_KERNEL);
>
> --
> Mateusz Guzik