From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2E594C28B20 for ; Sat, 29 Mar 2025 01:16:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2D1D8280174; Fri, 28 Mar 2025 21:16:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 25B7A280165; Fri, 28 Mar 2025 21:16:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0D50D280174; Fri, 28 Mar 2025 21:16:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id DD510280165 for ; Fri, 28 Mar 2025 21:16:09 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 6D8835388B for ; Sat, 29 Mar 2025 01:16:11 +0000 (UTC) X-FDA: 83272822542.01.B74ED76 Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) by imf09.hostedemail.com (Postfix) with ESMTP id 72855140008 for ; Sat, 29 Mar 2025 01:16:09 +0000 (UTC) Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=eOGy2mKX; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf09.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.218.43 as permitted sender) smtp.mailfrom=mjguzik@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743210969; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=+IaMuBgyzRkz0J5mm2/En9YauDUe7HPi2YCn8hrN0Lg=; b=k1JjC1VvnjVTvdTsorNz9TAkSt/pVszDYu/sd6uivrgQHA1jqz48iI/uPpbOMk8WeIXpFF zC4h5UF85i13bJw+Bz6Bkp1j/KpFAOSjJPrkDo1it7BWYX+mkKCthpncB3ErE9PZGL7Vih 89uOi9QmVb84LAfgfTVClKWvdKPWI8c= ARC-Authentication-Results: i=1; imf09.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=eOGy2mKX; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf09.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.218.43 as permitted sender) smtp.mailfrom=mjguzik@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743210969; a=rsa-sha256; cv=none; b=MK02asr29yEeM7Bq8Af3sF/1/EiwFwbhCv4HFcKI5hpJqkmLB/E/WUxn4Lh/Vaq4P27WvX GQQAy+fkg882a+OxxF8Nqv0+GoJFuSvjgATPGz0f9DPc9S8MawXh3wkusX3UVryAe95ayu pKaSX+9phMi1pNvsPfRcbHvTGq+nFI8= Received: by mail-ej1-f43.google.com with SMTP id a640c23a62f3a-ac2a81e41e3so533645366b.1 for ; Fri, 28 Mar 2025 18:16:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743210968; x=1743815768; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=+IaMuBgyzRkz0J5mm2/En9YauDUe7HPi2YCn8hrN0Lg=; b=eOGy2mKXGelQvAdkk528S4woRsJE/D33r1sVz34kPAV0MQyObX66qqcnLsqd57gtXk LSErkylEuEAACLV9gjCacNgVDhGE1v0GM9bOuGarZHUhB/0xkP+I2loEX8Al/pIeZCNX p84BziwEO+sn5Qm/TGDe8qnGFx5bVoh4e5JzG//Zvnx1hKDWlKDhDlE8rXUjiYugBCjR 6rlLSuXIbg9cz4CLOz2utLiwDbL5udcUngDkNd6zMiQ7T4BPD1AkeCyTf9bVtZw0b9wt dhdcL0VdOhSpN6sa0uf4CQetUHE0I6f1UKKgK4NzvHFPRf44akMX8BlDHQeeNvtdRRW3 ykhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743210968; x=1743815768; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+IaMuBgyzRkz0J5mm2/En9YauDUe7HPi2YCn8hrN0Lg=; b=WpEnVtE2qV7A374eGzVdg1mSSlTWGgKBRCwboL3eE5GKawyJUCgiSvVBfe6XhXj+mN Uh1m8N4cYgk3sypBuqdKNT9R27+8AgJt/B0aihWa/rnoUaffhSTviKVut6oUq3+6TJ0O yGddAYXSG78xl8T51cpkr+IaIe9x7Q8rwNO37JMoqlnkQp4RpOhpvzcJtEn72BU2hbfC DJpGTB3+Wn0D7aID37FNia9HXWFAVIFJKjx5NJdibntLci0DLq9ULLcIr3lJJOfh3Pgk kOSmHb/2OQ8fF4sQ++90+WKxpZVF93lARsS/Aw97jewDRC5A6TwjZL4wFhabsoDH9P3t 5uXA== X-Gm-Message-State: AOJu0YzBxYei3iJTPVhzwHiIW6oJEBEDbx+1OUi8aO5u0iZ5ZS0aySIV nyisS3t0UrIrtn7hxHR52c9ObDaennCSQYRmoK7G7udQNDPLnQ/pKuT8qQulH0uvpzwsQT3eNPH 2PI84B1rvPAQh5Su1q/955sa+2CI= X-Gm-Gg: ASbGnctGLVCQEIaRIp48Nk3xsnKI8fT9jX9vG3BbZIapgS05zVwJfv1QWo2d86m1jL2 45enJLIPaVlJtjEmwqktWIxMB0hnwsGgEa5+5M5DS2LOxnxMq0YaKmRfPFirNAa01RjkIUtD3TV 6T4CGEd+SHMRuOVmyOxK6pEuuc X-Google-Smtp-Source: AGHT+IFQQFTGlZEfiX6DJlbGjmMRBWrUfJMTdfNW90X1zxI6BbKArSZ8Jue85a//Pn6N7MVFSYTn9EzXtxCXL2GDM30= X-Received: by 2002:a17:907:c1f:b0:abf:4ca9:55ff with SMTP id a640c23a62f3a-ac738a9853fmr83879166b.32.1743210967507; Fri, 28 Mar 2025 18:16:07 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Mateusz Guzik Date: Sat, 29 Mar 2025 02:15:55 +0100 X-Gm-Features: AQ5f1JrWRFSLw4mEmOSaMi3_vaEXCGlV_PgtMtDFd_FBfqhdQq-3YZHCdwhRT44 Message-ID: Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded To: Suren Baghdasaryan Cc: linux-mm , "Liam R. Howlett" , Lorenzo Stoakes , Matthew Wilcox Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam01 X-Stat-Signature: zmg8mzjgrn93arrpf86pc7ubb8phbfym X-Rspam-User: X-Rspamd-Queue-Id: 72855140008 X-HE-Tag: 1743210969-436230 X-HE-Meta: U2FsdGVkX1/1+AaTdWiLYdH2IayOZwhqcVLzg4XOuZFr/jYL0DrCPOftLKNd2ZlDTZXJMBA3ysPsJmBhqzw9fPXBvacFa0Vw7o2A2+qT+yAu8p/AU1XaV7YSwQwsL+MfowdDfO4hzgrqPN32XJcRTT9Ur0yejvtMMz7YtFASMlqXcO1vnov5yPDcP0eQI0NM0lqf+6dagnMv19SW5hUgaZHPTaX60aOQ2xCCUglvKUg8IbDeaaKjCyDEtzdTU/8d80NqD4sUNP+uFrZ5rtUSZKSq+daoLtv71WFtzmmrmyqBogQ8ihIHf60QjfD/0a7ZDGexWGNZp/kt21eAQy6tkZGNmbocH5PzQAvP4OXAI7hEKvOGcgEP/k6P82mTVWbiEUbhJH17vCipWB32zz7cE44H40O2Or9tMKjfFDQFYTxL0s8cSfUjxLNPV+nqCGO4wzGWNM1YqYx6tF4fWu0kjPI4azp2hKjpAXPADV7QSpT8+mLd5QDV/8D6svf84Apfk2a73qTdP2Flhiunq0yJAeh8PAQ2vduUa2fOgyZsjcYfRxTJ9X5lWLs190mxf4sKdXd/Uu5urtPVUiLu7zBjfdW9MyYrnnpoeWJmcbHZ1UpN0I9JE7B3LEYd46Vv4JrwGnw9M8Sfe8MnL5BPr9BPIpaqOR4CPght6pneFKqpRbi4EcWW/pnIagNS2w+hfr0pSK1WxIS1rXwwQcUK4xImiXx8ik9ycrh607yV5mF0reHOQisWNESpEJrEuJlyARFyqgScWxuNWnyLmYwl+qDHxqj5iMHZv/4pwP5zuBlZEbjprXeZkDcL27q8F2BS6pE5yfFepld+ehTbZe9KFlq591g1BJnfMVE4pzWoHfvugABbtLPheMn6nQyXFJVsTpWOJIvt+kfJ8/1X+J/dvD1KmpEVofsVmWA9TuskvPlLiSNC8b71WEgLtI9paBi9PQmXgvmQoAhMRJiuObFJ5su CXDHGpcc ikagB1N0vjPRLxx3AJaZAa6jlo6+/7vd0SNFSEzoFzBvJh4Z/JpxhO2BFTREAs9YUXGKNtWspkI2B5+SuIKKEDS7TzjyNxniypCJAw9jwqO4684Qr4GyPaICSpnT7G3WZz3LxY8ODw2ktFYGweYIdGpVaI4NPdb0sBYUtBg6K/8LDg3FyqaYdCzVENA9KnWr9ymVRCqihaiVqpVz+t7DBlmU0diWD7KGhYWMZ72oUcSsQRR94C7ll5QAmo+Re7FvLLLC1cBWwt/7hnWwUhCtFoWpxtGw/iNrzOKLhsdNY4g0Ql91pBvl9TH1kKCiw2t7V/glCsKTiaQBDiZfHawo/qHxVVHbeg9hEI2lKKakQmZDmzoWDrrl+1qF/L8xV0GUlZIDNwQdXE2n5eEsWSytJGPCNwI8WwNuiwZm79lfPPUz3c/JPGmBTEi6vSA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Mar 29, 2025 at 1:57=E2=80=AFAM Suren Baghdasaryan wrote: > On Wed, Mar 26, 2025 at 10:46=E2=80=AFPM Mateusz Guzik wrote: > > Here is the original: > > https://lore.kernel.org/all/20230804214620.btgwhsszsd7rh6nf@f/ > > > > the thread is weirdly long and I recommend not opening it without a > > good reason, I link it for reference if needed. > > I had to re-read it to remember what it was all about :) To bring > others up-to-speed, the suggested change would look something like > this: > > @@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_struc= t *mm, > unsigned long charge =3D 0; > LIST_HEAD(uf); > VMA_ITERATOR(vmi, mm, 0); > + bool only_user; > > if (mmap_write_lock_killable(oldmm)) > return -EINTR; > + only_user =3D atomic_read(&oldmm->mm_users) =3D=3D 1; > flush_cache_dup_mm(oldmm); > uprobe_dup_mmap(oldmm, mm); > /* > @@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_struc= t *mm, > mt_clear_in_rcu(vmi.mas.tree); > for_each_vma(vmi, mpnt) { > struct file *file; > > - vma_start_write(mpnt); > + if (!only_user) > + vma_start_write(mpnt); > if (mpnt->vm_flags & VM_DONTCOPY) { > retval =3D vma_iter_clear_gfp(&vmi, mpnt->vm_star= t, > mpnt->vm_end, GFP_KER= NEL); > > > > > At the time there were woes concerning stability of the new locking > > scheme, resulting in CC list being rather excessive. As such I'm > > starting a new thread with a modest list, not sure who to add though. > > > > So dup_mmap() holds the caller mmap_sem for writing and calls > > vma_start_write() to protect against fault handling in another threads > > using the same mm. > > > > If this is the only thread with this ->mm in place, there are no > > sibling threads to worry about and this can be checked with mm_users > > =3D=3D 1. > > > > AFAICS all remote accesses require the mmap_sem to also be held, which > > provides exclusion against dup_mmap, meaning they don't pose a problem > > either. > > Yes, I believe the optimization you proposed would be safe. > > > > > The patch to merely skip locking is a few liner and I would officially > > submit it myself, but for my taste an assert is needed in fault > > handling to runtime test the above invariant. > > Hmm. Adding an assert in the pagefault path that checks whether the > fault is triggered during dup_mmap() && mm_users=3D=3D1 would not be > trivial. We would need to indicate that we expect no page faults while > in that section of dup_mmap() (maybe a flag inside mm_struct) and then > assert that this flag is not set inside the pagefault handling path. > I'm not sure it's worth the complexity... As was discussed in that > thread, the only other task that might fault a page would be external > and therefore would have to go through > access_remote_vm()->mmap_read_lock_killable() and since dup_mmap() > already holds mmap_write_lock the new user would have to wait. > > > So happens I really > > can't be bothered to figure out how to sort it out and was hoping you > > would step in. ;) Alternatively if you guys don't think the assert is > > warranted, that's your business. > > I don't think it's worth it but I'll CC Matthew and Lorenzo (you > already CC'ed Liam) to get their opinion. > > > > > As for whether this can save any locking -- yes: > > Yeah, I'm sure it will make a difference in performance. While forking > we are currently locking each VMA separately, so skipping that would > be nice. > Folks, WDYT? Do we need a separate assertion that pagefault can't > happen if mm_users=3D=3D1 and we are holding mmap_write_lock > (access_remote_vm() will block)? pseudocode-wise I was thinking in the lines of the following in the fault p= ath: if (current->mm !=3D vma->vm_mm) mmap_assert_locked(vma->vm_mm); with the assumption that mmap_assert_locked expands to nothing without debu= g > > > > > I added a probe (below for reference) with two args: whether we are > > single-threaded and vma_start_write() returning whether it took the > > down/up cycle and ran make -j 20 in the kernel dir. > > > > The lock was taken for every single vma (377913 in total), while all > > forking processes were single-threaded. Or to put it differently all > > of these were skippable. > > > > the probe (total hack): > > bpftrace -e 'kprobe:dup_probe { @[arg0, arg1] =3D count(); }' > > > > probe diff: > > diff --git a/fs/namei.c b/fs/namei.c > > index ecb7b95c2ca3..d6cde76eda81 100644 > > --- a/fs/namei.c > > +++ b/fs/namei.c > > @@ -5459,3 +5459,7 @@ const struct inode_operations > > page_symlink_inode_operations =3D { > > .get_link =3D page_get_link, > > }; > > EXPORT_SYMBOL(page_symlink_inode_operations); > > + > > +void dup_probe(int, int); > > +void dup_probe(int, int) { } > > +EXPORT_SYMBOL(dup_probe); > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 1f80baddacc5..f7b1f0a02f2e 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -760,12 +760,12 @@ static bool __is_vma_write_locked(struct > > vm_area_struct *vma, unsigned int *mm_l > > * Exclude concurrent readers under the per-VMA lock until the current= ly > > * write-locked mmap_lock is dropped or downgraded. > > */ > > -static inline void vma_start_write(struct vm_area_struct *vma) > > +static inline bool vma_start_write(struct vm_area_struct *vma) > > { > > unsigned int mm_lock_seq; > > > > if (__is_vma_write_locked(vma, &mm_lock_seq)) > > - return; > > + return false; > > > > down_write(&vma->vm_lock->lock); > > /* > > @@ -776,6 +776,7 @@ static inline void vma_start_write(struct > > vm_area_struct *vma) > > */ > > WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); > > up_write(&vma->vm_lock->lock); > > + return true; > > } > > > > static inline void vma_assert_write_locked(struct vm_area_struct *vma) > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 735405a9c5f3..0cc56255a339 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -629,6 +629,8 @@ static void dup_mm_exe_file(struct mm_struct *mm, > > struct mm_struct *oldmm) > > pr_warn_once("exe_file_deny_write_access() failed in > > %s\n", __func__); > > } > > > > +void dup_probe(int, int); > > + > > #ifdef CONFIG_MMU > > static __latent_entropy int dup_mmap(struct mm_struct *mm, > > struct mm_struct *oldmm) > > @@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_str= uct *mm, > > unsigned long charge =3D 0; > > LIST_HEAD(uf); > > VMA_ITERATOR(vmi, mm, 0); > > + bool only_user; > > > > if (mmap_write_lock_killable(oldmm)) > > return -EINTR; > > + only_user =3D atomic_read(&oldmm->mm_users) =3D=3D 1; > > flush_cache_dup_mm(oldmm); > > uprobe_dup_mmap(oldmm, mm); > > /* > > @@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_str= uct *mm, > > mt_clear_in_rcu(vmi.mas.tree); > > for_each_vma(vmi, mpnt) { > > struct file *file; > > + bool locked; > > + > > + locked =3D vma_start_write(mpnt); > > + dup_probe(only_user ? 1 :0, locked ? 1 : 0); > > > > - vma_start_write(mpnt); > > if (mpnt->vm_flags & VM_DONTCOPY) { > > retval =3D vma_iter_clear_gfp(&vmi, mpnt->vm_st= art, > > mpnt->vm_end, GFP_K= ERNEL); > > > > -- > > Mateusz Guzik --=20 Mateusz Guzik