From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04157C28B20 for ; Sat, 29 Mar 2025 01:35:17 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6FB7E280175; Fri, 28 Mar 2025 21:35:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6AA8C280165; Fri, 28 Mar 2025 21:35:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 54CDD280175; Fri, 28 Mar 2025 21:35:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 34A8D280165 for ; Fri, 28 Mar 2025 21:35:16 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 831BBC0982 for ; Sat, 29 Mar 2025 01:35:16 +0000 (UTC) X-FDA: 83272870632.15.B925146 Received: from mail-qt1-f182.google.com (mail-qt1-f182.google.com [209.85.160.182]) by imf11.hostedemail.com (Postfix) with ESMTP id 5EF5F40004 for ; Sat, 29 Mar 2025 01:35:14 +0000 (UTC) Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=RQ8ThjTB; spf=pass (imf11.hostedemail.com: domain of surenb@google.com designates 209.85.160.182 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743212114; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=WjmZ+rEpVyhBnGSMcIqM6zkvVu8zf0AwMJJ5/O4AM7M=; b=k/jbpZZqsD/dKrn125s0hPugn7qtAdJ/ojXTOvGsuTjBPs39HciXn3278ccyt0naf6H7Eg SoX3+EAkovggb464v2I6GSk1S7qIsuSGcAvWj7sxI98Uc7NMNu+eqC7gEiH4POy05grZdi n2Yxc5zoCIOvvoAgBhcxspILs3jyUug= ARC-Authentication-Results: i=1; imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=RQ8ThjTB; spf=pass (imf11.hostedemail.com: domain of surenb@google.com designates 209.85.160.182 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743212114; a=rsa-sha256; cv=none; b=P0im4+uLeelaz60bO0VOC1auR1kI++KmZoOMYSaRfam0gQq7WSKqNvZ/XKLgZJ/SNIvjeH gksF1UjEqw6gcyCaXDWT034jrjm08sRzELCEoLNUVOspB3lkswLUQl5nNxdUWYhREtHXck siKtLQYj1IsM7H47ewGvQcYzWIWbMo8= Received: by mail-qt1-f182.google.com with SMTP id d75a77b69052e-4774611d40bso108051cf.0 for ; Fri, 28 Mar 2025 18:35:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1743212113; x=1743816913; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WjmZ+rEpVyhBnGSMcIqM6zkvVu8zf0AwMJJ5/O4AM7M=; b=RQ8ThjTBct9mQ2mOzgETqUoUPPZPFOndUHA+42csNcLFj2J1VzCI6Wcidoo/olgox0 Xh4WUIWE53eIgntRcNe44isV2aXpEt9K4/sufjhemyhURSXSIQE26PNEguEZE5lfpQAT 4wEkjVQ8jggNlH65jRFFQcN5msRtLfv668sd86m06BoV6vJXEu402ZtzzpX6Y1J3FHJL pxQ6Swox0a8UIXqMFSjt7qktDTL829X/T4kFZzPdFuF+uY7yfp7ATt0axT/3g2OKd7Pg CPwliINimrJ1oIKxZsDP2KjctjLJvarnmOa3UFFfz6YLWK9RuV01w8jB3yyFOhe31mMc mbVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743212113; x=1743816913; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=WjmZ+rEpVyhBnGSMcIqM6zkvVu8zf0AwMJJ5/O4AM7M=; b=j4TmjUsDTyp/QKgZbcJj2MKqbG/n8xXyKjUNdHnabCg8hFXfPMMWFJ9mRdgfXsppFY qBhT/wEqhipfr15TDj2jqumbLHrcPfB+o0kWbEXWti+wkeF43Clb14/9WAYjEv9wpuwv It8ceKhvJ1MHzoG74UNpWybHZ5cCwMs7iaVn8lXKy0CcSkDekSLv4OdFDci/ptY8HXcx SxKgr0ffvdmi0dBYn2UQGi+EhiY6JCy+DZ+ZIPpCDk2kv4Eg9MeVfWghzOFq8nHGQIV0 bPjIsHH7r4WTSlVbbz34GhKUc/30pAiSWbtrbwtnlqBeMGbG1TX9O9p3q+OdvwtxetV9 8ZqQ== X-Gm-Message-State: AOJu0YzuM0m5fQwZLsvheV61e5JmWu2als0nMPHk9Nvmmby79+LPv6dU aFGKt6sIggChYUyRUW7T8RUC56XyTK9gxM7VdFykPOjzd0XaT0KJgxDnCrZD0mG1iG9TzN27SKI ImBc6fDq8taPlomVqXLQH7o9jAWBgvPkvtL7v X-Gm-Gg: ASbGnctDvHsMOlfraA9uG6XH9vfN5+OukVBKmDXwvw7EdvfxeBe3+84ZSVxxaZBaJy5 1N+BibfR2R0XVk3FRGdxBYJB6tXp5AgWMe6MxOKBDlNgtyJUNosIBD22qboKIW4PaE8jzvjOGs1 Ku1n2Tebkdf0vIsLxLxboZJ03hDw== X-Google-Smtp-Source: AGHT+IGduey42us1cFNdBn+OyKsNcwd+Cr4Sr5lIviplmKewQjsMJELgKrYCWVDYSCRRquWNmJFW3U03DPwNFWA5NIY= X-Received: by 2002:ac8:7f84:0:b0:477:c4f:ee58 with SMTP id d75a77b69052e-47800540a77mr1508111cf.24.1743212113023; Fri, 28 Mar 2025 18:35:13 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Suren Baghdasaryan Date: Fri, 28 Mar 2025 18:35:01 -0700 X-Gm-Features: AQ5f1JqV91bCFn0RPMwsE6304RdhbZQh1-iIDIHMMpWBi1k4F4BY59hPhxPJLA0 Message-ID: Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded To: Mateusz Guzik Cc: linux-mm , "Liam R. Howlett" , Lorenzo Stoakes , Matthew Wilcox Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 5EF5F40004 X-Stat-Signature: i5g1mtogi9qsik59zyj6ochctofxkb1n X-Rspam-User: X-Rspamd-Server: rspam12 X-HE-Tag: 1743212114-729940 X-HE-Meta: U2FsdGVkX199T4n3Uaiv51sFymtIJLJpdNm3CFOaaKn6fjBmb9ADXRmlXa6g//kYfKTiPz6SoLHQIS5YDacVeIa0Q6aUCAUNmWqkJurKy0o5Bf6Dz5oF/Gi6Xa8RQQtTLksAuaRkbrq2mbLzPbM2KDdBOqRQqg8BFBJTWsXoZdD9qNXR1r15EPLVrYcbMgOB75XpU2fHXAPchJantDYKBuuJ8nOjjiFUEISJ4llo01o/AbQRfs4h3y0GS51cq8eSvna2rsX0OoWcMDQv8ogvL9GAuxtbCqCQneioreCa1itMCk0mBHTXJy2s4rq9qhNnOVM1Z+eKQZdH7WgCWARsaiKQErCf6yhKRJNQrwNcqbtevh6d9I+FSRrzn1333VGA/gzawazd65HhOsgCeJtu7IQrN7fLEVQjVaUzbEbhx0axcpVW+sf6wq29RPdEVzSvIOZOj4AdzBVPz4HtmqJKmJ6HuhvfazYHthPcPG6uAd3r8N/PN8NoMJ5GGbLljBiiDK9WH+7rCRumCqrvFcT+bTcO9BesSqWA0LzKw9El5Gg5/HzDzl2p5ItQlLePjEP1hoqOX/S/ORIy+wms8LxRbbK7g5HQ4ikUiSzbxP0PtKzj7F8XCFMYF3NJbjczE6dYAku6f1f61WxhNy6zx5604KfjW3qnyGgskxzr2T9A9oJ+GEZ7TJopd030vA1Dvk+NMjRgz+tE4Xwr8VnKveYXjQASqWkIu3CL8NnCiMuW9plI0HOzqpsKwhe7IMD6xyXMJRUv4ozDLkEdafLlaxXCezIHGZeNbMAqQetRzs3I/kC+Yu6FT9VF9S0LDk9aTtMNeKCtgAt1YFtydjL1bOP+DKb/V7EpvUgl2jWvvDHK7FKzHpfOzfGygiKZsD4vWEZbMotH7PiBeXqq1tShT+r9qQaFgyIEtrDD3ruN7y7Gbznyd92KHhspfphUx171b0XOm2LxIgfj9s2kFL9Zd70 pEwP3RqV hKCv3FSoYhAVc83dxqwqhnc7k2NpIVFarizx6WMHTBSVFgxYPudx4/2wPYOsSBzIAmIc9NFex4fJHd8UtTO7Sbdu/phPZ98jqjuq8mUfMxn3LDDGRjvpRNr6RpLhqDF/EvyEaV7I7N5zDMAwl0j+arBaFEurwJkn98pv/xGTGfWdRI2iGDQ1svyAWk+Fd4DhrUn5mHPgSCKLcymiLrKLogFxuJQNtMYuvybYzuGu1y4l/gsqqe/yZYQdu62SaWk8LYQ/qqiRJ7eGm7gokSME9kA2Ezs4OEqgBQCG483Snjqml6pi4WW33/JSWcw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Mar 28, 2025 at 6:16=E2=80=AFPM Mateusz Guzik w= rote: > > On Sat, Mar 29, 2025 at 1:57=E2=80=AFAM Suren Baghdasaryan wrote: > > On Wed, Mar 26, 2025 at 10:46=E2=80=AFPM Mateusz Guzik wrote: > > > Here is the original: > > > https://lore.kernel.org/all/20230804214620.btgwhsszsd7rh6nf@f/ > > > > > > the thread is weirdly long and I recommend not opening it without a > > > good reason, I link it for reference if needed. > > > > I had to re-read it to remember what it was all about :) To bring > > others up-to-speed, the suggested change would look something like > > this: > > > > @@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_str= uct *mm, > > unsigned long charge =3D 0; > > LIST_HEAD(uf); > > VMA_ITERATOR(vmi, mm, 0); > > + bool only_user; > > > > if (mmap_write_lock_killable(oldmm)) > > return -EINTR; > > + only_user =3D atomic_read(&oldmm->mm_users) =3D=3D 1; > > flush_cache_dup_mm(oldmm); > > uprobe_dup_mmap(oldmm, mm); > > /* > > @@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_str= uct *mm, > > mt_clear_in_rcu(vmi.mas.tree); > > for_each_vma(vmi, mpnt) { > > struct file *file; > > > > - vma_start_write(mpnt); > > + if (!only_user) > > + vma_start_write(mpnt); > > if (mpnt->vm_flags & VM_DONTCOPY) { > > retval =3D vma_iter_clear_gfp(&vmi, mpnt->vm_st= art, > > mpnt->vm_end, GFP_K= ERNEL); > > > > > > > > At the time there were woes concerning stability of the new locking > > > scheme, resulting in CC list being rather excessive. As such I'm > > > starting a new thread with a modest list, not sure who to add though. > > > > > > So dup_mmap() holds the caller mmap_sem for writing and calls > > > vma_start_write() to protect against fault handling in another thread= s > > > using the same mm. > > > > > > If this is the only thread with this ->mm in place, there are no > > > sibling threads to worry about and this can be checked with mm_users > > > =3D=3D 1. > > > > > > AFAICS all remote accesses require the mmap_sem to also be held, whic= h > > > provides exclusion against dup_mmap, meaning they don't pose a proble= m > > > either. > > > > Yes, I believe the optimization you proposed would be safe. > > > > > > > > The patch to merely skip locking is a few liner and I would officiall= y > > > submit it myself, but for my taste an assert is needed in fault > > > handling to runtime test the above invariant. > > > > Hmm. Adding an assert in the pagefault path that checks whether the > > fault is triggered during dup_mmap() && mm_users=3D=3D1 would not be > > trivial. We would need to indicate that we expect no page faults while > > in that section of dup_mmap() (maybe a flag inside mm_struct) and then > > assert that this flag is not set inside the pagefault handling path. > > I'm not sure it's worth the complexity... As was discussed in that > > thread, the only other task that might fault a page would be external > > and therefore would have to go through > > access_remote_vm()->mmap_read_lock_killable() and since dup_mmap() > > already holds mmap_write_lock the new user would have to wait. > > > > > So happens I really > > > can't be bothered to figure out how to sort it out and was hoping you > > > would step in. ;) Alternatively if you guys don't think the assert is > > > warranted, that's your business. > > > > I don't think it's worth it but I'll CC Matthew and Lorenzo (you > > already CC'ed Liam) to get their opinion. > > > > > > > > As for whether this can save any locking -- yes: > > > > Yeah, I'm sure it will make a difference in performance. While forking > > we are currently locking each VMA separately, so skipping that would > > be nice. > > Folks, WDYT? Do we need a separate assertion that pagefault can't > > happen if mm_users=3D=3D1 and we are holding mmap_write_lock > > (access_remote_vm() will block)? > > pseudocode-wise I was thinking in the lines of the following in the fault= path: > > if (current->mm !=3D vma->vm_mm) > mmap_assert_locked(vma->vm_mm); > > with the assumption that mmap_assert_locked expands to nothing without de= bug I see. IOW, if someone external is faulting a page then it has to be holding at least mmap_read_lock. So, it's a more general check but seems reasonable. I think adding it at the end of lock_vma_under_rcu() under the "#ifdef CONFIG_DEBUG_VM" condition would be enough. > > > > > > > > > I added a probe (below for reference) with two args: whether we are > > > single-threaded and vma_start_write() returning whether it took the > > > down/up cycle and ran make -j 20 in the kernel dir. > > > > > > The lock was taken for every single vma (377913 in total), while all > > > forking processes were single-threaded. Or to put it differently all > > > of these were skippable. > > > > > > the probe (total hack): > > > bpftrace -e 'kprobe:dup_probe { @[arg0, arg1] =3D count(); }' > > > > > > probe diff: > > > diff --git a/fs/namei.c b/fs/namei.c > > > index ecb7b95c2ca3..d6cde76eda81 100644 > > > --- a/fs/namei.c > > > +++ b/fs/namei.c > > > @@ -5459,3 +5459,7 @@ const struct inode_operations > > > page_symlink_inode_operations =3D { > > > .get_link =3D page_get_link, > > > }; > > > EXPORT_SYMBOL(page_symlink_inode_operations); > > > + > > > +void dup_probe(int, int); > > > +void dup_probe(int, int) { } > > > +EXPORT_SYMBOL(dup_probe); > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > > index 1f80baddacc5..f7b1f0a02f2e 100644 > > > --- a/include/linux/mm.h > > > +++ b/include/linux/mm.h > > > @@ -760,12 +760,12 @@ static bool __is_vma_write_locked(struct > > > vm_area_struct *vma, unsigned int *mm_l > > > * Exclude concurrent readers under the per-VMA lock until the curre= ntly > > > * write-locked mmap_lock is dropped or downgraded. > > > */ > > > -static inline void vma_start_write(struct vm_area_struct *vma) > > > +static inline bool vma_start_write(struct vm_area_struct *vma) > > > { > > > unsigned int mm_lock_seq; > > > > > > if (__is_vma_write_locked(vma, &mm_lock_seq)) > > > - return; > > > + return false; > > > > > > down_write(&vma->vm_lock->lock); > > > /* > > > @@ -776,6 +776,7 @@ static inline void vma_start_write(struct > > > vm_area_struct *vma) > > > */ > > > WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); > > > up_write(&vma->vm_lock->lock); > > > + return true; > > > } > > > > > > static inline void vma_assert_write_locked(struct vm_area_struct *vm= a) > > > diff --git a/kernel/fork.c b/kernel/fork.c > > > index 735405a9c5f3..0cc56255a339 100644 > > > --- a/kernel/fork.c > > > +++ b/kernel/fork.c > > > @@ -629,6 +629,8 @@ static void dup_mm_exe_file(struct mm_struct *mm, > > > struct mm_struct *oldmm) > > > pr_warn_once("exe_file_deny_write_access() failed in > > > %s\n", __func__); > > > } > > > > > > +void dup_probe(int, int); > > > + > > > #ifdef CONFIG_MMU > > > static __latent_entropy int dup_mmap(struct mm_struct *mm, > > > struct mm_struct *oldmm) > > > @@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_s= truct *mm, > > > unsigned long charge =3D 0; > > > LIST_HEAD(uf); > > > VMA_ITERATOR(vmi, mm, 0); > > > + bool only_user; > > > > > > if (mmap_write_lock_killable(oldmm)) > > > return -EINTR; > > > + only_user =3D atomic_read(&oldmm->mm_users) =3D=3D 1; > > > flush_cache_dup_mm(oldmm); > > > uprobe_dup_mmap(oldmm, mm); > > > /* > > > @@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_s= truct *mm, > > > mt_clear_in_rcu(vmi.mas.tree); > > > for_each_vma(vmi, mpnt) { > > > struct file *file; > > > + bool locked; > > > + > > > + locked =3D vma_start_write(mpnt); > > > + dup_probe(only_user ? 1 :0, locked ? 1 : 0); > > > > > > - vma_start_write(mpnt); > > > if (mpnt->vm_flags & VM_DONTCOPY) { > > > retval =3D vma_iter_clear_gfp(&vmi, mpnt->vm_= start, > > > mpnt->vm_end, GFP= _KERNEL); > > > > > > -- > > > Mateusz Guzik > > > > -- > Mateusz Guzik