From: Suren Baghdasaryan
Date: Wed, 5 Jul 2023 10:23:52 -0700
Subject: Re: [PATCH v3 1/2] fork: lock VMAs of the parent process when forking
To: David Hildenbrand
Cc: akpm@linux-foundation.org, jirislaby@kernel.org, jacobly.alt@gmail.com,
 holger@applied-asynchrony.com, hdegoede@redhat.com, michel@lespinasse.org,
 jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
 liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
 paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org,
 songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com,
 hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
 punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
 rientjes@google.com, chriscli@google.com, axelrasmussen@google.com,
 joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com,
 shakeelb@google.com, tatashin@google.com, edumazet@google.com,
 gthelen@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
References: <20230705171213.2843068-1-surenb@google.com>
 <20230705171213.2843068-2-surenb@google.com>
 <10c8fe17-fa9b-bf34-cb88-c758e07c9d72@redhat.com>
In-Reply-To: <10c8fe17-fa9b-bf34-cb88-c758e07c9d72@redhat.com>

On Wed, Jul 5, 2023 at 10:14 AM David
Hildenbrand wrote:
>
> On 05.07.23 19:12, Suren Baghdasaryan wrote:
> > When forking a child process, parent write-protects an anonymous page
> > and COW-shares it with the child being forked using copy_present_pte().
> > Parent's TLB is flushed right before we drop the parent's mmap_lock in
> > dup_mmap(). If we get a write-fault before that TLB flush in the parent,
> > and we end up replacing that anonymous page in the parent process in
> > do_wp_page() (because, COW-shared with the child), this might lead to
> > some stale writable TLB entries targeting the wrong (old) page.
> > Similar issue happened in the past with userfaultfd (see flush_tlb_page()
> > call inside do_wp_page()).
> > Lock VMAs of the parent process when forking a child, which prevents
> > concurrent page faults during fork operation and avoids this issue.
> > This fix can potentially regress some fork-heavy workloads. Kernel build
> > time did not show noticeable regression on a 56-core machine while a
> > stress test mapping 10000 VMAs and forking 5000 times in a tight loop
> > shows ~5% regression. If such fork time regression is unacceptable,
> > disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
> > optimizations are possible if this regression proves to be problematic.
> >
> > Suggested-by: David Hildenbrand
> > Reported-by: Jiri Slaby
> > Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
> > Reported-by: Holger Hoffstätte
> > Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
> > Reported-by: Jacob Young
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
> > Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Suren Baghdasaryan
> > ---
> >  kernel/fork.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index b85814e614a5..403bc2b72301 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -658,6 +658,12 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
> >  		retval = -EINTR;
> >  		goto fail_uprobe_end;
> >  	}
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +	/* Disallow any page faults before calling flush_cache_dup_mm */
> > +	for_each_vma(old_vmi, mpnt)
> > +		vma_start_write(mpnt);
> > +	vma_iter_init(&old_vmi, oldmm, 0);
> > +#endif
> >  	flush_cache_dup_mm(oldmm);
> >  	uprobe_dup_mmap(oldmm, mm);
> >  	/*
>
> The old version was most probably fine as well, but this certainly looks
> even safer.
>
> Acked-by: David Hildenbrand

Thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
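
For anyone who wants to get a feel for the fork-time cost mentioned in the
commit message, a rough userspace sketch could look like the one below. It is
not the stress test that produced the ~5% figure. The helper names map_vmas(),
fork_loop() and timed_fork_loop() are made up for illustration, and alternating
page protections with mprotect() is only one assumed way to keep the kernel
from merging neighbouring anonymous VMAs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: build about n_vmas separate VMAs by alternating
 * page protections inside one anonymous mapping, so that neighbouring
 * pages cannot be merged into a single VMA.
 * Returns n_vmas on success, -1 on failure.
 */
static long map_vmas(long n_vmas)
{
	long page = sysconf(_SC_PAGESIZE);
	char *base = mmap(NULL, (size_t)n_vmas * page, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return -1;

	for (long i = 0; i < n_vmas; i++) {
		/* Touch the page first so fork has a present PTE to copy. */
		base[i * page] = 1;
		/* Make every odd page read-only, which splits the VMA. */
		if ((i & 1) && mprotect(base + i * page, page, PROT_READ))
			return -1;
	}
	return n_vmas;
}

/*
 * Hypothetical helper: fork n_forks children back to back. Each child
 * exits immediately and the parent reaps it before forking again.
 * Returns 0 on success, -1 if fork() or waitpid() fails.
 */
static int fork_loop(long n_forks)
{
	for (long i = 0; i < n_forks; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0)
			_exit(0);
		if (waitpid(pid, NULL, 0) != pid)
			return -1;
	}
	return 0;
}

/* Wall-clock seconds spent in fork_loop(), or a negative value on error. */
static double timed_fork_loop(long n_forks)
{
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	if (fork_loop(n_forks))
		return -1.0;
	clock_gettime(CLOCK_MONOTONIC, &end);

	return (end.tv_sec - start.tv_sec) +
	       (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling map_vmas(10000) followed by timed_fork_loop(5000) from main(), once
with CONFIG_PER_VMA_LOCK enabled and once without, would give two numbers to
compare. The absolute values depend heavily on the machine and kernel config.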