From: Suren Baghdasaryan
Date: Thu, 10 Aug 2023 13:31:48 -0700
Subject: Re: [PATCH v2 3/3] fork: lock VMAs of the parent process when forking
To: Mateusz Guzik
Cc: Linus Torvalds, akpm@linux-foundation.org, regressions@leemhuis.info,
    bagasdotme@gmail.com, jacobly.alt@gmail.com, willy@infradead.org,
    liam.howlett@oracle.com, david@redhat.com, peterx@redhat.com,
    ldufour@linux.ibm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
    gregkh@linuxfoundation.org, regressions@lists.linux.dev, Jiri Slaby,
    Holger Hoffstätte, stable@vger.kernel.org
References: <20230708191212.4147700-1-surenb@google.com>
    <20230708191212.4147700-3-surenb@google.com>
    <20230804214620.btgwhsszsd7rh6nf@f>

On Wed, Aug 9, 2023 at 2:07 PM Mateusz Guzik wrote:
>
> On 8/5/23, Suren Baghdasaryan wrote:
> > On Fri, Aug 4, 2023 at 6:06 PM Mateusz Guzik wrote:
> >>
> >> On 8/5/23, Linus Torvalds wrote:
> >> > On Fri, 4 Aug 2023 at 16:25, Mateusz Guzik wrote:
> >> >>
> >> >> I know of these guys, I think they are excluded as is -- they go through
> >> >> access_remote_vm, starting with:
> >> >>         if (mmap_read_lock_killable(mm))
> >> >>                 return 0;
> >> >>
> >> >> while dup_mmap already write locks the parent's mm.
> >> >
> >> > Oh, you're only worried about vma_start_write()?
> >> >
> >> > That's a non-issue. It doesn't take the lock normally, since it starts
> >> > off with
> >> >
> >> >         if (__is_vma_write_locked(vma, &mm_lock_seq))
> >> >                 return;
> >> >
> >> > which catches on the lock sequence number already being set.
> >> >
> >> > So no extra locking there.
> >> >
> >> > Well, technically there's extra locking because the code stupidly
> >> > doesn't initialize new vma allocations to the right sequence number,
> >> > but that was talked about here:
> >> >
> >> > https://lore.kernel.org/all/CAHk-=wiCrWAoEesBuoGoqqufvesicbGp3cX0LyKgEvsFaZNpDA@mail.gmail.com/
> >> >
> >> > and it's a separate issue.
> >> >
> >>
> >> I'm going to bet one beer this is the issue.
> >>
> >> The patch I'm responding to only consists of adding the call to
> >> vma_start_write and claims the 5% slowdown from it, while fixing
> >> crashes if the forking process is multithreaded.
> >>
> >> For the fix to work it has to lock something against the parent.
> >>
> >>         VMA_ITERATOR(old_vmi, oldmm, 0);
> >>         [..]
> >>         for_each_vma(old_vmi, mpnt) {
> >>         [..]
> >>                 vma_start_write(mpnt);
> >>
> >> the added line locks an obj in the parent's vm space.
> >>
> >> The problem you linked looks like pessimization for freshly allocated
> >> vmas, but that's what is being operated on here.
> >
> > Sorry, now I'm having trouble understanding the problem you are
> > describing. We are locking the parent's vma before copying it and the
> > newly created vma is locked before it's added into the vma tree. What
> > is the problem then?
> >
>
> Sorry for the late reply!
>
> Looks like there has been a bunch of weird talking past one another in
> this thread and I don't think trying to straighten it all out is worth
> any time.
>
> I think at least the two of us agree that if a single-threaded process
> enters dup_mmap and down_writes the mmap semaphore, then no new thread
> can pop up in said process, thus no surprise page faults from that
> angle. 3rd parties are supposed to use interfaces like access_remote_vm,
> which down_read said semaphore and are consequently also not a problem.
> The only worry here is that someone is messing with another process's
> memory without the semaphore, but that is very unlikely and patchable in
> the worst case -- but someone(tm) has to audit. With all these
> conditions satisfied one can elide vma_start_write for a perf win.
>
> Finally, I think we agreed you are going to do the audit ;)

Ack. I'll look into this once the dust settles. Thanks!

>
> Cheers,
> --
> Mateusz Guzik
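
For readers following the vma_start_write() discussion quoted above, here is a rough user-space sketch of the sequence-number fast path. It is only a toy model: the toy_* names, the structures, and the slow_path_hits counter are hypothetical and are not the kernel's definitions, and the real per-VMA rwsem is replaced by a counter. It assumes, as a simplification, that releasing the mm write lock bumps the mm sequence number.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; NOT the kernel's structures. */
struct toy_mm {
	int mm_lock_seq;	/* assumed to change when the mm write lock is dropped */
};

struct toy_vma {
	struct toy_mm *mm;
	int vm_lock_seq;	/* equal to mm->mm_lock_seq means "write-locked" */
	int slow_path_hits;	/* counts simulated per-VMA lock acquisitions */
};

/* Models the check quoted in the thread: compare the two sequence numbers. */
static bool toy_is_vma_write_locked(const struct toy_vma *vma, int *mm_lock_seq)
{
	*mm_lock_seq = vma->mm->mm_lock_seq;
	return vma->vm_lock_seq == *mm_lock_seq;
}

/* Caller is assumed to hold the mm write lock, as dup_mmap does. */
static void toy_vma_start_write(struct toy_vma *vma)
{
	int mm_lock_seq;

	if (toy_is_vma_write_locked(vma, &mm_lock_seq))
		return;			/* fast path: no per-VMA locking at all */

	vma->slow_path_hits++;		/* stands in for taking the per-VMA lock */
	vma->vm_lock_seq = mm_lock_seq;	/* mark the VMA locked for this cycle */
}

/* Assumption of this model: unlocking the mm invalidates all VMA write locks. */
static void toy_mmap_write_unlock(struct toy_mm *mm)
{
	mm->mm_lock_seq++;
}
```

In this model, a VMA whose vm_lock_seq does not yet match the mm's sequence number takes the slow path once; a second toy_vma_start_write() in the same write-lock cycle returns immediately, and the slow path is hit again only after toy_mmap_write_unlock(). That mirrors the point made in the thread that the cost shows up for VMAs whose sequence number has not been set, not for ones already locked.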