From: Linus Torvalds
Date: Sat, 8 Jul 2023 15:53:40 -0700
Subject: Re: [PATCH v2 3/3] fork: lock VMAs of the parent process when forking
To: Suren Baghdasaryan, David Hildenbrand
Cc: akpm@linux-foundation.org, regressions@leemhuis.info,
    bagasdotme@gmail.com, jacobly.alt@gmail.com, willy@infradead.org,
    liam.howlett@oracle.com, peterx@redhat.com, ldufour@linux.ibm.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
    gregkh@linuxfoundation.org, regressions@lists.linux.dev, Jiri Slaby,
    Holger Hoffstätte, stable@vger.kernel.org
References: <20230708191212.4147700-1-surenb@google.com>
    <20230708191212.4147700-3-surenb@google.com>

On Sat, 8 Jul 2023 at 15:36, Suren Baghdasaryan wrote:
>
> On Sat, Jul 8, 2023 at 2:18 PM Linus Torvalds
> >
> > Again - maybe I
> > messed up, but it really feels like the missing
> > vma_start_write() was more fundamental, and not some "TLB coherency"
> > issue.
>
> Sounds plausible. I'll try to use the reproducer to verify if that's
> indeed happening here.

I really don't think that's what people are reporting, I was just
trying to make up a completely different case that has nothing to do
with any TLB issues.

My real point was simply this one:

> It's likely there are multiple problematic
> scenarios due to this missing lock though.

Right. That's my issue. I felt your explanation was *too* targeted at
some TLB non-coherency thing, when I think the problem was actually a
much larger "page faults simply must not happen while we're copying
the page tables because data isn't coherent".

The anon_vma case was just meant as another random example of the
other kinds of things I suspect can go wrong, because we're simply not
able to do this whole "copy vma while it's being modified by page
faults".

Now, I agree that the PTE problem is real, and probably the main
thing, ie when we as part of fork() do this:

        /*
         * If it's a COW mapping, write protect it both
         * in the parent and the child
         */
        if (is_cow_mapping(vm_flags) && pte_write(pte)) {
                ptep_set_wrprotect(src_mm, addr, src_pte);
                pte = pte_wrprotect(pte);
        }

and the thing that can go wrong before the TLB flush happens is that -
because the TLBs haven't been flushed yet - some threads in the
parent happily continue to write to the page and don't see the
wrprotect happening.

And then you get into the situation where *some* threads see the page
protections change (maybe they had a TLB flush event on that CPU for
random reasons), and they will take a page fault and do the COW thing
and create a new page.

And all the while *other* threads still see the old writeable TLB
state, and continue to write to the old page.
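
For illustration only, here is a minimal userspace sketch of the
interleaving described above. It is not kernel code and not part of
the patch: every name in it (toy_page, toy_cpu, toy_write,
toy_cow_fault, toy_replay) is hypothetical, the "TLB" is just a cached
pointer plus a writable flag per CPU, and the race is replayed in one
fixed order rather than with real threads. It only models this one
PTE symptom, assuming CPU b sees the write-protect immediately while
CPU a may keep a stale writable entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Hypothetical toy types: not the kernel's struct page or a real TLB. */
struct toy_page {
    char data[TOY_PAGE_SIZE];
};

struct toy_cpu {
    struct toy_page *page; /* page this CPU's cached translation points at */
    bool writable;         /* cached permission; may be stale */
};

/* Store one byte; returns false where real hardware would fault instead. */
static bool toy_write(struct toy_cpu *cpu, int off, char val)
{
    assert(off >= 0 && off < TOY_PAGE_SIZE);
    if (!cpu->writable)
        return false;
    cpu->page->data[off] = val;
    return true;
}

/* COW fault: copy the shared page and give the faulting CPU the copy. */
static void toy_cow_fault(struct toy_cpu *cpu, struct toy_page *copy)
{
    memcpy(copy->data, cpu->page->data, TOY_PAGE_SIZE);
    cpu->page = copy;
    cpu->writable = true;
}

/*
 * Replays the interleaving and returns the byte at offset 0 that the
 * parent sees once every CPU uses the new copy.
 */
char toy_replay(bool a_tlb_is_stale)
{
    struct toy_page old = { "aaaaaaa" };
    struct toy_page copy;
    struct toy_cpu a = { &old, true };
    struct toy_cpu b = { &old, true };

    /* fork() write-protects the PTE; CPU b notices right away. */
    b.writable = false;
    /* CPU a keeps its writable entry only if its TLB is stale. */
    a.writable = a_tlb_is_stale;

    /* b's write faults, so b copies the page and retries on the copy. */
    if (!toy_write(&b, 1, 'B')) {
        toy_cow_fault(&b, &copy);
        toy_write(&b, 1, 'B');
    }

    /* a writes after the copy was taken. */
    if (!toy_write(&a, 0, 'A')) {
        /* a faulted: it picks up the installed copy and retries there. */
        a.page = &copy;
        a.writable = true;
        toy_write(&a, 0, 'A');
    }

    /* The delayed flush reaches a; from here on it uses the copy. */
    a.page = &copy;
    a.writable = true;
    return a.page->data[0];
}
```

With a stale entry, toy_replay(true) returns the original byte: a's
store landed on the old page after the copy was taken, so it is
missing from the page everyone ends up using. With toy_replay(false)
the store faults, is redone on the copy, and survives.
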

So now you have a page that gets its data copied *while* somebody is
still writing to it, and the end result is that some write easily gets
lost, and so when that new copy is installed, you see it as data
corruption.

And I agree completely that that is probably the thing that most
people actually saw and reacted to as corruption.

But the reason I didn't like the explanation was that I think this is
just one random example of the more fundamental issue of "we simply
must not take page faults while copying".

Your explanation made me think "stale TLB is the problem", and *that*
was what I objected to. The stale TLB was just one random sign of the
much larger problem.

It might even have been the most common symptom, but I think it was
just a *symptom*, not the *cause* of the problem.

And I must have been bad at explaining that, because David Hildenbrand
also reacted negatively to my change.

So I'll happily take a patch that adds more commentary about this, and
gives several examples of the things that go wrong.

              Linus