From: Suren Baghdasaryan <surenb@google.com>
Date: Sat, 8 Jul 2023 16:03:35 -0700
Subject: Re: [PATCH v2 3/3] fork: lock VMAs of the parent process when forking
To: Linus Torvalds
Cc: David Hildenbrand, akpm@linux-foundation.org,
 regressions@leemhuis.info, bagasdotme@gmail.com, jacobly.alt@gmail.com,
 willy@infradead.org, liam.howlett@oracle.com, peterx@redhat.com,
 ldufour@linux.ibm.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-arm-kernel@lists.infradead.org,
 gregkh@linuxfoundation.org, regressions@lists.linux.dev, Jiri Slaby,
 Holger Hoffstätte, stable@vger.kernel.org
References: <20230708191212.4147700-1-surenb@google.com>
 <20230708191212.4147700-3-surenb@google.com>

On Sat, Jul 8, 2023 at 3:54 PM Linus Torvalds wrote:
>
> On Sat, 8 Jul 2023 at 15:36, Suren Baghdasaryan wrote:
> >
> > On Sat, Jul 8, 2023 at 2:18 PM Linus Torvalds wrote:
> > >
> > > Again - maybe I messed up, but it really feels like the missing
> > > vma_start_write() was more fundamental, and not some "TLB coherency"
> > > issue.
> >
> > Sounds plausible. I'll try to use the reproducer to verify if that's
> > indeed happening here.
>
> I really don't think that's what people are reporting, I was just
> trying to make up a completely different case that has nothing to do
> with any TLB issues.
>
> My real point was simply this one:
>
> > It's likely there are multiple problematic
> > scenarios due to this missing lock though.
>
> Right. That's my issue. I felt your explanation was *too* targeted at
> some TLB non-coherency thing, when I think the problem was actually a
> much larger "page faults simply must not happen while we're copying
> the page tables because data isn't coherent".
>
> The anon_vma case was just meant as another random example of the
> other kinds of things I suspect can go wrong, because we're simply not
> able to do this whole "copy vma while it's being modified by page
> faults".
>
> Now, I agree that the PTE problem is real, and probably the main
> thing, ie when we as part of fork() do this:
>
>         /*
>          * If it's a COW mapping, write protect it both
>          * in the parent and the child
>          */
>         if (is_cow_mapping(vm_flags) && pte_write(pte)) {
>                 ptep_set_wrprotect(src_mm, addr, src_pte);
>                 pte = pte_wrprotect(pte);
>         }
>
> and the thing that can go wrong before the TLB flush happens is that -
> because the TLB's haven't been flushed yet - some threads in the
> parent happily continue to write to the page and didn't see the
> wrprotect happening.
>
> And then you get into the situation where *some* threads see the page
> protections change (maybe they had a TLB flush event on that CPU for
> random reasons), and they will take a page fault and do the COW thing
> and create a new page.
>
> And all the while *other* threads still see the old writeable TLB
> state, and continue to write to the old page.
>
> So now you have a page that gets its data copied *while* somebody is
> still writing to it, and the end result is that some write easily gets
> lost, and so when that new copy is installed, you see it as data
> corruption.
>
> And I agree completely that that is probably the thing that most
> people actually saw and reacted to as corruption.
>
> But the reason I didn't like the explanation was that I think this is
> just one random example of the more fundamental issue of "we simply
> must not take page faults while copying".
>
> Your explanation made me think "stale TLB is the problem", and *that*
> was what I objected to. The stale TLB was just one random sign of the
> much larger problem.
>
> It might even have been the most common symptom, but I think it was
> just a *symptom*, not the *cause* of the problem.
>
> And I must have been bad at explaining that, because David Hildenbrand
> also reacted negatively to my change.
>
> So I'll happily take a patch that adds more commentary about this, and
> gives several examples of the things that go wrong.

How about adding your example to the original description as yet
another scenario which is broken without this change? I guess having
both issues described would not hurt.

>
>              Linus
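
For illustration only, the lost-write interleaving quoted above can be
replayed as a small user-space simulation. This is a sketch under
stated assumptions, not kernel code: struct sim_page,
write_survives_cow() and SIM_PAGE_SIZE are hypothetical names, and the
two "threads" are replayed sequentially in one fixed order so that the
outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SIM_PAGE_SIZE 8		/* tiny "page", for readability only */

/* Hypothetical stand-in for one page of parent memory. */
struct sim_page {
	unsigned char data[SIM_PAGE_SIZE];
};

/*
 * Replay one store by "thread B" racing with a COW copy triggered by
 * "thread A" after fork() write-protected the PTE.
 *
 * stale_tlb == true:  B still has a writable TLB entry, so its store
 *                     hits the old page after the copy was taken.
 * stale_tlb == false: B sees the write protection, faults, and its
 *                     store lands in the page mapped after the COW.
 *
 * Returns true if B's store is visible in the finally mapped page.
 */
bool write_survives_cow(bool stale_tlb, size_t offset)
{
	struct sim_page old_page, new_page;
	struct sim_page *mapped;
	const unsigned char value = 0xAB;

	assert(offset < SIM_PAGE_SIZE);
	memset(&old_page, 0, sizeof(old_page));

	/* Thread A faults; the COW handler copies the old page. */
	memcpy(&new_page, &old_page, sizeof(new_page));

	/* Thread B stores before the copy is installed. */
	if (stale_tlb)
		old_page.data[offset] = value;	/* stale writable entry */

	/* The COW handler installs the copy in the parent's mapping. */
	mapped = &new_page;

	/* With a coherent view, B faults and retries on the new page. */
	if (!stale_tlb)
		mapped->data[offset] = value;

	return mapped->data[offset] == value;
}
```

With stale_tlb set, the store is made to a page that is no longer
mapped once the copy is installed, so the function reports it as lost.
That is only the TLB symptom; the simulation says nothing about the
other scenarios that concurrent page faults during the copy can cause.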