From: Linus Torvalds
Date: Sat, 8 Jul 2023 12:05:51 -0700
Subject: Re: Fwd: Memory corruption in multithreaded user space program while calling fork
To: Suren Baghdasaryan
Cc: Andrew Morton, Thorsten Leemhuis, Bagas Sanjaya, Jacob Young, Laurent Dufour,
    Linux Kernel Mailing List, Linux Memory Management, Linux PowerPC, Linux ARM,
    Greg KH, Linux regressions mailing list

On Sat, 8 Jul 2023 at 11:40, Suren Baghdasaryan wrote:
>
> My understanding was that flush_cache_dup_mm() is there to ensure
> nothing is in the cache, so locking VMAs before doing that would
> ensure that no page faults would pollute the caches after we flushed
> them. Is that reasoning incorrect?

It is indeed incorrect. The VIVT caches are fundamentally broken, and
we have various random hacks for them to make them work in legacy
situations.

And that flush_cache_dup_mm() is exactly that: a band-aid to make sure
that when we do a fork(), any previous writes that are dirty in the
caches will have made it to memory, so that they will show up in the
*new* process that has a different virtual mapping.

BUT!

This has nothing to do with page faults, or other threads. If you have
a threaded application that does fork(), it can - and will - dirty the
VIVT caches *during* the fork, and so the whole "flush_cache_dup_mm()"
is completely and fundamentally racy wrt any *new* activity.

That's not even what it is trying to deal with. All it tries to do is
to make sure that the newly forked child AT LEAST sees all the changes
that the parent did up to the point of the fork. Anything after that is
simply not relevant at all.

So think of all this not as some kind of absolute synchronization and
cache coherency (because you will never get that on a VIVT architecture
anyway), but as a "for the simple cases, this will at least get you the
expected behavior".

But as mentioned, for the issue of PER_VMA_LOCK, this is all *doubly*
irrelevant. Not only was it not relevant to begin with (ie that cache
flush only synchronizes parent -> child, not other-threads -> child),
but VIVT caches don't even exist on any relevant architecture, because
they are fundamentally broken in so many other ways.

So all our "synchronize caches by hand" code is literally just a
band-aid for legacy architectures. I think it's mostly things like the
old broken MIPS chips, some sparc32, and pa-risc: the "old RISC" stuff,
where people simplified the hardware a bit too much.

VIVT is lovely for hardware people because they get a shortcut. But
it's "lovely" in the same way that "PI=3" is lovely. It's simpler - but
it's _wrong_. And it's almost entirely useless if you ever do SMP. I
guarantee we have tons of races with it for very fundamental reasons -
the problems it causes for software are not fixable, they are "hidable
for the simple case".

So you'll also find things like dcache_page_flush(), which flushes
writes to a page to memory. And exactly like the fork() case, it's
*not* real cache coherency, and it's *not* some kind of true global
serialization. It's used in cases where we have a particular user that
wants the changes *it* made to be made visible. And exactly like
flush_cache_dup_mm(), it cannot deal with concurrent changes that other
threads make.
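
For reference, here is a heavily abridged sketch of where
flush_cache_dup_mm() sits in the fork path, based on dup_mmap() in
kernel/fork.c around v6.4. It is not the verbatim kernel code (error
handling and the body of the copy loop are elided); it only illustrates
the ordering described above: the flush is a one-shot, parent->child
affair done under the parent's mmap write lock, before the VMAs are
copied, and nothing about it constrains what other threads of the
parent do afterwards.

/*
 * Abridged sketch of dup_mmap() (kernel/fork.c, ~v6.4; not verbatim).
 */
static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
{
	struct vm_area_struct *mpnt;
	VMA_ITERATOR(vmi, oldmm, 0);

	if (mmap_write_lock_killable(oldmm))
		return -EINTR;
	/*
	 * Write back whatever the parent has dirtied in a virtually
	 * indexed cache so the child, which gets a brand-new virtual
	 * mapping, will see it.  Other threads of the parent can keep
	 * dirtying the cache after this point; the flush cannot order
	 * against them.
	 */
	flush_cache_dup_mm(oldmm);

	mmap_write_lock_nested(mm, SINGLE_DEPTH_NESTING);

	for_each_vma(vmi, mpnt) {
		/* copy each VMA and its page table entries into the child */
	}

	mmap_write_unlock(mm);
	mmap_write_unlock(oldmm);
	return 0;
}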
> Ok, I think these two are non-controversial:
> https://lkml.kernel.org/r/20230707043211.3682710-1-surenb@google.com
> https://lkml.kernel.org/r/20230707043211.3682710-2-surenb@google.com

These look sane to me.

I wonder if the vma_start_write() should have been somewhere else, but
at least it makes sense in context, even if I get the feeling that
maybe it should have been done in some helper earlier. As it is, we
randomly do it in other helpers like vm_flags_set(), and I've often had
the reaction that these vma_start_write() calls are randomly sprinkled
around without any clear _design_ for where they are.

> and the question now is how we fix the fork() case:
> https://lore.kernel.org/all/20230706011400.2949242-2-surenb@google.com/
> (if my above explanation makes sense to you)

See above. That patch is nonsensical. flush_cache_dup_mm() is not about
page faults, and trying to order it against other threads is
fundamentally not doable anyway.

> https://lore.kernel.org/all/20230705063711.2670599-2-surenb@google.com/

This is the one that makes sense to me.

              Linus
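
For context on the vm_flags_set() remark above, this is roughly what
those helpers look like (abridged from include/linux/mm.h in the v6.4
timeframe, not verbatim): each flag-modifying helper write-locks the
VMA first, so concurrent lockless page faults under PER_VMA_LOCK fall
back to taking mmap_lock before they can observe the update.

/*
 * Abridged sketch of the vm_flags helpers (include/linux/mm.h, ~v6.4;
 * not verbatim).  The vma_start_write() call is the "sprinkling"
 * discussed above.
 */
static inline void vm_flags_set(struct vm_area_struct *vma, vm_flags_t flags)
{
	vma_start_write(vma);	/* lockless faults now fall back to mmap_lock */
	ACCESS_PRIVATE(vma, __vm_flags) |= flags;
}

static inline void vm_flags_clear(struct vm_area_struct *vma, vm_flags_t flags)
{
	vma_start_write(vma);
	ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
}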