From: Suren Baghdasaryan
Date: Sat, 8 Jul 2023 12:17:10 -0700
Subject: Re: Fwd: Memory corruption in multithreaded user space program while calling fork
To: Linus Torvalds
Cc: Andrew Morton, Thorsten Leemhuis, Bagas Sanjaya, Jacob Young, Laurent Dufour, Linux Kernel Mailing List, Linux Memory Management, Linux PowerPC, Linux ARM, Greg KH, Linux regressions mailing list
On Sat, Jul 8, 2023 at 12:06 PM Linus Torvalds wrote:
>
> On Sat, 8 Jul 2023 at 11:40, Suren Baghdasaryan wrote:
> >
> > My understanding was that flush_cache_dup_mm() is there to ensure
> > nothing is in the cache, so locking VMAs before doing that would
> > ensure that no page faults would pollute the caches after we flushed
> > them. Is that reasoning incorrect?
>
> It is indeed incorrect.
>
> The VIVT caches are fundamentally broken, and we have various random
> hacks for them to make them work in legacy situations.
>
> And that flush_cache_dup_mm() is exactly that: a band-aid to make sure
> that when we do a fork(), any previous writes that are dirty in the
> caches will have made it to memory, so that they will show up in the
> *new* process that has a different virtual mapping.
>
> BUT!
>
> This has nothing to do with page faults, or other threads.
>
> If you have a threaded application that does fork(), it can - and will
> - dirty the VIVT caches *during* the fork, and so the whole
> "flush_cache_dup_mm()" is completely and fundamentally racy wrt any
> *new* activity.
>
> That's not even what it is trying to deal with. All it tries to do is
> to make sure that the newly forked child AT LEAST sees all the changes
> that the parent did up to the point of the fork. Anything after that
> is simply not relevant at all.
>
> So think of all this not as some kind of absolute synchronization and
> cache coherency (because you will never get that on a VIVT
> architecture anyway), but as a "for the simple cases, this will at
> least get you the expected behavior".
>
> But as mentioned, for the issue of PER_VMA_LOCK, this is all *doubly*
> irrelevant. Not only was it not relevant to begin with (ie that cache
> flush only synchronizes parent -> child, not other-threads -> child),
> but VIVT caches don't even exist on any relevant architecture, because
> they are fundamentally broken in so many other ways.
>
> So all our "synchronize caches by hand" is literally just a band-aid
> for legacy architectures. I think it's mostly things like the old
> broken MIPS chips, some sparc32, pa-risc: the "old RISC" stuff, where
> people simplified the hardware a bit too much.
>
> VIVT is lovely for hardware people because they get a shortcut. But
> it's "lovely" in the same way that "PI=3" is lovely. It's simpler -
> but it's _wrong_.
>
> And it's almost entirely useless if you ever do SMP. I guarantee we
> have tons of races with it for very fundamental reasons - the problems
> it causes for software are not fixable, they are "hidable for the
> simple case".
>
> So you'll also find things like dcache_page_flush(), which flushes
> writes to a page to memory. And exactly like the fork() case, it's
> *not* real cache coherency, and it's *not* some kind of true global
> serialization.
>
> It's used in cases where we have a particular user that wants the
> changes *it* made to be made visible. And exactly like
> flush_cache_dup_mm(), it cannot deal with concurrent changes that
> other threads make.

Thanks for the explanation! It's quite educational.
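For context, the call being discussed sits near the top of dup_mmap()
in kernel/fork.c; a simplified sketch of that path (not the exact
mainline code, which varies by kernel version):

    static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
    {
            if (mmap_write_lock_killable(oldmm))
                    return -EINTR;
            /*
             * Write the parent's dirty VIVT cache lines back to memory
             * so that the child, which will map the same pages at
             * possibly different virtual addresses, sees everything the
             * parent wrote up to this point.  This orders parent ->
             * child only; other threads can keep dirtying the caches
             * while the copy below is in progress.
             */
            flush_cache_dup_mm(oldmm);

            /* ... copy the parent's VMAs and page tables into the child ... */
    }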
> > Ok, I think these two are non-controversial:
> > https://lkml.kernel.org/r/20230707043211.3682710-1-surenb@google.com
> > https://lkml.kernel.org/r/20230707043211.3682710-2-surenb@google.com
>
> These look sane to me. I wonder if the vma_start_write() should have
> been somewhere else, but at least it makes sense in context, even if I
> get the feeling that maybe it should have been done in some helper
> earlier.
>
> As it is, we randomly do it in other helpers like vm_flags_set(), and
> I've often had the reaction that these vma_start_write() calls are
> randomly sprinkled around without any clear _design_ for where they
> are.

We write-lock a VMA before any modification. I tried to provide
explanations for each such locking in my comments/patch descriptions,
but I guess I haven't done a good job at that...

> > and the question now is how we fix the fork() case:
> > https://lore.kernel.org/all/20230706011400.2949242-2-surenb@google.com/
> > (if my above explanation makes sense to you)
>
> See above. That patch is nonsensical. Trying to order
> flush_cache_dup_mm() is not about page faults, and is fundamentally
> not doable with threads anyway.
>
> > https://lore.kernel.org/all/20230705063711.2670599-2-surenb@google.com/
>
> This is the one that makes sense to me.

Ok, I sent you a 3-patch series with the fixes here:
https://lore.kernel.org/all/20230708191212.4147700-1-surenb@google.com/

Do you want me to disable per-VMA locks by default as well?

> Linus
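For reference, the "write-lock before any modification" pattern
discussed above, roughly as it appears in helpers like vm_flags_set()
(a simplified sketch of the code around v6.4, not a verbatim quote):

    static inline void vm_flags_set(struct vm_area_struct *vma,
                                    vm_flags_t flags)
    {
            /*
             * Write-lock the VMA first, so that page faults being
             * handled concurrently under the per-VMA lock back off and
             * retry under mmap_lock before the flags change underneath
             * them.
             */
            vma_start_write(vma);
            ACCESS_PRIVATE(vma, __vm_flags) |= flags;
    }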