From: Suren Baghdasaryan
Date: Sat, 8 Jul 2023 12:17:10 -0700
Subject: Re: Fwd: Memory corruption in multithreaded user space program while calling fork
To: Linus Torvalds
Cc: Andrew Morton, Thorsten Leemhuis, Bagas Sanjaya, Jacob Young, Laurent Dufour, Linux Kernel Mailing List, Linux Memory Management, Linux PowerPC, Linux ARM, Greg KH, Linux regressions mailing list
On Sat, Jul 8, 2023 at 12:06 PM Linus Torvalds wrote:
>
> On Sat, 8 Jul 2023 at 11:40, Suren Baghdasaryan wrote:
> >
> > My understanding was that flush_cache_dup_mm() is there to ensure
> > nothing is in the cache, so locking VMAs before doing that would
> > ensure that no page faults would pollute the caches after we flushed
> > them. Is that reasoning incorrect?
>
> It is indeed incorrect.
>
> The VIVT caches are fundamentally broken, and we have various random
> hacks for them to make them work in legacy situations.
>
> And that flush_cache_dup_mm() is exactly that: a band-aid to make sure
> that when we do a fork(), any previous writes that are dirty in the
> caches will have made it to memory, so that they will show up in the
> *new* process that has a different virtual mapping.
>
> BUT!
>
> This has nothing to do with page faults, or other threads.
>
> If you have a threaded application that does fork(), it can - and will
> - dirty the VIVT caches *during* the fork, and so the whole
> "flush_cache_dup_mm()" is completely and fundamentally racy wrt any
> *new* activity.
>
> That's not even what it is trying to deal with. All it tries to do is
> to make sure that the newly forked child AT LEAST sees all the changes
> that the parent did up to the point of the fork. Anything after that
> is simply not relevant at all.
>
> So think of all this not as some kind of absolute synchronization and
> cache coherency (because you will never get that on a VIVT
> architecture anyway), but as a "for the simple cases, this will at
> least get you the expected behavior".
>
> But as mentioned, for the issue of PER_VMA_LOCK, this is all *doubly*
> irrelevant. Not only was it not relevant to begin with (ie that cache
> flush only synchronizes parent -> child, not other-threads -> child),
> but VIVT caches don't even exist on any relevant architecture, because
> they are fundamentally broken in so many other ways.
>
> So all our "synchronize caches by hand" is literally just a band-aid
> for legacy architectures. I think it's mostly things like the old
> broken MIPS chips, some sparc32, pa-risc: the "old RISC" stuff, where
> people simplified the hardware a bit too much.
>
> VIVT is lovely for hardware people because they get a shortcut. But
> it's "lovely" in the same way that "PI=3" is lovely. It's simpler -
> but it's _wrong_.
>
> And it's almost entirely useless if you ever do SMP. I guarantee we
> have tons of races with it for very fundamental reasons - the problems
> it causes for software are not fixable, they are "hidable for the
> simple case".
>
> So you'll also find things like dcache_page_flush(), which flushes
> writes to a page to memory. And exactly like the fork() case, it's
> *not* real cache coherency, and it's *not* some kind of true global
> serialization.
>
> It's used in cases where we have a particular user that wants the
> changes *it* made to be made visible. And exactly like
> flush_cache_dup_mm(), it cannot deal with concurrent changes that
> other threads make.

Thanks for the explanation! It's quite educational.
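For context, the call being discussed sits near the top of dup_mmap()
in kernel/fork.c; a simplified sketch of that path (not the exact
mainline code, which varies by kernel version):

    static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
    {
            if (mmap_write_lock_killable(oldmm))
                    return -EINTR;
            /*
             * Write the parent's dirty VIVT cache lines back to memory
             * so that the child, which will map the same pages at
             * possibly different virtual addresses, sees everything the
             * parent wrote up to this point.  This orders parent ->
             * child only; other threads can keep dirtying the caches
             * while the copy below is in progress.
             */
            flush_cache_dup_mm(oldmm);

            /* ... copy the parent's VMAs and page tables into the child ... */
    }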
> > Ok, I think these two are non-controversial:
> > https://lkml.kernel.org/r/20230707043211.3682710-1-surenb@google.com
> > https://lkml.kernel.org/r/20230707043211.3682710-2-surenb@google.com
>
> These look sane to me. I wonder if the vma_start_write() should have
> been somewhere else, but at least it makes sense in context, even if I
> get the feeling that maybe it should have been done in some helper
> earlier.
>
> As it is, we randomly do it in other helpers like vm_flags_set(), and
> I've often had the reaction that these vma_start_write() calls are
> randomly sprinkled around without any clear _design_ for where they
> are.

We write-lock a VMA before any modification. I tried to provide
explanations for each such locking in my comments/patch descriptions,
but I guess I haven't done a good job at that...

> > and the question now is how we fix the fork() case:
> > https://lore.kernel.org/all/20230706011400.2949242-2-surenb@google.com/
> > (if my above explanation makes sense to you)
>
> See above. That patch is nonsensical. Trying to order
> flush_cache_dup_mm() is not about page faults, and is fundamentally
> not doable with threads anyway.
>
> > https://lore.kernel.org/all/20230705063711.2670599-2-surenb@google.com/
>
> This is the one that makes sense to me.

Ok, I sent you a 3-patch series with the fixes here:
https://lore.kernel.org/all/20230708191212.4147700-1-surenb@google.com/

Do you want me to disable per-VMA locks by default as well?

> Linus
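For reference, the "write-lock before any modification" pattern
discussed above, roughly as it appears in helpers like vm_flags_set()
(a simplified sketch of the code around v6.4, not a verbatim quote):

    static inline void vm_flags_set(struct vm_area_struct *vma,
                                    vm_flags_t flags)
    {
            /*
             * Write-lock the VMA first, so that page faults being
             * handled concurrently under the per-VMA lock back off and
             * retry under mmap_lock before the flags change underneath
             * them.
             */
            vma_start_write(vma);
            ACCESS_PRIVATE(vma, __vm_flags) |= flags;
    }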