From: Suren Baghdasaryan
Date: Sat, 8 Jul 2023 11:40:01 -0700
Subject: Re: Fwd: Memory corruption in multithreaded user space program while calling fork
To: Linus Torvalds
Cc: Andrew Morton, Thorsten Leemhuis, Bagas Sanjaya, Jacob Young, Laurent Dufour, Linux Kernel Mailing List, Linux Memory Management, Linux PowerPC, Linux ARM, Greg KH, Linux regressions mailing list

On Sat, Jul 8, 2023 at 11:05 AM Linus Torvalds wrote:
>
> On Sat, 8 Jul 2023 at 10:39, Andrew Morton wrote:
> >
> > That was the v1 fix, but after some discussion
> > (https://lkml.kernel.org/r/20230705063711.2670599-1-surenb@google.com)
> > it was decided to take the "excessive" approach.
>
> That makes absolutely _zero_ sense.
>
> It seems to be complete voodoo programming.
>
> To some degree I don't care what happens in stable kernels, but
> there's no way we'll do that kind of thing in mainline without some
> logic or reason, when it makes no sense.
>
> flush_cache_dup_mm() is entirely irrelevant to the whole issue, for
> several reasons, but the core one being that it only matters on broken
> virtually indexed caches, so none of the architectures that do per-vma
> locking.
>
> And the argument that "After the mmap_write_lock_killable(), there
> will still be a period where page faults can happen" may be true
> (that's kind of the *point* of per-vma locking), but it's irrelevant.
>
> It's true for *all* users of mmap_write_lock_killable, whether in fork
> or anywhere else. What makes fork() so magically special?
>
> It's why we have that vma_start_write(), to say "I'm now modifying
> *this* vma, so stop accessing it in parallel".
>
> Because no, flush_cache_dup_mm() is not the magical reason to do that thing.

My understanding was that flush_cache_dup_mm() is there to ensure
nothing is in the cache, so locking VMAs before doing that would
ensure that no page faults would pollute the caches after we flushed
them. Is that reasoning incorrect?

>
> Maybe there is something else going on, but no, we don't write crazy
> code without a reason for it. That's completely unmaintainable,
> because people will look at that code, not understand it (because
> there is nothing to understand) and be afraid to touch it. For no
> actual reason.
>
> The obvious place to say "I'm now starting to modify the vma" is when
> you actually start to modify the vma.
>
> > Also, this change needs a couple more updates:
>
> Those updates seem sane, and come with explanations of why they exist.
> Looks fine to me.
>
> Suren, please send me the proper fixes. Not the voodoo one. The ones
> you can explain.
Ok, I think these two are non-controversial:
https://lkml.kernel.org/r/20230707043211.3682710-1-surenb@google.com
https://lkml.kernel.org/r/20230707043211.3682710-2-surenb@google.com

and the question now is how we fix the fork() case:
https://lore.kernel.org/all/20230706011400.2949242-2-surenb@google.com/
(if my above explanation makes sense to you)
or
https://lore.kernel.org/all/20230705063711.2670599-2-surenb@google.com/

Please let me know which ones and I'll send you the patchset including
these patches.
Thanks,
Suren.

>
> And if stable wants to do something else, then that's fine. But for
> the development kernel, we have two options:
>
>  - fix the PER_VMA_LOCK code
>
>  - decide that it's not worth it, and just revert it all
>
> and honestly, I'm ok with that second option, simply because this has
> all been way too much pain.
>
> But no, we don't mark it broken thinking we can't deal with it, or do
> random non-sensible code we can't explain.
>
>               Linus