From: Suren Baghdasaryan
Date: Mon, 3 Jul 2023 23:50:59 -0700
Subject: Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed
To: David Hildenbrand
Cc: akpm@linux-foundation.org, jirislaby@kernel.org, jacobly.alt@gmail.com,
    holger@applied-asynchrony.com, michel@lespinasse.org, jglisse@google.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
    mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
    liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
    paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org,
    songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com,
    hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
    punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
    rientjes@google.com, chriscli@google.com, axelrasmussen@google.com,
    joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com,
    shakeelb@google.com, tatashin@google.com, edumazet@google.com,
    gthelen@google.com, linux-mm@kvack.org
References: <20230703182150.2193578-1-surenb@google.com>
    <7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com>

On Mon, Jul 3, 2023 at 10:39 PM Suren Baghdasaryan wrote:
>
> On Mon, Jul 3, 2023 at 8:30 PM David Hildenbrand wrote:
> >
> > On 03.07.23 20:21, Suren Baghdasaryan wrote:
> > > A memory corruption was reported in [1] with
> > > bisection pointing to the
> > > patch [2] enabling per-VMA locks for x86.
> > > Disable per-VMA locks config to prevent this issue while the problem is
> > > being investigated. This is expected to be a temporary measure.
> > >
> > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
> > > [2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
> > >
> > > Reported-by: Jiri Slaby
> > > Reported-by: Jacob Young
> > > Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
> > > Signed-off-by: Suren Baghdasaryan
> > > ---
> > >   mm/Kconfig | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 09130434e30d..de94b2497600 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -1224,7 +1224,7 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
> > >          def_bool n
> > >
> > >   config PER_VMA_LOCK
> > > -       def_bool y
> > > +       bool "Enable per-vma locking during page fault handling."
> > >          depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
> > >          help
> > >            Allow per-vma locking during page fault handling.
> >
> > As raised at LSF/MM, I was "surprised" that we can now handle page faults
> > concurrent to fork() and was expecting something to be broken already.
> >
> > What probably happens is that we wr-protected the page in the parent process and
> > COW-shared an anon page with the child using copy_present_pte().
> >
> > But we only flush the parent MM tlb before we drop the parent MM lock in
> > dup_mmap().
> >
> >
> > If we get a write-fault before that TLB flush in the parent, and we end up
> > replacing that anon page in the parent process in do_wp_page() [because, COW-shared with the child],
> > this might be problematic: some stale writable TLB entries can target the wrong (old) page.
>
> Hi David,
> Thanks for the detailed explanation. Let me check if this is indeed
> what's happening here.
> If that's indeed the cause, I think we can
> write-lock the VMAs being dup'ed until the TLB is flushed and
> mmap_write_unlock(oldmm) unlocks them all and lets page faults to
> proceed. If that works we at least will know the reason for the memory
> corruption.

Yep, locking the VMAs being copied inside dup_mmap() seems to fix the issue:

        for_each_vma(old_vmi, mpnt) {
                struct file *file;

+               vma_start_write(mpnt);
                if (mpnt->vm_flags & VM_DONTCOPY) {
                        vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
                        continue;
                }

At least the reproducer at
https://bugzilla.kernel.org/show_bug.cgi?id=217624 is working now. But
I wonder if that's the best way to fix this. It's surely simple but
locking every VMA is not free and doing that on every fork might
regress performance.

> Thanks,
> Suren.
>
> >
> >
> > We had similar issues in the past with userfaultfd, see the comment at the beginning of do_wp_page():
> >
> >
> >          if (likely(!unshare)) {
> >                  if (userfaultfd_pte_wp(vma, *vmf->pte)) {
> >                          pte_unmap_unlock(vmf->pte, vmf->ptl);
> >                          return handle_userfault(vmf, VM_UFFD_WP);
> >                  }
> >
> >                  /*
> >                   * Userfaultfd write-protect can defer flushes. Ensure the TLB
> >                   * is flushed in this case before copying.
> >                   */
> >                  if (unlikely(userfaultfd_wp(vmf->vma) &&
> >                               mm_tlb_flush_pending(vmf->vma->vm_mm)))
> >                          flush_tlb_page(vmf->vma, vmf->address);
> >          }

If do_wp_page() could identify that vmf->vma is being copied, we could
simply return VM_FAULT_RETRY and retry the page fault under mmap_lock,
which would block until dup_mmap() is done... Maybe we could use
mm_tlb_flush_pending() for that? WDYT?

> >
> >
> > We really should not allow page faults concurrent to fork() without further investigation.
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
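
For anyone following the thread, the suspected race can be walked through
with a small userspace toy model. This is only a sketch of the scenario
described above, not kernel code: every name in it (toy_mm, toy_fork_copy,
toy_write_fault, toy_store_a, toy_simulate) is made up for illustration,
one "page" holds two ints, and the "TLB" is a single cached entry. The
vma_write_locked flag stands in for the effect of vma_start_write() in
dup_mmap(), under the assumption that it simply keeps the fault from being
handled until fork has flushed the TLB.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy single-page model; every name below is hypothetical, not kernel API. */
enum { OLD_PAGE, NEW_PAGE, NR_PAGES };
enum { SLOT_A, SLOT_B, NR_SLOTS };

struct toy_pte { int page; bool writable; };

struct toy_mm {
	int mem[NR_PAGES][NR_SLOTS];	/* "physical" memory */
	struct toy_pte parent_pte;	/* parent's page-table entry */
	struct toy_pte child_pte;	/* child's page-table entry */
	struct toy_pte stale_tlb;	/* entry cached by thread A's CPU */
	bool tlb_valid;
	bool vma_write_locked;		/* stands in for vma_start_write() */
};

/* fork, first half: write-protect the parent, share the page with the child. */
static void toy_fork_copy(struct toy_mm *mm, bool lock_vmas)
{
	mm->vma_write_locked = lock_vmas;
	mm->parent_pte.writable = false;
	mm->child_pte = mm->parent_pte;
	/* The parent's TLB is deliberately NOT flushed yet. */
}

/* fork, second half: flush the parent's TLB, then drop the locks. */
static void toy_fork_finish(struct toy_mm *mm)
{
	mm->tlb_valid = false;
	mm->vma_write_locked = false;
}

/* Thread B's write fault. Returns false if it must wait for fork to finish. */
static bool toy_write_fault(struct toy_mm *mm, int value)
{
	if (mm->vma_write_locked)
		return false;
	/* COW: the page is shared with the child, so copy it and remap. */
	memcpy(mm->mem[NEW_PAGE], mm->mem[OLD_PAGE], sizeof(mm->mem[NEW_PAGE]));
	mm->parent_pte = (struct toy_pte){ .page = NEW_PAGE, .writable = true };
	mm->mem[NEW_PAGE][SLOT_B] = value;
	return true;
}

/* Thread A's store: uses its cached translation while it is still valid. */
static void toy_store_a(struct toy_mm *mm, int value)
{
	struct toy_pte pte = mm->tlb_valid ? mm->stale_tlb : mm->parent_pte;

	assert(pte.writable);	/* the toy never faults on this path */
	mm->mem[pte.page][SLOT_A] = value;
}

/* Returns what the parent reads back from SLOT_A after thread A stored 2. */
int toy_simulate(bool lock_vmas)
{
	struct toy_mm mm = {
		.mem = { [OLD_PAGE] = { 1, 1 } },
		.parent_pte = { .page = OLD_PAGE, .writable = true },
	};
	bool handled;

	mm.stale_tlb = mm.parent_pte;	/* thread A cached a writable entry */
	mm.tlb_valid = true;

	toy_fork_copy(&mm, lock_vmas);
	handled = toy_write_fault(&mm, 3);	/* thread B faults mid-fork */
	toy_store_a(&mm, 2);			/* thread A hits the stale entry */
	toy_fork_finish(&mm);
	if (!handled)
		toy_write_fault(&mm, 3);	/* retried once fork is done */

	return mm.mem[mm.parent_pte.page][SLOT_A];
}
```

With lock_vmas == false the COW copy is taken before thread A's store, so
that store lands in the old page through the stale entry and the parent
reads back the old value. With lock_vmas == true the fault is only handled
after the flush, the copy picks up thread A's store, and the parent sees
it. The model ignores everything else (multiple CPUs, the per-VMA lock
sequence counts, the retry path under mmap_lock), so it only illustrates
the ordering problem, nothing about cost.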