From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 4 Jul 2023 09:18:18 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, jirislaby@kernel.org, jacobly.alt@gmail.com,
    holger@applied-asynchrony.com, michel@lespinasse.org, jglisse@google.com,
    mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
    mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
    liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
    paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org,
    songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com,
    hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
    punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
    rientjes@google.com, chriscli@google.com, axelrasmussen@google.com,
    joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com,
    shakeelb@google.com, tatashin@google.com, edumazet@google.com,
    gthelen@google.com, linux-mm@kvack.org
Subject: Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed
References: <20230703182150.2193578-1-surenb@google.com>
 <7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com>
On 04.07.23 08:50, Suren Baghdasaryan wrote:
> On Mon, Jul 3, 2023 at 10:39 PM Suren Baghdasaryan wrote:
>>
>> On Mon, Jul 3, 2023 at 8:30 PM David Hildenbrand wrote:
>>>
>>> On 03.07.23 20:21, Suren Baghdasaryan wrote:
>>>> A memory corruption was reported in [1] with bisection pointing to the
>>>> patch [2] enabling per-VMA locks for x86.
>>>> Disable per-VMA locks config to prevent this issue while the problem is
>>>> being investigated. This is expected to be a temporary measure.
>>>>
>>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
>>>> [2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
>>>>
>>>> Reported-by: Jiri Slaby
>>>> Reported-by: Jacob Young
>>>> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
>>>> Signed-off-by: Suren Baghdasaryan
>>>> ---
>>>>   mm/Kconfig | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>> index 09130434e30d..de94b2497600 100644
>>>> --- a/mm/Kconfig
>>>> +++ b/mm/Kconfig
>>>> @@ -1224,7 +1224,7 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
>>>>          def_bool n
>>>>
>>>>   config PER_VMA_LOCK
>>>> -       def_bool y
>>>> +       bool "Enable per-vma locking during page fault handling."
>>>>          depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
>>>>          help
>>>>            Allow per-vma locking during page fault handling.
>>>
>>> As raised at LSF/MM, I was "surprised" that we can now handle page faults
>>> concurrent to fork() and was expecting something to be broken already.
>>>
>>> What probably happens is that we wr-protected the page in the parent
>>> process and COW-shared an anon page with the child using
>>> copy_present_pte().
>>>
>>> But we only flush the parent MM tlb before we drop the parent MM lock in
>>> dup_mmap().
>>>
>>> If we get a write-fault before that TLB flush in the parent, and we end
>>> up replacing that anon page in the parent process in do_wp_page()
>>> [because, COW-shared with the child], this might be problematic: some
>>> stale writable TLB entries can target the wrong (old) page.
>>
>> Hi David,
>> Thanks for the detailed explanation. Let me check if this is indeed
>> what's happening here.
>> If that's indeed the cause, I think we can write-lock the VMAs being
>> dup'ed until the TLB is flushed and mmap_write_unlock(oldmm) unlocks
>> them all and lets page faults proceed. If that works we at least will
>> know the reason for the memory corruption.
>
> Yep, locking the VMAs being copied inside dup_mmap() seems to fix the issue:
>
>         for_each_vma(old_vmi, mpnt) {
>                 struct file *file;
>
> +               vma_start_write(mpnt);
>                 if (mpnt->vm_flags & VM_DONTCOPY) {
>                         vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
>                         continue;
>                 }
>
> At least the reproducer at
> https://bugzilla.kernel.org/show_bug.cgi?id=217624 is working now. But
> I wonder if that's the best way to fix this. It's surely simple but
> locking every VMA is not free and doing that on every fork might
> regress performance.

That would mean that we can possibly still get page faults concurrent to
fork(), on the yet unprocessed part. While that fixes the issue at hand,
I cannot reliably tell if this doesn't mess with some other fork() corner
case.

I'd suggest write-locking all VMAs upfront, before doing any kind of
fork-mm operation. Just like the old code did. See below.

>
>> Thanks,
>> Suren.
>>
>>>
>>> We had similar issues in the past with userfaultfd, see the comment at
>>> the beginning of do_wp_page():
>>>
>>>         if (likely(!unshare)) {
>>>                 if (userfaultfd_pte_wp(vma, *vmf->pte)) {
>>>                         pte_unmap_unlock(vmf->pte, vmf->ptl);
>>>                         return handle_userfault(vmf, VM_UFFD_WP);
>>>                 }
>>>
>>>                 /*
>>>                  * Userfaultfd write-protect can defer flushes. Ensure the
>>>                  * TLB is flushed in this case before copying.
>>>                  */
>>>                 if (unlikely(userfaultfd_wp(vmf->vma) &&
>>>                              mm_tlb_flush_pending(vmf->vma->vm_mm)))
>>>                         flush_tlb_page(vmf->vma, vmf->address);
>>>         }
>
> If do_wp_page() could identify that vmf->vma is being copied, we could
> simply return VM_FAULT_RETRY and retry the page fault under mmap_lock,
> which would block until dup_mmap() is done... Maybe we could use
> mm_tlb_flush_pending() for that? WDYT?
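As a rough sketch only (kernel-internal pseudocode, not a tested patch;
whether mm_tlb_flush_pending() is a reliable signal here is exactly the
open question above), the proposed bail-out near the top of do_wp_page()
might look like:

```c
	/*
	 * Hypothetical early-out for the VMA-locked fault path only:
	 * if the mm still has a deferred TLB flush pending (as
	 * dup_mmap() does until it finishes copying page tables),
	 * retry the fault under mmap_lock, which serializes against
	 * fork().
	 */
	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) &&
	    mm_tlb_flush_pending(vmf->vma->vm_mm)) {
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		return VM_FAULT_RETRY;
	}
```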
I'm not convinced that we should be making that code more complicated
simply to speed up fork() with concurrent page faults.

My gut feeling is that many operations that could possibly take the VMA
lock in the future (page pinning triggering unsharing) should not run
concurrent with fork().

So IMHO, keep the old behavior of fork() -- i.e., no concurrent page
faults -- and unlock that eventually in the future when deemed really
required (but people should really avoid fork() in performance-sensitive
applications if not absolutely required).

-- 
Cheers,

David / dhildenb