Message-ID: <6e241cb5-08d2-b871-88ac-d4c477260857@redhat.com>
Date: Tue, 4 Jul 2023 20:01:52 +0200
From: David Hildenbrand
Organization: Red Hat
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, jirislaby@kernel.org, jacobly.alt@gmail.com,
 holger@applied-asynchrony.com, michel@lespinasse.org, jglisse@google.com,
 mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
 mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org,
 liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com,
 paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org,
 songliubraving@fb.com, peterx@redhat.com, dhowells@redhat.com,
 hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev,
 punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com,
 rientjes@google.com, chriscli@google.com, axelrasmussen@google.com,
 joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com,
 shakeelb@google.com, tatashin@google.com, edumazet@google.com,
 gthelen@google.com, linux-mm@kvack.org
References: <20230703182150.2193578-1-surenb@google.com>
 <7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com>
Subject: Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed

On 04.07.23 09:34, Suren Baghdasaryan wrote:
> On Tue, Jul 4, 2023 at 12:18 AM David Hildenbrand wrote:
>>
>> On 04.07.23 08:50, Suren Baghdasaryan wrote:
>>> On Mon, Jul 3, 2023 at 10:39 PM Suren Baghdasaryan wrote:
>>>>
>>>> On Mon, Jul 3, 2023 at 8:30 PM David Hildenbrand wrote:
>>>>>
>>>>> On 03.07.23 20:21, Suren Baghdasaryan wrote:
>>>>>> A memory corruption was reported in [1] with bisection pointing to the
>>>>>> patch [2] enabling per-VMA locks for x86.
>>>>>> Disable per-VMA locks config to prevent this issue while the problem is
>>>>>> being investigated. This is expected to be a temporary measure.
>>>>>>
>>>>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
>>>>>> [2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com
>>>>>>
>>>>>> Reported-by: Jiri Slaby
>>>>>> Reported-by: Jacob Young
>>>>>> Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
>>>>>> Signed-off-by: Suren Baghdasaryan
>>>>>> ---
>>>>>>  mm/Kconfig | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/Kconfig b/mm/Kconfig
>>>>>> index 09130434e30d..de94b2497600 100644
>>>>>> --- a/mm/Kconfig
>>>>>> +++ b/mm/Kconfig
>>>>>> @@ -1224,7 +1224,7 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
>>>>>>         def_bool n
>>>>>>
>>>>>>  config PER_VMA_LOCK
>>>>>> -       def_bool y
>>>>>> +       bool "Enable per-vma locking during page fault handling."
>>>>>>         depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
>>>>>>         help
>>>>>>           Allow per-vma locking during page fault handling.
>>>>>
>>>>> As raised at LSF/MM, I was "surprised" that we can now handle page faults
>>>>> concurrent to fork() and was expecting something to be broken already.
>>>>>
>>>>> What probably happens is that we wr-protected the page in the parent process and
>>>>> COW-shared an anon page with the child using copy_present_pte().
>>>>>
>>>>> But we only flush the parent MM tlb before we drop the parent MM lock in
>>>>> dup_mmap().
>>>>>
>>>>>
>>>>> If we get a write-fault before that TLB flush in the parent, and we end up
>>>>> replacing that anon page in the parent process in do_wp_page() [because, COW-shared with the child],
>>>>> this might be problematic: some stale writable TLB entries can target the wrong (old) page.
>>>>
>>>> Hi David,
>>>> Thanks for the detailed explanation. Let me check if this is indeed
>>>> what's happening here. If that's indeed the cause, I think we can
>>>> write-lock the VMAs being dup'ed until the TLB is flushed and
>>>> mmap_write_unlock(oldmm) unlocks them all and lets page faults to
>>>> proceed. If that works we at least will know the reason for the memory
>>>> corruption.
>>>
>>> Yep, locking the VMAs being copied inside dup_mmap() seems to fix the issue:
>>>
>>>         for_each_vma(old_vmi, mpnt) {
>>>                 struct file *file;
>>>
>>> +               vma_start_write(mpnt);
>>>                 if (mpnt->vm_flags & VM_DONTCOPY) {
>>>                         vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
>>>                         continue;
>>>                 }
>>>
>>> At least the reproducer at
>>> https://bugzilla.kernel.org/show_bug.cgi?id=217624 is working now. But
>>> I wonder if that's the best way to fix this. It's surely simple but
>>> locking every VMA is not free and doing that on every fork might
>>> regress performance.
>>
>>
>> That would mean that we can possibly still get page faults concurrent to
>> fork(), on the yet unprocessed part. While that fixes the issue at hand,
>> I cannot reliably tell if this doesn't mess with some other fork()
>> corner case.
>>
>> I'd suggest write-locking all VMAs upfront, before doing any kind of
>> fork-mm operation. Just like the old code did. See below.

Maybe we could get away with not locking VM_MAYSHARE or VM_DONTCOPY
VMAs, and possibly with not locking at all when there are no other
threads. But at least to me it feels safer to defer any such
optimizations and first see whether they are really required.

If there are no other threads, at least there will be no contention on
the VMA locks. And if there are other threads, we used to have
contention on the mmap lock already.

-- 
Cheers,

David / dhildenb
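
To make the difference between locking each VMA as the walk reaches it
and locking all VMAs upfront concrete, here is a minimal user-space
sketch. It is a toy model under stated assumptions, not kernel code:
the types and functions (toy_mm, toy_vma, toy_fork_walk, ...) are
hypothetical, the "lock" is a plain sequence-number comparison, and
nothing here is concurrent or touches page tables. It only counts how
often a fault on a not-yet-processed VMA could still take the lock-free
fast path while a fork-like walk is in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; NOT the kernel's mm_struct / vm_area_struct. */
struct toy_mm {
    int lock_seq;   /* bumped when the "mmap write lock" is dropped */
};

struct toy_vma {
    int lock_seq;   /* equals mm->lock_seq while the VMA is write-locked */
};

/* Mark one VMA as write-locked for the current write-lock cycle. */
void toy_vma_start_write(const struct toy_mm *mm, struct toy_vma *vma)
{
    vma->lock_seq = mm->lock_seq;
}

/* Fault fast path: usable only while the VMA is not write-locked. */
bool toy_vma_start_read(const struct toy_mm *mm, const struct toy_vma *vma)
{
    return vma->lock_seq != mm->lock_seq;
}

/* Dropping the "mmap write lock" releases every VMA lock in one step. */
void toy_mmap_write_unlock(struct toy_mm *mm)
{
    mm->lock_seq++;
}

/*
 * Fork-like walk over n VMAs. With lock_upfront, every VMA is locked
 * before any of them is "copied"; otherwise each VMA is locked only when
 * the walk reaches it. Returns how many fault probes on later,
 * not-yet-processed VMAs would have been allowed onto the fast path.
 */
int toy_fork_walk(struct toy_mm *mm, struct toy_vma *vmas, size_t n,
                  bool lock_upfront)
{
    int racy_faults = 0;

    assert(mm != NULL && vmas != NULL);

    if (lock_upfront)
        for (size_t i = 0; i < n; i++)
            toy_vma_start_write(mm, &vmas[i]);

    for (size_t i = 0; i < n; i++) {
        if (!lock_upfront)
            toy_vma_start_write(mm, &vmas[i]);

        /* VMA i would be copied here; probe a fault on each later VMA. */
        for (size_t j = i + 1; j < n; j++)
            if (toy_vma_start_read(mm, &vmas[j]))
                racy_faults++;
    }

    /* In the real flow, the TLB flush would happen before this point. */
    toy_mmap_write_unlock(mm);
    return racy_faults;
}
```

Starting from a toy_mm with lock_seq = 1 and three VMAs with
lock_seq = 0, the per-VMA variant lets some probes through on the
unprocessed tail of the list, while the upfront variant lets none
through. In both cases every VMA is readable again once
toy_mmap_write_unlock() has run.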