Date: Mon, 4 Jan 2021 15:19:54 -0500
From: Andrea Arcangeli
To: Nadav Amit
Cc: Peter Zijlstra, linux-mm, lkml, Yu Zhao, Andy Lutomirski, Peter Xu, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, Minchan Kim, Will Deacon, Mel Gorman
Subject: Re: [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect
References: <20201225092529.3228466-1-namit@vmware.com> <20201225092529.3228466-2-namit@vmware.com> <20210104122227.GL3021@hirez.programming.kicks-ass.net>

On Mon, Jan 04, 2021 at 07:35:06PM +0000, Nadav Amit wrote:
> > On Jan 4, 2021, at 11:24 AM, Andrea Arcangeli wrote:
> >
> > Hello,
> >
> > On Mon, Jan 04, 2021 at 01:22:27PM +0100, Peter Zijlstra wrote:
> >> On Fri, Dec 25, 2020 at 01:25:28AM -0800, Nadav Amit wrote:
> >>
> >>> The scenario that happens in selftests/vm/userfaultfd is as follows:
> >>>
> >>> cpu0                          cpu1                    cpu2
> >>> ----                          ----                    ----
> >>>                                                       [ Writable PTE
> >>>                                                         cached in TLB ]
> >>> userfaultfd_writeprotect()
> >>> [ write-*unprotect* ]
> >>> mwriteprotect_range()
> >>> mmap_read_lock()
> >>> change_protection()
> >>>
> >>> change_protection_range()
> >>> ...
> >>> change_pte_range()
> >>> [ *clear* “write”-bit ]
> >>> [ defer TLB flushes ]
> >>>                               [ page-fault ]
> >>>                               ...
> >>>                               wp_page_copy()
> >>>                                cow_user_page()
> >>>                                 [ copy page ]
> >>>                                                       [ write to old
> >>>                                                         page ]
> >>>                               ...
> >>>                                set_pte_at_notify()
> >>
> >> Yuck!
> >
> > Note, the above was posted before we figured out the details so it
> > wasn't showing the real deferred tlb flush that caused problems (the
> > one showed on the left causes zero issues).
>
> Actually it was posted after (note that this is v2). The aforementioned
> scenario that Peter regards to is the one that I actually encountered (not
> the second scenario that is “theoretical”). This scenario that Peter regards
> is indeed more “stupid” in the sense that we should just not write-protect
> the PTE on userfaultfd write-unprotect.
>
> Let me know if I made any mistake in the description.

I didn't say there is a mistake.
I said it is not showing the real deferred tlb flush that causes
problems.

The issue here is that we have a "defer tlb flush" that runs after
"write to old page".

If you look at the above, you're led to think the "defer tlb flush"
that causes issues is the one in cpu0. It's not. That one is totally
harmless.

> > The problematic one not pictured is the one of the wrprotect that has
> > to be running in another CPU which is also isn't picture above. More
> > accurate traces are posted later in the thread.
>
> I think I included this scenario as well in the commit log (of v2). Let me
> know if I screwed up and the description is not clear.

Instead of not showing the real "defer tlb flush" in the trace and
then fixing it up in the comment, why don't you take the trace showing
the real problematic "defer tlb flush"? No need to reinvent it.

https://lkml.kernel.org/r/X+JJqK91plkBVisG@redhat.com

See here the detail underlined:

deferred tlb flush <- too late
XXXXXXXXXXXXXX BUG RACE window close here

This shows the real deferred tlb flush; your v2 does not include it.
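
To make the ordering point concrete, here is a toy replay of the two
orderings. It is only a sketch, not kernel code: the event names, the
replay() helper and the single-entry "TLB" are all hypothetical and
model nothing beyond "write bit cleared in the PTE, flush deferred,
page copied, write lands on the old page through the stale entry".

```python
# Toy model only: one PTE write bit, one cached copy of it in another
# CPU's TLB, and a copy-on-write of the page. All names are hypothetical.

def replay(events):
    """Replay an ordered list of events; return True if a write was lost."""
    pte_writable = True      # write bit in the page table entry
    tlb_writable = True      # stale-able copy cached in the writer's TLB
    old_page = ["A"]         # contents of the original page
    new_page = None          # contents of the COW copy, once made

    for ev in events:
        if ev == "wrprotect":
            # Clear the write bit in the PTE; the TLB flush is deferred,
            # so the cached entry is left untouched here.
            pte_writable = False
        elif ev == "flush":
            # The deferred TLB flush finally runs: the cached entry now
            # agrees with the PTE.
            tlb_writable = pte_writable
        elif ev == "cow_copy":
            # Fault path copies the old page into a new one.
            new_page = list(old_page)
        elif ev == "write":
            # The writer CPU only consults its cached entry.
            if tlb_writable:
                old_page[0] = "B"   # lands on the old page
            # else: the access would fault instead of modifying old_page
        else:
            raise ValueError(f"unknown event: {ev}")

    # A write is lost if the old page changed after the copy was taken.
    return new_page is not None and new_page != old_page


# Flush runs after "write to old page": the copy misses the write.
FLUSH_TOO_LATE = ["wrprotect", "cow_copy", "write", "flush"]
# Flush runs before the copy: the stale entry is gone, nothing is lost.
FLUSH_IN_TIME = ["wrprotect", "flush", "cow_copy", "write"]

if __name__ == "__main__":
    print("flush too late, write lost:", replay(FLUSH_TOO_LATE))
    print("flush in time, write lost:", replay(FLUSH_IN_TIME))
```

The only thing that differs between the two lists is where "flush"
sits relative to "cow_copy" and "write", which is the detail the trace
needs to show.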