Date: Tue, 12 Jan 2021 17:57:55 +0100
From: Peter Zijlstra
To: Laurent Dufour
Cc: Vinayak Menon, Linus Torvalds, Andy Lutomirski, Peter Xu, Nadav Amit,
    Yu Zhao, Andrea Arcangeli, linux-mm, lkml, Pavel Emelyanov, Mike Kravetz,
    Mike Rapoport, stable, Minchan Kim, Will Deacon, surenb@google.com
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <1FCC8F93-FF29-44D3-A73A-DF943D056680@gmail.com>
 <20201221223041.GL6640@xz-x1> <20210105153727.GK3040@hirez.programming.kicks-ass.net> <0201238b-e716-2a3c-e9ea-d5294ff77525@linux.vnet.ibm.com>
In-Reply-To: <0201238b-e716-2a3c-e9ea-d5294ff77525@linux.vnet.ibm.com>

On Tue, Jan 12, 2021 at 04:47:17PM +0100, Laurent Dufour wrote:
> On 12/01/2021 at 12:43, Vinayak Menon wrote:
> > Possibility of race against other PTE modifiers
> >
> > 1) Fork - We have seen a case of SPF racing with fork marking PTEs RO and that
> > is described and fixed here https://lore.kernel.org/patchwork/patch/1062672/

Right, that's exactly the kind of thing I was worried about.

> > 2) mprotect - change_protection in mprotect which does the deferred flush is
> > marked under vm_write_begin/vm_write_end, thus SPF bails out on faults
> > on those VMAs.

Sure, mprotect also changes vm_flags, so it really needs that anyway.

> > 3) userfaultfd - mwriteprotect_range is not protected unlike in (2) above.
> > But SPF does not take UFFD faults.
> > 4) hugetlb - hugetlb_change_protection - called from mprotect and covered by
> > (2) above.
> > 5) Concurrent faults - SPF does not handle all faults. Only anon page faults.

What happened to shared/file-backed stuff? ISTR I had that working.

> > Of which do_anonymous_page and do_swap_page are NONE/NON-PRESENT->PRESENT
> > transitions without tlb flush. And I hope do_wp_page with RO->RW is fine as well.

The tricky one is demotion, specifically write to non-write.

> > I could not see a case where speculative path cannot see a PTE update done via
> > a fault on another CPU.

One you didn't mention is the NUMA balancing scanning crud; although I
think that's fine, losing a PTE update there is harmless. But I've not
thought overly hard on it.

> You explained it fine. Indeed SPF is handling deferred TLB invalidation by
> marking the VMA through vm_write_begin/end(), as for the fork case you
> mentioned. Once the PTL is held, and the VMA's seqcount is checked, the PTE
> values read are valid.

That should indeed work, but are we really sure we covered them all?
Should we invest in better TLBI APIs to make sure we can't get this
wrong?
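
For readers following the thread, the seqcount check that the quoted text
relies on can be modelled in user space roughly as below. This is only a
sketch under stated assumptions: struct vma_model and the three functions
are hypothetical stand-ins written with C11 atomics, not the kernel's
vm_write_begin()/vm_write_end() or the SPF fault path, and the PTL and the
deferred TLB flush are left out entirely. It shows only why a speculative
reader bails out while a PTE modifier has the VMA marked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a VMA guarded by a sequence count. */
struct vma_model {
    atomic_uint seq;    /* odd while a writer is modifying the "PTE" */
    atomic_ulong pte;   /* stand-in for a page-table entry value */
};

/* Writer side: mark the VMA as being modified (count becomes odd). */
static void vma_write_begin(struct vma_model *v)
{
    atomic_fetch_add(&v->seq, 1);
}

/* Writer side: modification finished (count becomes even again). */
static void vma_write_end(struct vma_model *v)
{
    unsigned int prev = atomic_fetch_add(&v->seq, 1);
    assert(prev & 1);   /* must pair with a preceding vma_write_begin() */
    (void)prev;         /* avoid an unused warning when NDEBUG is set */
}

/*
 * Speculative reader: returns true and stores the value in *out only if
 * no writer was active before the read and none ran during it. On false
 * the caller would fall back to the slow, fully locked path.
 */
static bool speculative_read(struct vma_model *v, unsigned long *out)
{
    unsigned int start = atomic_load(&v->seq);
    if (start & 1)
        return false;               /* writer active: bail out */

    unsigned long val = atomic_load(&v->pte);

    if (atomic_load(&v->seq) != start)
        return false;               /* a writer ran meanwhile: bail out */

    *out = val;
    return true;
}
```

A modifier that forgets the begin/end pair is invisible to this check,
which is the "did we cover them all" concern raised above.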