Date: Tue, 1 Feb 2022 02:20:39 +0000
From: Matthew Wilcox
To: Andrew Morton
Cc: Michel Lespinasse, Linux-MM, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, Laurent Dufour, Jerome Glisse, Peter Zijlstra,
    Michal Hocko, Vlastimil Babka, Davidlohr Bueso, Liam Howlett,
    Rik van Riel, Paul McKenney, Song Liu, Suren Baghdasaryan,
    Minchan Kim, Joel Fernandes, David Rientjes, Axel Rasmussen,
    Andy Lutomirski, Sebastian Andrzej Siewior
Subject: Re: [PATCH v2 00/35] Speculative page faults
References: <20220128131006.67712-1-michel@lespinasse.org>
    <20220131171434.89870a8f1ae294912e7ff19e@linux-foundation.org>
In-Reply-To: <20220131171434.89870a8f1ae294912e7ff19e@linux-foundation.org>

On Mon, Jan 31, 2022 at 05:14:34PM -0800, Andrew Morton wrote:
> On Fri, 28 Jan 2022 05:09:31 -0800 Michel Lespinasse wrote:
> > The first step of a speculative page fault is to look up the vma and
> > read its contents (currently by making a copy of the vma, though in
> > principle it would be sufficient to only read the vma attributes that
> > are used in page faults).
> > The mmap sequence count is used to verify
> > that there were no mmap writers concurrent to the lookup and copy steps.
> > Note that walking rbtrees while there may potentially be concurrent
> > writers is not an entirely new idea in linux, as latched rbtrees
> > are already doing this. This is safe as long as the lookup is
> > followed by a sequence check to verify that concurrency did not
> > actually occur (and abort the speculative fault if it did).
>
> I'm surprised that descending the rbtree locklessly doesn't flat-out
> oops the kernel. How are we assured that every pointer which is
> encountered actually points at the right thing? Against things
> which tear that tree down?

It doesn't necessarily point at the _right_ thing. You may get
entirely the wrong node in the tree if you race with a modification,
but, as Michel says, you check the seqcount before you even look at
the VMA (and if the seqcount indicates a modification, you throw away
the result and fall back to the locked version). The rbtree always
points to other rbtree nodes, so you aren't going to walk into some
completely wrong data structure.

> > The next step is to walk down the existing page table tree to find the
> > current pte entry. This is done with interrupts disabled to avoid
> > races with munmap().
>
> Sebastian, could you please comment on this from the CONFIG_PREEMPT_RT
> point of view?

I am not a fan of this approach. For other reasons, I think we want
to switch to RCU-freed page tables, and then we can walk the page
tables with the RCU lock held. Some architectures already RCU-free
the page tables, so I think it's just a matter of converting the rest.
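
To make the seqcount argument concrete, here is a rough userspace
sketch of the "look up, copy, then validate" pattern. It is not code
from the patch series and not kernel API: struct vma_model, struct
mm_model, mm_write_begin(), mm_write_end(), speculative_find_vma() and
the race_hook test hook are all names made up for illustration, and a
flat array stands in for the rbtree. It is a single-threaded model of
the control flow only; a real lockless reader would also need the
memory barriers and READ_ONCE()-style accesses that this omits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the vma attributes a fault handler reads. */
struct vma_model {
	unsigned long start, end;	/* covers [start, end) */
	unsigned long flags;
};

/* Hypothetical stand-in for the mm: a sequence count plus a tiny "tree". */
struct mm_model {
	atomic_uint seq;		/* odd while a writer is active */
	struct vma_model vmas[4];
	size_t nr_vmas;
};

/* Test hook type: lets a caller simulate a writer racing with the lookup. */
typedef void (*race_hook)(struct mm_model *mm);

static void mm_model_init(struct mm_model *mm)
{
	atomic_init(&mm->seq, 0);
	mm->nr_vmas = 0;
}

/* Writers bump the count before and after modifying the vmas. */
static void mm_write_begin(struct mm_model *mm)
{
	unsigned int old = atomic_fetch_add(&mm->seq, 1);

	assert((old & 1) == 0);		/* no nested writers in this model */
}

static void mm_write_end(struct mm_model *mm)
{
	unsigned int old = atomic_fetch_add(&mm->seq, 1);

	assert((old & 1) == 1);
}

/*
 * Returns true and fills *out only if a vma was found and the sequence
 * count did not change across the lookup and copy.  On false the caller
 * is expected to fall back to the locked path.
 */
static bool speculative_find_vma(struct mm_model *mm, unsigned long addr,
				 struct vma_model *out, race_hook hook)
{
	unsigned int seq = atomic_load(&mm->seq);
	struct vma_model copy = {0};
	bool found = false;

	if (seq & 1)
		return false;		/* writer active: do not even try */

	for (size_t i = 0; i < mm->nr_vmas; i++) {
		if (addr >= mm->vmas[i].start && addr < mm->vmas[i].end) {
			copy = mm->vmas[i];	/* not trusted until validated */
			found = true;
			break;
		}
	}

	if (hook)
		hook(mm);		/* simulate a concurrent writer here */

	if (atomic_load(&mm->seq) != seq)
		return false;		/* raced: throw the copy away */

	if (!found)
		return false;

	*out = copy;
	return true;
}

/* Example writer: empties the first vma, roughly like an munmap(). */
static void writer_shrinks_first_vma(struct mm_model *mm)
{
	mm_write_begin(mm);
	mm->vmas[0].end = mm->vmas[0].start;
	mm_write_end(mm);
}
```

The point of the model is that the reader never acts on the copy unless
the count is even and unchanged, so a lookup that landed on the wrong
node is simply discarded. Passing writer_shrinks_first_vma as the hook
makes the validation fail, which corresponds to aborting the
speculative fault and retrying under the lock.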