Date: Thu, 6 Feb 2020 21:55:29 +0100
From: Peter Zijlstra
To: Matthew Wilcox
Cc: SeongJae Park, Michal Hocko, Vlastimil Babka, "Kirill A. Shutemov", linux-mm@kvack.org
Subject: Re: Re: Splitting the mmap_sem
Message-ID: <20200206205529.GZ14914@hirez.programming.kicks-ass.net>
References: <20200109170715.GV4951@dhcp22.suse.cz> <20200109173206.3731-1-sj38.park@gmail.com> <20200109201320.GO6788@bombadil.infradead.org> <20200206135920.GS14914@hirez.programming.kicks-ass.net> <20200206201536.GX8731@bombadil.infradead.org>
In-Reply-To: <20200206201536.GX8731@bombadil.infradead.org>

On Thu, Feb 06, 2020 at 12:15:36PM -0800, Matthew Wilcox wrote:
> On Thu, Feb 06, 2020 at 02:59:20PM +0100, Peter Zijlstra wrote:
> > > The proposal consists of three phases. In phase 1, we convert the
> > > rbtree to the maple tree, and leave the locking alone. In phase 2,
> > > we change the locking to a per-VMA refcount, looked up under RCU.
> > >
> > > This problem arises during phase 3 where we attempt to handle page
> > > faults entirely under the RCU read lock. If we encounter problems,
> > > we can fall back to acquiring the VMA refcount, but we need the
> > > page allocation to fail rather than sleep (or magically drop the
> > > RCU lock and return an indication that it has done so, but that
> > > doesn't seem to be an approach that would find any favour).
> >
> > So why not use SRCU? You can do full blocking faults under SRCU and
> > don't need no 'stinkin' refcounts ;-)
>
> I have to say, SRCU is not in my mental toolbox of "how to solve a
> problem", so it simply hadn't occurred to me. Thanks.
>
> So, we'd DEFINE_SRCU(vma_srcu); in mm/memory.c
>
> then, at the beginning of a page fault call srcu_read_lock(&vma_srcu);
> walk the tree as we do now, allocate memory for PTEs, sleep waiting for
> pages to arrive back from disc, etc, etc, then at the end of the fault,
> call srcu_read_unlock(&vma_srcu).

So far so good,...

> munmap() would consist of removing the
> VMA from the tree, then calling synchronize_srcu() to wait for all faults
> to finish, then putting the backing file, etc, etc and freeing the VMA.

call_srcu(), and the (s)rcu callback will then fput() and such things
more.

synchronize_srcu() (like synchronize_rcu()) is stupid slow and would
make munmap()/exit()/etc.. unusable.

> This seems pretty reasonable, and investigation could actually proceed
> before the Maple tree work lands. Today, that would be:
>
>     srcu_read_lock(&vmas_srcu);
>     down_read(&mm->mmap_sem);
>     find_vma(mm, address);
>     up_read(&mm->mmap_sem);
>     ... rest of fault handler path ...
>     srcu_read_unlock(&vmas_srcu);
>
> Kind of a pain because we still call find_vma() in the per-arch page
> fault handler, but for prototyping, we'd only have to do one or two
> architectures.

If you look at the earlier speculative page-fault patches by Laurent,
which were based on my still earlier patches, you'll find most of this
there.

The tricky bit was validating everything on the second page-table walk,
to see if nothing had fundamentally changed, specifically the VMA,
before installing the PTE. If you do this without mmap_sem, you need to
hold ptlock to pin stuff while validating everything you did earlier.
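
For illustration only, the ordering argued for above (unlink the VMA, hand
the teardown to a deferred callback, return without waiting) can be modelled
in a few lines of userspace C. This is a single-threaded toy, not kernel
code: every toy_* name is hypothetical, the "tree" holds one VMA, and the
callback only sets a flag where the real one would fput() and free. Unlike
the real call_srcu(), the toy runs the callback synchronously when no reader
is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an SRCU domain; NOT the kernel implementation. It only
 * models "a callback runs once every reader that was already active at
 * queue time has left its read-side section". */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
	unsigned long seq;	/* readers with ticket < seq predate us */
	int waiting_on;		/* how many of those are still active */
};

struct toy_srcu {
	unsigned long next_seq;	/* ticket handed to the next reader */
	int active;		/* readers currently in a read section */
	struct toy_head *pending;
};

/* Models srcu_read_lock(): returns a ticket for the matching unlock. */
static unsigned long toy_read_lock(struct toy_srcu *sp)
{
	sp->active++;
	return sp->next_seq++;
}

/* Models srcu_read_unlock(): runs callbacks whose last reader just left. */
static void toy_read_unlock(struct toy_srcu *sp, unsigned long ticket)
{
	struct toy_head **pp = &sp->pending;

	sp->active--;
	while (*pp) {
		struct toy_head *h = *pp;

		if (ticket < h->seq && --h->waiting_on == 0) {
			*pp = h->next;	/* unlink before invoking */
			h->func(h);
		} else {
			pp = &h->next;
		}
	}
}

/* Models call_srcu(): never waits; defers func past current readers. */
static void toy_call(struct toy_srcu *sp, struct toy_head *h,
		     void (*func)(struct toy_head *head))
{
	if (sp->active == 0) {	/* toy shortcut: nobody to wait for */
		func(h);
		return;
	}
	h->func = func;
	h->seq = sp->next_seq;
	h->waiting_on = sp->active;
	h->next = sp->pending;
	sp->pending = h;
}

struct toy_vma {
	unsigned long start, end;
	bool released;		/* stands in for fput() + freeing the VMA */
	struct toy_head head;
};

struct toy_mm {
	struct toy_vma *vma;	/* one-entry "tree", enough for the sketch */
};

/* Fault-side lookup; only valid between toy_read_lock()/toy_read_unlock(). */
static struct toy_vma *toy_find_vma(struct toy_mm *mm, unsigned long addr)
{
	struct toy_vma *v = mm->vma;

	return (v && addr >= v->start && addr < v->end) ? v : NULL;
}

/* Deferred teardown: runs only after pre-existing readers are gone. */
static void toy_vma_release(struct toy_head *head)
{
	struct toy_vma *v = (struct toy_vma *)
		((char *)head - offsetof(struct toy_vma, head));

	v->released = true;
}

/* munmap() side: unlink first, then defer the teardown and return. */
static void toy_munmap(struct toy_srcu *sp, struct toy_mm *mm)
{
	struct toy_vma *v = mm->vma;

	if (!v)
		return;
	mm->vma = NULL;		/* new faults can no longer find it */
	toy_call(sp, &v->head, toy_vma_release);
}
```

In this model a fault that took toy_read_lock() before toy_munmap() ran can
keep using the VMA it looked up, toy_munmap() itself returns at once, and
the release happens at that fault's toy_read_unlock(). A synchronize-style
variant would instead have to wait inside toy_munmap() for the reader to
finish, which is the cost objected to above.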