Date: Thu, 9 Jan 2020 12:13:20 -0800
From: Matthew Wilcox
To: SeongJae Park
Cc: Michal Hocko, Vlastimil Babka, "Kirill A. Shutemov",
    linux-mm@kvack.org, Peter Zijlstra
Subject: Re: Re: Splitting the mmap_sem
Message-ID: <20200109201320.GO6788@bombadil.infradead.org>

On Thu, Jan 09, 2020 at 06:32:06PM +0100, SeongJae Park wrote:
> On Thu, 9 Jan 2020 18:07:15 +0100 Michal Hocko wrote:
>
> > On Thu 09-01-20 18:03:25, Michal Hocko wrote:
> > > I might misremember but RCU based VMA handling has
> > > been considered in the past. I do not remember details but there were
> > > some problems and page tables allocation is not the biggest one.
> >
> > I have found https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf in my
> > notes. I managed to forget everything but maybe it will be useful for a
> > reference.
>
> The subsequent work from the authors
> (https://people.csail.mit.edu/nickolai/papers/clements-radixvm-2014-08-05.pdf)
> might be also useful for the understanding of the limitations found from the
> work.

Thanks for both those references.

> I also forgot many details but as far as I remember, the biggest problem with
> the rcuvm was the update side scalability limitation that results from the
> single updater lock and the TLB invalidations. I has also internally
> implemented another RCU based vm that utilizing fine-grained update side
> synchronization. The write side performance of my version was therefore much
> improved, but it also dropped the performance at the end with heavily
> write-intensive workloads due to the TLB flush overhead.
>
> Page table allocations weren't bothered me at that time.

As far as I can tell, both these implementations work by using RCU to
look up a VMA, taking a reference count on the VMA and dropping the RCU
read lock before walking the page tables. Sleeping to allocate page
tables will be fine as the reference count prevents the VMA from going
away.

One of the use cases that we're concerned about involves a high
percentage of page faults on a single large (terabytes) VMA (and a
highly multithreaded process). Moving the contention from a rwsem in
the mm_struct to a refcount in the VMA will not help performance
substantially for this user.

The proposal consists of three phases. In phase 1, we convert the
rbtree to the maple tree, and leave the locking alone. In phase 2, we
change the locking to a per-VMA refcount, looked up under RCU.

This problem arises during phase 3 where we attempt to handle page
faults entirely under the RCU read lock. If we encounter problems, we
can fall back to acquiring the VMA refcount, but we need the page
allocation to fail rather than sleep (or magically drop the RCU lock
and return an indication that it has done so, but that doesn't seem to
be an approach that would find any favour).
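
To make the phase 3 fallback concrete, here is a rough userspace sketch in C. It is not kernel code and not the proposed implementation: every name in it (struct vma as laid out here, sketch_rcu_read_lock(), vma_tryget(), alloc_page_table(), handle_fault(), simulate_memory_pressure) is hypothetical, the RCU "lock" only counts nesting, and the VMA lookup is reduced to a range check on a single VMA. It only illustrates the ordering: try a non-sleeping allocation under the RCU read lock, and if that fails, take the VMA refcount, drop RCU, and only then allow a sleeping allocation.

```c
/* Userspace sketch only: every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct vma {
	unsigned long start, end;	/* covers [start, end) */
	atomic_int refcount;		/* 0 means the VMA is being torn down */
	void *page_table;		/* stand-in for the real page tables */
};

/* Stand-ins for the RCU read lock; they only track nesting depth. */
static int rcu_depth;
static void sketch_rcu_read_lock(void) { rcu_depth++; }
static void sketch_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Take a reference unless the count has already dropped to zero. */
static bool vma_tryget(struct vma *vma)
{
	int old = atomic_load(&vma->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&vma->refcount, &old, old + 1))
			return true;
	return false;
}

static void vma_put(struct vma *vma)
{
	int old = atomic_fetch_sub(&vma->refcount, 1);

	assert(old > 0);
}

/* Set by a caller to make non-sleeping allocations fail. */
static bool simulate_memory_pressure;

/* With may_sleep == false this must fail rather than sleep. */
static void *alloc_page_table(bool may_sleep)
{
	if (may_sleep)
		assert(rcu_depth == 0);	/* sleeping under RCU would be a bug */
	else if (simulate_memory_pressure)
		return NULL;
	return calloc(1, 64);
}

enum fault_path { FAULT_RCU_ONLY, FAULT_REFCOUNT_FALLBACK, FAULT_FAILED };

static enum fault_path handle_fault(struct vma *vma, unsigned long addr)
{
	bool ok;

	sketch_rcu_read_lock();
	if (addr < vma->start || addr >= vma->end) {
		sketch_rcu_read_unlock();
		return FAULT_FAILED;
	}

	/* Fast path: everything happens under the RCU read lock. */
	if (!vma->page_table)
		vma->page_table = alloc_page_table(false);
	if (vma->page_table) {
		sketch_rcu_read_unlock();
		return FAULT_RCU_ONLY;
	}

	/* Fallback: pin the VMA, leave RCU, then allow a sleeping allocation. */
	if (!vma_tryget(vma)) {
		sketch_rcu_read_unlock();
		return FAULT_FAILED;
	}
	sketch_rcu_read_unlock();
	vma->page_table = alloc_page_table(true);
	ok = vma->page_table != NULL;
	vma_put(vma);
	return ok ? FAULT_REFCOUNT_FALLBACK : FAULT_FAILED;
}
```

In this sketch the refcount is only touched on the fallback path, which is the point of phase 3 for the single-large-VMA workload: the common case avoids bouncing a shared counter between faulting threads.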