References: <20210601044845.GA12713@lespinasse.org> <20210602033408.GA3229@lespinasse.org>
In-Reply-To: <20210602033408.GA3229@lespinasse.org>
From: Suren Baghdasaryan
Date: Wed, 9 Jun 2021 18:29:23 -0700
Subject: Re: [LSF/MM TOPIC] mmap locking topics
To: Michel Lespinasse
Cc: Matthew Wilcox, lsf-pc@lists.linux-foundation.org, linux-mm, Laurent Dufour

On Tue, Jun 1, 2021 at 8:34 PM Michel Lespinasse wrote:
>
> On Tue, Jun 01, 2021 at 03:01:00PM +0100, Matthew Wilcox wrote:
> > On Mon, May 31, 2021 at 09:48:45PM -0700, Michel Lespinasse wrote:
> > > I - Speculative page faults
> > >
> > > The idea there is to avoid taking the mmap lock during page faults,
> > > at least for the easier cases. This requires the fault handler to be
> > > careful to avoid races with mmap writers (and most particularly
> > > munmap), and when the new page is ready to be inserted into the user
> > > process, to verify, at the last moment (after taking the page table
> > > lock), that there has been no race between the fault handler and any
> > > mmap writers. Such checks can be implemented locally, without hitting
> > > any global locks, which results in very nice scalability improvements
> > > when processing concurrent faults.
> > >
> > > I think the idea is ready for prime time, and a patchset has been proposed,
> > > but it is not getting much traction yet. I suspect we will need to discuss
> > > the idea in person to figure out the next steps.
> >
> > There is a lot of interest in this.
> > I disagree with Michel's approach
> > in that he wants to use seqlocks to detect whether any modification has
> > been made to the process's address space, whereas I want to use the VMA
> > tree to detect whether any modification has been made to this VMA.
>
> I see the sequence count as being the easy & safe approach, but yes it
> does have limitations that can lead to unnecessary fast path aborts.
> It would be nice checking the VMAs to avoid *some* of these aborts,
> but I do not think that is always applicable either - I wrote about
> that in https://lwn.net/ml/linux-kernel/20210430224649.GA29203@lespinasse.org/
> ("Thoughts about concurrency checks at the end of the page fault")

In Android we use the previous implementation of SPF posted by Laurent
Dufour (https://lore.kernel.org/patchwork/project/lkml/list/?series=390741&state=%2A&archive=both)
which unfortunately did not make it upstream. It improves the start time
of some heavy multi-threaded applications, and this is one of the most
important metrics for mobile devices. This patchset is also one of the
biggest out-of-tree patchsets we have to maintain in the Android kernel,
and it would be immensely valuable for us to have it upstreamed. I would
love to see this discussed at LSF/MM. I will provide as much supporting
data as I can and will also push vendors and OEMs to provide data as
well.

>
> > > II - Fine grained MM locking
> > >
> > > A major limitation of the current mmap lock design is that it covers a
> > > process's entire address space. In threaded applications, it is common
> > > for threads to issue concurrent requests for non-overlapping parts of
> > > the process address space - for example, one thread might be mmaping
> > > new memory while another releases a different range, and a third might
> > > fault within his own address range too.
> > > The current mmap lock design
> > > does not take the non-overlapping ranges into consideration, and
> > > consequently serialises the 3 above requests rather than letting them
> > > proceed in parallel.
> > >
> > > There has been a lot of work spent mitigating the problem by reducing
> > > the mmap lock hold times (for example, dropping the mmap lock during
> > > page faults that hit disk, or lowering to a read lock during longer
> > > mmap/munmap/populate operations). But this approach is hitting its
> > > limits, and I think it would be better to fix the core of the problem
> > > by making the mmap lock capable of allowing concurrent non-overlapping
> > > operations.
> > >
> > > I would like to propose an approach that:
> > > - separates the mmap lock into two separate locks, one that is only
> > >   held for short periods of time to protect mm-wide data structures
> > >   (including the vma tree), and another that functions as a range lock
> > >   and can be held for longer periods of time;
> > > - allows for incremental conversion from the current code to being
> > >   aware about locking ranges;
> > >
> > > I have been maintaining a prototype for this, which has been shared
> > > with a small set of people. The main holdup is with page fault
> > > performance; in order to allow non-overlapping writers to proceed
> > > while some page faults are in progress, the prototype needs to
> > > maintain a shared structure holding addresses for each pending page
> > > fault. Updating this shared structure gets very expensive in high
> > > concurrency page fault benchmarks, though it seems quite unnoticeable
> > > in macro benchmarks I have looked at.
> >
> > Here I have larger disagreements with Michel. I do not believe the
> > range lock is a useful tool for this problem.
>
> Regardless of any proposed solution, do you agree that most of the cases
> where mmap lock blocking happens are between non-overlapping memory
> operations that could conceivably be handled concurrently ?
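
To make the non-overlapping case concrete, a minimal userspace sketch
follows. It is an illustration only, not code from Michel's prototype or
from the kernel: struct toy_range, struct toy_range_lock and the toy_*
helpers are hypothetical names, and the fixed-size table merely stands in
for whatever structure a real range lock would use.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Hypothetical half-open address range [start, end). */
struct toy_range {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical table of currently locked ranges; not a kernel structure. */
struct toy_range_lock {
    struct toy_range held[TOY_MAX_HELD];
    size_t nr_held;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool toy_ranges_overlap(struct toy_range a, struct toy_range b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Try to lock r. Fails if r overlaps a held range or the table is full.
 * Single-threaded illustration: a real lock would protect the table
 * itself and let the caller wait instead of failing.
 */
static bool toy_range_trylock(struct toy_range_lock *l, struct toy_range r)
{
    for (size_t i = 0; i < l->nr_held; i++)
        if (toy_ranges_overlap(l->held[i], r))
            return false;
    if (l->nr_held == TOY_MAX_HELD)
        return false;
    l->held[l->nr_held++] = r;
    return true;
}

/* Release a previously locked range, matched by its exact bounds. */
static void toy_range_unlock(struct toy_range_lock *l, struct toy_range r)
{
    for (size_t i = 0; i < l->nr_held; i++) {
        if (l->held[i].start == r.start && l->held[i].end == r.end) {
            /* Fill the hole with the last entry and shrink the table. */
            l->held[i] = l->held[--l->nr_held];
            return;
        }
    }
}
```

With this, an mmap of [0x1000, 0x3000) and a munmap of [0x8000, 0x9000)
can both hold their ranges at once, while a fault at 0x2000 has to wait
for the mmap to release its range.
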
>
> In other words - do you believe that range locks would be too slow to
> be a useful solution, or is it that you do not think they would
> actually solve the issue ?
>
> > The two topics above seem large enough, but there are other important
> > users of the mmap_sem that also hit contention. /proc/$pid/maps, smaps
> > and similar files hit priority inversion problems which have been reduced,
> > but not solved.
>
> Yes - I do think this would be worth discussing too. Not sure if that is
> a separate topic, or if this should be brought under a larger theme.
>
> Generally - I think there are many issues people have with mmap
> locking, and it's been really hard to make progress addressing these -
> even when prototype solutions exist - due to a lack of consensus (many
> of the people involved have different ideas as to which of the issues
> are important to them). But I think that's what makes this an important
> topic to be discussed ?
>
> --
> Michel "walken" Lespinasse
>
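
To illustrate the sequence count check from topic I, a minimal userspace
sketch using C11 atomics follows. It only shows the sample-then-revalidate
pattern. It is not the code from Michel's or Laurent's patchsets, and
struct toy_mm and the toy_* helpers are hypothetical names. A real
implementation also has to do the final check under the page table lock,
which this sketch omits.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address space; not the kernel's mm_struct. */
struct toy_mm {
    atomic_uint seq;    /* odd while a writer is changing the mappings */
};

/* Writers (mmap, munmap, ...) bump the counter on entry and on exit. */
static void toy_write_begin(struct toy_mm *mm)
{
    atomic_fetch_add(&mm->seq, 1);
}

static void toy_write_end(struct toy_mm *mm)
{
    atomic_fetch_add(&mm->seq, 1);
}

/*
 * Start of a speculative fault: sample the counter.
 * Returns false if a writer is active, in which case the caller
 * would fall back to the regular path that takes the mmap lock.
 */
static bool toy_spf_begin(struct toy_mm *mm, unsigned int *snap)
{
    *snap = atomic_load(&mm->seq);
    return (*snap & 1) == 0;
}

/*
 * Last-moment check before the new page is inserted: the fault may
 * complete only if no writer ran since the counter was sampled.
 */
static bool toy_spf_validate(struct toy_mm *mm, unsigned int snap)
{
    return atomic_load(&mm->seq) == snap;
}
```

Note that any writer bumps the counter, even one touching an unrelated
range, so a fault can be aborted without a real conflict. That is the
limitation Michel mentions when he talks about unnecessary fast path
aborts.
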