Date: Thu, 17 Jun 2021 08:02:14 -0700
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Andy Lutomirski
Cc: Nicholas Piggin, "Peter Zijlstra (Intel)", Rik van Riel, Andrew Morton,
	Dave Hansen, Linux Kernel Mailing List, linux-mm@kvack.org,
	Mathieu Desnoyers, the arch/x86 maintainers
Subject: Re: [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit
Message-ID: <20210617150214.GX4397@paulmck-ThinkPad-P17-Gen-1>
Reply-To: paulmck@kernel.org
References: <1623816595.myt8wbkcar.astroid@bobo.none> <617cb897-58b1-8266-ecec-ef210832e927@kernel.org> <1623893358.bbty474jyy.astroid@bobo.none> <58b949fb-663e-4675-8592-25933a3e361c@www.fastmail.com>
On Wed, Jun 16, 2021 at 10:32:15PM -0700, Andy Lutomirski wrote:
> On Wed, Jun 16, 2021, at 7:57 PM, Andy Lutomirski wrote:
> > On Wed, Jun 16, 2021, at 6:37 PM, Nicholas Piggin wrote:
> > > Excerpts from Andy Lutomirski's message of June 17, 2021 4:41 am:
> > > > On 6/16/21 12:35 AM, Peter Zijlstra wrote:
> > > >> On Wed, Jun 16, 2021 at 02:19:49PM +1000, Nicholas Piggin wrote:
> > > >>> Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:
> > > >>>> membarrier() needs a barrier after any CPU changes mm.  There is currently
> > > >>>> a comment explaining why this barrier probably exists in all cases.  This
> > > >>>> is very fragile -- any change to the relevant parts of the scheduler
> > > >>>> might get rid of these barriers, and it's not really clear to me that
> > > >>>> the barrier actually exists in all necessary cases.
> > > >>>
> > > >>> The comments and barriers in the mmdrop() hunks?  I don't see what is
> > > >>> fragile or maybe-buggy about this.  The barrier definitely exists.
> > > >>>
> > > >>> And any change can change anything, that doesn't make it fragile.  My
> > > >>> lazy tlb refcounting change avoids the mmdrop in some cases, but it
> > > >>> replaces it with smp_mb for example.
> > > >>
> > > >> I'm with Nick again, on this.  You're adding extra barriers for no
> > > >> discernible reason, that's not generally encouraged, seeing how extra
> > > >> barriers is extra slow.
> > > >>
> > > >> Both mmdrop() itself, as well as the callsite have comments saying how
> > > >> membarrier relies on the implied barrier, what's fragile about that?
> > > >
> > > > My real motivation is that mmgrab() and mmdrop() don't actually need to
> > > > be full barriers.
> > > > The current implementation has them being full
> > > > barriers, and the current implementation is quite slow.  So let's try
> > > > that commit message again:
> > > >
> > > > membarrier() needs a barrier after any CPU changes mm.  There is currently
> > > > a comment explaining why this barrier probably exists in all cases.  The
> > > > logic is based on ensuring that the barrier exists on every control flow
> > > > path through the scheduler.  It also relies on mmgrab() and mmdrop() being
> > > > full barriers.
> > > >
> > > > mmgrab() and mmdrop() would be better if they were not full barriers.  As a
> > > > trivial optimization, mmgrab() could use a relaxed atomic and mmdrop()
> > > > could use a release on architectures that have these operations.
> > >
> > > I'm not against the idea, I've looked at something similar before (not
> > > for mmdrop but a different primitive). Also my lazy tlb shootdown series
> > > could possibly take advantage of this, I might cherry pick it and test
> > > performance :)
> > >
> > > I don't think it belongs in this series though. Should go together with
> > > something that takes advantage of it.
> >
> > I'm going to see if I can get hazard pointers into shape quickly.

One textbook C implementation is in perfbook CodeSamples/defer/hazptr.[hc]

git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

A production-tested C++ implementation is in the folly library:

https://github.com/facebook/folly/blob/master/folly/synchronization/Hazptr.h

However, the hazard-pointers get-a-reference operation requires a full
barrier.  There are ways to optimize this away in some special cases,
one of which is used in the folly-library hash-map code.

> Here it is.  Not even boot tested!
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=sched/lazymm&id=ecc3992c36cb88087df9c537e2326efb51c95e31
>
> Nick, I think you can accomplish much the same thing as your patch by:
>
> #define for_each_possible_lazymm_cpu while (false)
>
> although a more clever definition might be even more performant.
>
> I would appreciate everyone's thoughts as to whether this scheme is sane.
>
> Paul, I'm adding you for two reasons.  First, you seem to enjoy bizarre
> locking schemes.  Secondly, because maybe RCU could actually work here.
> The basic idea is that we want to keep an mm_struct from being freed at
> an inopportune time.  The problem with naively using RCU is that each
> CPU can use one single mm_struct while in an idle extended quiescent
> state (but not a user extended quiescent state).  So rcu_read_lock()
> is right out.  If RCU could understand this concept, then maybe it
> could help us, but this seems a bit out of scope for RCU.

OK, I should look at your patch, but that will be after morning meetings.

On RCU and idle, much of the idle code now allows rcu_read_lock() to be
used directly, thanks to Peter's recent work.  Any sort of interrupt or
NMI from idle can also use rcu_read_lock(), including the IPIs that are
now done directly from idle.  RCU_NONIDLE() makes RCU pay attention to
the code supplied as its sole argument.

Or is your patch really having the CPU expect a mm_struct to stick around
across the full idle sojourn, and without the assistance of mmgrab()
and mmdrop()?

Anyway, off to meetings...  Hope this helps in the meantime.

							Thanx, Paul