Date: Thu, 17 Jun 2021 16:51:49 +1000
From: Nicholas Piggin
Subject: Re: [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit
To: Andy Lutomirski, "Peter Zijlstra (Intel)", Rik van Riel
Cc: Andrew Morton, Dave Hansen, Linux Kernel Mailing List, linux-mm@kvack.org, Mathieu Desnoyers, "Paul E. McKenney", the arch/x86 maintainers
Message-Id: <1623911501.q97zemobmw.astroid@bobo.none>

Excerpts from Andy Lutomirski's message of June 17, 2021 3:32 pm:
> On Wed, Jun 16, 2021, at 7:57 PM, Andy Lutomirski wrote:
>>
>>
>> On Wed, Jun 16, 2021, at 6:37 PM, Nicholas Piggin wrote:
>> > Excerpts from Andy Lutomirski's message of June 17, 2021 4:41 am:
>> > > On 6/16/21 12:35 AM, Peter Zijlstra wrote:
>> > >> On Wed, Jun 16, 2021 at 02:19:49PM +1000, Nicholas Piggin wrote:
>> > >>> Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:
>> > >>>> membarrier() needs a barrier after any CPU changes mm. There is currently
>> > >>>> a comment explaining why this barrier probably exists in all cases. This
>> > >>>> is very fragile -- any change to the relevant parts of the scheduler
>> > >>>> might get rid of these barriers, and it's not really clear to me that
>> > >>>> the barrier actually exists in all necessary cases.
>> > >>>
>> > >>> The comments and barriers in the mmdrop() hunks? I don't see what is
>> > >>> fragile or maybe-buggy about this. The barrier definitely exists.
>> > >>>
>> > >>> And any change can change anything, that doesn't make it fragile. My
>> > >>> lazy tlb refcounting change avoids the mmdrop in some cases, but it
>> > >>> replaces it with smp_mb for example.
>> > >>
>> > >> I'm with Nick again, on this. You're adding extra barriers for no
>> > >> discernible reason, that's not generally encouraged, seeing how extra
>> > >> barriers are extra slow.
>> > >>
>> > >> Both mmdrop() itself, as well as the callsite have comments saying how
>> > >> membarrier relies on the implied barrier, what's fragile about that?
>> > >>
>> > >
>> > > My real motivation is that mmgrab() and mmdrop() don't actually need to
>> > > be full barriers. The current implementation has them being full
>> > > barriers, and the current implementation is quite slow. So let's try
>> > > that commit message again:
>> > >
>> > > membarrier() needs a barrier after any CPU changes mm. There is currently
>> > > a comment explaining why this barrier probably exists in all cases. The
>> > > logic is based on ensuring that the barrier exists on every control flow
>> > > path through the scheduler. It also relies on mmgrab() and mmdrop() being
>> > > full barriers.
>> > >
>> > > mmgrab() and mmdrop() would be better if they were not full barriers. As a
>> > > trivial optimization, mmgrab() could use a relaxed atomic and mmdrop()
>> > > could use a release on architectures that have these operations.
>> >
>> > I'm not against the idea, I've looked at something similar before (not
>> > for mmdrop but a different primitive). Also my lazy tlb shootdown series
>> > could possibly take advantage of this, I might cherry pick it and test
>> > performance :)
>> >
>> > I don't think it belongs in this series though. Should go together with
>> > something that takes advantage of it.
>>
>> I’m going to see if I can get hazard pointers into shape quickly.
>
> Here it is. Not even boot tested!
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=sched/lazymm&id=ecc3992c36cb88087df9c537e2326efb51c95e31
>
> Nick, I think you can accomplish much the same thing as your patch by:
>
> #define for_each_possible_lazymm_cpu while (false)

I'm not sure what you mean? For powerpc, other CPUs can be using the mm
as lazy at this point. I must be missing something.

>
> although a more clever definition might be even more performant.
>
> I would appreciate everyone's thoughts as to whether this scheme is sane.

powerpc has no use for it: after the series in akpm's tree there's just
a small change required for radix TLB flushing to make the final flush
IPI also purge lazies, and then the shootdown scheme runs with zero
additional IPIs, so there is essentially no benefit to the hazard pointer
games. I have found the additional IPIs aren't bad anyway, so it's not
something we'd bother trying to optimise away on hash, which is slowly
being de-prioritized.

I must say, I still see active_mm featuring prominently in your patch,
which comes as a surprise. I would have thought some preparation and
cleanup work to fix the x86 deficiencies you were talking about should
go in first; I'm eager to see those.
But either way I don't see a fundamental reason this couldn't be done to
support archs for which the standard or shootdown refcounting options
aren't sufficient.

IIRC I didn't see a fundamental hole in it last time you posted the
idea but I admittedly didn't go through it super carefully.

Thanks,
Nick

>
> Paul, I'm adding you for two reasons. First, you seem to enjoy bizarre
> locking schemes. Secondly, because maybe RCU could actually work here.
> The basic idea is that we want to keep an mm_struct from being freed at
> an inopportune time. The problem with naively using RCU is that each CPU
> can use one single mm_struct while in an idle extended quiescent state
> (but not a user extended quiescent state). So rcu_read_lock() is right
> out. If RCU could understand this concept, then maybe it could help us,
> but this seems a bit out of scope for RCU.
>
> --Andy
>
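The relaxed-mmgrab(), release-mmdrop() idea from the quoted commit message
can be illustrated with a small userspace sketch. This is not the kernel's
mmgrab()/mmdrop(): struct toy_mm, toy_mm_alloc(), toy_mmgrab() and
toy_mmdrop() are hypothetical names, and C11 <stdatomic.h> stands in for the
kernel's atomic_t primitives. The acquire fence taken by whoever drops the
last reference is an assumption of this sketch (it is the usual companion to
a release decrement, not something stated in the thread); it makes every
other holder's accesses to the object happen before it is freed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for mm_struct; only the reference count is modelled. */
struct toy_mm {
	atomic_int mm_count;
};

static struct toy_mm *toy_mm_alloc(void)
{
	struct toy_mm *mm = malloc(sizeof(*mm));

	if (mm)
		atomic_init(&mm->mm_count, 1);	/* creator holds one reference */
	return mm;
}

/* Taking a reference needs no ordering: the caller already holds one. */
static void toy_mmgrab(struct toy_mm *mm)
{
	atomic_fetch_add_explicit(&mm->mm_count, 1, memory_order_relaxed);
}

/*
 * Release ordering keeps this thread's earlier accesses to *mm before its
 * decrement. The thread that drops the last reference issues an acquire
 * fence, so all other holders' accesses happen before free().
 * Returns true if this call freed the object.
 */
static bool toy_mmdrop(struct toy_mm *mm)
{
	if (atomic_fetch_sub_explicit(&mm->mm_count, 1,
				      memory_order_release) != 1)
		return false;
	atomic_thread_fence(memory_order_acquire);
	free(mm);
	return true;
}
```

A caller pairs each toy_mmgrab() with one toy_mmdrop(); only the call that
takes the count from 1 to 0 frees the object and returns true. What the
sketch gives up is the full barrier: neither call implies smp_mb() any more,
so code that depends on mmdrop() being a full barrier, as membarrier() does
after a CPU changes mm, would need its own explicit barrier. That dependency
is what the patch under discussion makes explicit.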