From: "Andy Lutomirski" <luto@kernel.org>
To: "Nicholas Piggin", "Linus Torvalds"
Cc: "Andrew Morton", "Anton Blanchard", "Benjamin Herrenschmidt",
 "Catalin Marinas", "Dave Hansen", linux-arch, Linux-MM,
 "Mathieu Desnoyers", "Nadav Amit", "Paul Mackerras",
 "Peter Zijlstra (Intel)", "Randy Dunlap", "Rik van Riel",
 "Will Deacon", "the arch/x86 maintainers"
Subject: Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
Date: Mon, 10 Jan 2022 12:52:49 -0800

On Sun, Jan 9, 2022, at 8:56 PM, Nicholas Piggin wrote:
> Excerpts from Linus Torvalds's message of January 10, 2022 7:51 am:
>> [ Ugh, I actually went back and looked at Nick's patches again, to
>> just verify my memory, and they weren't as pretty as I thought they
>> were ]
>>
>> On Sun, Jan 9, 2022 at 12:48 PM Linus Torvalds wrote:
>>>
>>> I'd much rather have a *much* smaller patch that says "on x86 and
>>> powerpc, we don't need this overhead at all".
>>
>> For some reason I thought Nick's patch worked at "last mmput" time and
>> the TLB flush IPIs that happen at that point anyway would then make
>> sure any lazy TLB is cleaned up.
>>
>> But that's not actually what it does. It ties the
>> MMU_LAZY_TLB_REFCOUNT to an explicit TLB shootdown triggered by the
>> last mmdrop() instead, because it really tied the whole logic to the
>> mm_count logic (and made lazy tlb not do mm_count) rather than the
>> mm_users thing I mis-remembered it doing.
>
> It does this because on powerpc with the hash MMU, we can't use IPIs
> for TLB shootdowns.

I know nothing about powerpc's MMU. If you can't do IPI shootdowns, it
sounds like the hazard pointer scheme might actually be pretty good.
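For anyone skimming the thread, the rough shape of the hazard-pointer
scheme is something like the sketch below. This is a heavily simplified
userspace model, not the code in the series: "struct mm" is a stand-in
for struct mm_struct, the C11 atomics stand in for kernel primitives,
and kick_cpu_off_lazy_mm() is a made-up name for the shootdown step.

#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 64

struct mm;                         /* stand-in for struct mm_struct */

/* Each CPU publishes the mm it is currently using lazily, if any. */
static _Atomic(struct mm *) lazy_mm_hazard[NR_CPUS];

/* Context switch: 'cpu' starts running kernel code on 'mm' lazily. */
static void start_lazy(int cpu, struct mm *mm)
{
	/*
	 * The publication must be ordered before any use of the mm so
	 * that the retirer's scan cannot miss it; the real code has to
	 * be much more careful about this ordering than a bare store.
	 */
	atomic_store_explicit(&lazy_mm_hazard[cpu], mm,
			      memory_order_seq_cst);
}

/* Context switch: 'cpu' moves on to a real (or another lazy) mm. */
static void stop_lazy(int cpu)
{
	atomic_store_explicit(&lazy_mm_hazard[cpu], NULL,
			      memory_order_release);
}

/*
 * Teardown: instead of keeping the mm alive via a reference count
 * taken at every context switch, scan the hazard slots and force any
 * CPU still lazily using 'mm' off of it (an IPI in the kernel). Once
 * the scan and the kicks complete, nothing can be lazily using 'mm'
 * and it can be freed.
 */
static void retire_mm(struct mm *mm)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (atomic_load_explicit(&lazy_mm_hazard[cpu],
					 memory_order_acquire) == mm) {
			/* kick_cpu_off_lazy_mm(cpu, mm); -- hypothetical */
		}
	}
}

The point of the exercise is that the cost moves to the rare teardown
path: context switch does a plain store instead of an atomic RMW on a
shared mm_count cacheline.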
>
>> So at least some of my arguments were based on me just mis-remembering
>> what Nick's patch actually did (mainly because I mentally recreated
>> the patch from "Nick did something like this" and what I thought would
>> be the way to do it on x86).
>
> With the radix MMU on powerpc, which uses IPI-based shootdowns, we can
> actually do the switch-away-from-lazy on the final TLB flush, and the
> final broadcast shootdown thing becomes a no-op. I didn't post that
> additional patch because it's powerpc-specific and I didn't want to
> post more code so widely.
>
>> So I guess I have to recant my arguments.
>>
>> I still think my "get rid of lazy at last mmput" model should work,
>> and would be a perfect match for x86, but I can't really point to Nick
>> having done that.
>>
>> So I was full of BS.
>>
>> Hmm.
>>
>> I'd love to try to actually create a patch that does that "Nick
>> thing", but on last mmput() (i.e., when __mmput triggers), because I
>> think this is interesting. But then I look at my schedule for the
>> upcoming week, and I go "I don't have a leg to stand on in this
>> discussion, and I'm just all hot air".
>
> I agree Andy's approach is very complicated and adds more overhead than
> necessary for powerpc, which is why I don't want to use it. I'm still
> not entirely sure what the big problem would be to convert x86 to use
> it; I admit I haven't kept up with the exact details of its lazy tlb
> mm handling recently, though.

The big problem is the entire remainder of this series! If x86 is going
to do shootdowns without mm_count, I want the result to work and be
maintainable. A few of the issues that needed solving:

- x86 tracks usage of the lazy mm on CPUs that have it loaded into the
  MMU, not on CPUs that have it in active_mm. Getting these in sync
  needed core changes.

- mmgrab() and mmdrop() are barriers, and core code relies on that. If
  we get rid of a bunch of those calls (conditionally), we need to stop
  depending on the barriers. I fixed this. (There is a sketch of this
  ordering issue at the end of this mail.)

- There were too many mmgrab() and mmdrop() calls, and the call sites
  had different semantics and different refcounting rules (thanks,
  kthread). I cleaned this up.

- If we do a shootdown instead of a refcount, then, when exit() tears
  down its mm, we are lazily using *that* mm when we do the shootdowns.
  If active_mm continues to point to the being-freed mm and an NMI tries
  to dereference it, we're toast. I fixed those issues.

- If we do a UEFI runtime service call or a text_poke while lazy and
  the mm goes away in the middle, we would blow up. Refcounting prevents
  this, but, in current kernels, a shootdown IPI on x86 would not. I
  fixed these issues (and removed duplicate code).

My point here is that the current lazy mm code is a huge mess. 90% of
the complexity in this series is cleaning up core messiness and x86
messiness. I would still like to get rid of ->active_mm entirely (it
appears to serve no good purpose on any architecture), but that can be
saved for later, I think.
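Appendix, since the barrier point above is easy to miss: a minimal C11
userspace model of why conditionally eliding the refcount needs an
explicit fence. mm_count_model, mmdrop_model, finish_switch_model, and
refcount_elided are illustrative names, not kernel interfaces.

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int mm_count_model;

/*
 * mmdrop() ends in an atomic dec-and-test, a full-barrier RMW; core
 * code (e.g. the membarrier ordering around context switch) quietly
 * relies on that ordering, not just on the refcount itself.
 */
static void mmdrop_model(void)
{
	if (atomic_fetch_sub(&mm_count_model, 1) == 1) {
		/* last reference: free the mm */
	}
}

static void finish_switch_model(bool refcount_elided)
{
	if (refcount_elided) {
		/*
		 * If the lazy-TLB refcount operation is skipped, the
		 * implicit full barrier disappears with it and has to
		 * be put back explicitly, or the callers that silently
		 * depended on it break in hard-to-debug ways.
		 */
		atomic_thread_fence(memory_order_seq_cst);
	} else {
		mmdrop_model();
	}
}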