Message-Id: <3586aa63-2dd2-4569-b9b9-f51080962ff2@www.fastmail.com>
References: <7c9c388c388df8e88bb5d14828053ac0cb11cf69.1641659630.git.luto@kernel.org>
Date: Sat, 08 Jan 2022 15:04:14 -0700
From: "Andy Lutomirski"
To: "Linus Torvalds"
Cc: "Andrew Morton", Linux-MM, "Nicholas Piggin", "Anton Blanchard",
 "Benjamin Herrenschmidt", "Paul Mackerras", "Randy Dunlap", linux-arch,
 "the arch/x86 maintainers", "Rik van Riel", "Dave Hansen",
 "Peter Zijlstra (Intel)", "Nadav Amit", "Mathieu Desnoyers"
Subject: Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms

On Sat, Jan 8, 2022, at 12:22 PM, Linus Torvalds wrote:
> On Sat, Jan 8, 2022 at 8:44 AM Andy Lutomirski wrote:
>>
>> To improve scalability, this patch adds a percpu hazard pointer scheme to
>> keep lazily-used mms alive. Each CPU has a single pointer to an mm that
>> must not be freed, and __mmput() checks the pointers belonging to all CPUs
>> that might be lazily using the mm in question.
>
> Ugh. This feels horribly fragile to me, and also looks like it makes
> some common cases potentially quite expensive for machines with large
> CPU counts if they don't do that mm_cpumask optimization - which in
> turn feels quite fragile as well.
>
> IOW, this just feels *complicated*.
>
> And I think it's overly so. I get the strong feeling that we could
> make the rules much simpler and more straightforward.
>
> For example, how about we make the rules be

There there, Linus, not everything is as simple^Wincapable as x86 bare
metal, and mm_cpumask does not have useful cross-arch semantics. Is that
good?

>
> - a lazy TLB mm reference requires that there's an actual active user
> of that mm (ie "mm_users > 0")
>
> - the last mm_users decrement (ie __mmput) forces a TLB flush, and
> that TLB flush must make sure that no lazy users exist (which I think
> it does already anyway).

It does, on x86 bare metal, in exit_mmap(). It’s implicit, but it could
be made explicit, as below.

>
> Doesn't that seem like a really simple set of rules?
>
> And the nice thing about it is that we *already* do that required TLB
> flush in all normal circumstances. __mmput() already calls
> exit_mmap(), and exit_mm() already forces that TLB flush in every
> normal situation.

Exactly. On x86 bare metal and similar architectures, this flush is done
by IPI, which involves a loop over all CPUs that might be using the mm.
And other patches in this series add the core ability to shoot down the
lazy TLB cleanly, so the core drops its reference, and wire it up for
x86.

>
> So we might have to make sure that every architecture really does that
> "drop lazy mms on TLB flush", and maybe add a flag to the existing
> 'struct mmu_gather tlb' to make sure that flush actually always
> happens (even if the process somehow managed to unmap all vma's even
> before exiting).

So this requires that all architectures actually walk all relevant CPUs
to see if an IPI is needed and send that IPI. On architectures that
actually need an IPI anyway (x86 bare metal, powerpc (I think) and
others), fine. But on architectures with a broadcast-to-all-CPUs flush
(ARM64 IIUC), the extra IPI will be much, much slower than a simple
load-acquire in a loop.

In fact, arm64 doesn’t even track mm_cpumask at all last time I checked,
so even an IPI lazy shootdown would require looping over *all* CPUs,
doing a load-acquire, and possibly doing an IPI. I much prefer doing a
load-acquire and possibly a cmpxchg.

(And x86 PV can do hypercall flushes.
If a bunch of vCPUs are not running, an IPI shootdown will end up
sleeping until they run, whereas this patch will allow the hypervisor to
leave them asleep and thus to finish __mmput without waking them. This
only matters on a CPU-oversubscribed host, but still. And it kind of
looks like hardware remote flushes are coming in AMD land eventually.)

But yes, I fully agree that this patch is complicated and subtle.

>
> Is there something silly I'm missing? Somebody pat me on the head, and
> say "There, there, Linus, don't try to get involved with things you
> don't understand.." and explain to me in small words.
>
> Linus
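
For readers following the thread, the "load-acquire in a loop and possibly
a cmpxchg" idea discussed above can be sketched in userspace C11 roughly as
follows. This is a simplified illustration and not the code from the patch:
NR_CPUS, struct mm, lazy_mm_hazard and the three helper functions are
hypothetical names, the per-CPU slots are a plain array, and the sketch
leaves out the revalidation a CPU would need after publishing its pointer
while teardown may be running concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for the sketch */

/* Hypothetical stand-in for the kernel's mm_struct. */
struct mm {
	atomic_int mm_count; /* real references that keep the mm allocated */
};

/* One hazard slot per CPU: "this CPU may be lazily using this mm". */
static _Atomic(struct mm *) lazy_mm_hazard[NR_CPUS];

/* CPU side: start using mm lazily without touching its refcount. */
static void lazy_mm_publish(int cpu, struct mm *mm)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	atomic_store_explicit(&lazy_mm_hazard[cpu], mm, memory_order_release);
}

/*
 * CPU side: stop using mm lazily. Returns true if the slot still held mm
 * (nothing to drop). Returns false if teardown already cleared the slot,
 * which means teardown took a real reference on this CPU's behalf and the
 * CPU now has to drop it.
 */
static bool lazy_mm_unpublish(int cpu, struct mm *mm)
{
	struct mm *expected = mm;

	assert(cpu >= 0 && cpu < NR_CPUS);
	return atomic_compare_exchange_strong_explicit(
		&lazy_mm_hazard[cpu], &expected, NULL,
		memory_order_acq_rel, memory_order_acquire);
}

/*
 * Teardown side: one load-acquire per CPU, and a cmpxchg only for CPUs
 * that are lazily using mm. Each such hazard pointer is converted into a
 * real reference. Returns the number of converted slots. No IPI is sent.
 */
static int mm_fixup_lazy_users(struct mm *mm)
{
	int converted = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		struct mm *expected = mm;

		if (atomic_load_explicit(&lazy_mm_hazard[cpu],
					 memory_order_acquire) != mm)
			continue;

		/* Take the reference first so the mm cannot vanish. */
		atomic_fetch_add(&mm->mm_count, 1);
		if (atomic_compare_exchange_strong(&lazy_mm_hazard[cpu],
						   &expected, NULL))
			converted++;
		else
			/* The CPU moved on between the load and the cmpxchg. */
			atomic_fetch_sub(&mm->mm_count, 1);
	}
	return converted;
}
```

A caller would publish the mm when a CPU goes lazy, call
mm_fixup_lazy_users() from the last-user teardown path, and check the
return value of lazy_mm_unpublish() to decide whether that CPU owes a
reference drop. The point of the design is that a CPU not using the mm
costs the teardown path one load and no interrupt.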