From: "Andy Lutomirski"
To: "Linus Torvalds", "Nadav Amit"
Cc: "Andrew Morton", Linux-MM, "Nicholas Piggin", "Anton Blanchard",
    "Benjamin Herrenschmidt", "Paul Mackerras", "Randy Dunlap", linux-arch,
    "the arch/x86 maintainers", "Rik van Riel", "Dave Hansen",
    "Peter Zijlstra (Intel)", "Mathieu Desnoyers"
Subject: Re: [PATCH 16/23] sched: Use lightweight hazard pointers to grab lazy mms
Date: Sun, 09 Jan 2022 11:52:52 -0800
Message-Id: <355c148c-06a8-4e15-a77b-0ea2e22bf708@www.fastmail.com>
References: <7c9c388c388df8e88bb5d14828053ac0cb11cf69.1641659630.git.luto@kernel.org> <739A3109-04DD-4BA5-A02B-52EE30E820AE@gmail.com>

On Sun, Jan 9, 2022, at 11:10 AM, Linus Torvalds wrote:
> On Sun, Jan 9, 2022 at 12:49 AM Nadav Amit wrote:
>>
>> I do not know whether it is a pure win, but there is a tradeoff.
>
> Hmm. I guess only some serious testing would tell.
>
> On x86, I'd be a bit worried about removing lazy TLB simply because
> even with ASID support there (called PCIDs by Intel for NIH reasons),
> the actual ASID space on x86 was at least originally very very
> limited.
>
> Architecturally, x86 may expose 12 bits of ASID space, but iirc at
> least the first few implementations actually only internally had one
> or two bits, and hashed the 12 bits down to that internal very limited
> hardware TLB ID space.
>
> We only use a handful of ASIDs per CPU on x86 partly for this reason
> (but also since there's no remote hardware TLB shootdown, there's no
> reason to have a bigger global ASID space, so ASIDs aren't _that_
> common).
>
> And I don't know how many non-PCID x86 systems (perhaps virtualized?)
> there might be out there.
>
> But it would be very interesting to test some "disable lazy tlb"
> patch.
> The main problem workloads tend to be IO, and I'm not sure how
> many of the automated performance tests would catch issues. I guess
> some threaded pipe ping-pong test (with each thread pinned to
> different cores) would show it.

My original PCID series actually did remove lazy TLB on x86. I don't
remember why, but people objected. The issue isn't the limited PCID
space -- IIRC it's just that MOV CR3 is slooooow. If we get rid of lazy
TLB on x86, then we are writing CR3 twice on even a very short idle.
That adds maybe 1k cycles, which isn't great.

>
> And I guess there is some load that triggered the original powerpc
> patch by Nick&co, and that Andy has been using..

I don't own a big enough machine. The workloads I'm aware of with the
problem have massively multithreaded programs using many CPUs, and
transitions into and out of lazy mode ping-pong the cacheline.

>
> Anybody willing to cook up a patch and run some benchmarks? Perhaps
> one that basically just replaces "set ->mm to NULL" with "set ->mm to
> &init_mm" - so that the lazy TLB code is still *there*, but it never
> triggers.. It would
>
> I think it's mainly 'copy_thread()' in kernel/fork.c and the 'init_mm'
> initializer in mm/init-mm.c, but there's probably other things too
> that have that knowledge of the special "tsk->mm = NULL" situation.

I think, for a little test, we would leave all the mm == NULL code
alone and just change the enter-lazy logic. On top of all the cleanups
in this series, that would be trivial.

>
> Linus
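
For reference, the "threaded pipe ping-pong test (with each thread pinned
to different cores)" mentioned in the quoted text could be sketched
roughly as below. This is a minimal user-space sketch in Python, not a
benchmark from this thread: the names pin_to_cpu and ping_pong and the
default CPU numbers 0 and 1 are made up for illustration, and pinning is
best-effort (on Linux, os.sched_setaffinity(0, ...) is expected to apply
to the calling thread). Each side blocks in read() after every write, so
each CPU should drop into idle briefly on every round trip, which is the
short-idle pattern discussed above. Interpreter overhead will blur the
numbers, so a serious measurement would probably want the same loop in C.

```python
import os
import threading
import time


def pin_to_cpu(cpu):
    """Best-effort pin of the calling thread to one CPU (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        pass  # that CPU is not available to us; run unpinned


def ping_pong(rounds, cpu_a=0, cpu_b=1):
    """Bounce one byte between two threads over a pair of pipes.

    Returns (round trips echoed by the second thread, elapsed seconds).
    cpu_a and cpu_b are hypothetical defaults; pick cores that exist.
    """
    a_to_b_r, a_to_b_w = os.pipe()
    b_to_a_r, b_to_a_w = os.pipe()
    echoed = []

    def echo():
        pin_to_cpu(cpu_b)
        count = 0
        for _ in range(rounds):
            byte = os.read(a_to_b_r, 1)  # blocks until the ping arrives
            os.write(b_to_a_w, byte)     # send the pong back
            count += 1
        echoed.append(count)

    # Remember the caller's affinity so it can be restored afterwards.
    can_restore = hasattr(os, "sched_getaffinity")
    saved = os.sched_getaffinity(0) if can_restore else None

    worker = threading.Thread(target=echo)
    worker.start()
    pin_to_cpu(cpu_a)
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.write(a_to_b_w, b"x")
            os.read(b_to_a_r, 1)  # blocks until the pong arrives
        elapsed = time.perf_counter() - start
        worker.join()
    finally:
        if saved is not None:
            os.sched_setaffinity(0, saved)
        for fd in (a_to_b_r, a_to_b_w, b_to_a_r, b_to_a_w):
            os.close(fd)
    return echoed[0], elapsed
```

Calling ping_pong(100000) and dividing the elapsed time by the round
count gives a rough per-round-trip latency. Comparing that figure between
a stock kernel and one with the enter-lazy logic changed would be the
point of the exercise.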