Message-ID: <1409344951af9427799bd28d7865c9ea7fa87ed3.camel@surriel.com>
Subject: Re: [PATCH 06/12] x86/mm: use INVLPGB for kernel TLB flushes
From: Rik van Riel
To: Dave Hansen, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, kernel-team@meta.com,
	dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
	akpm@linux-foundation.org, nadav.amit@gmail.com,
	zhengqi.arch@bytedance.com, linux-mm@kvack.org
Date: Fri, 10 Jan 2025 00:31:23 -0500
In-Reply-To: <426011a9-1fbc-415c-bac7-df5d67417df3@intel.com>
References: <20241230175550.4046587-1-riel@surriel.com>
	<20241230175550.4046587-7-riel@surriel.com>
	<855298e6e981378c3afeab93b8c3cb821a7a5b88.camel@surriel.com>
	<426011a9-1fbc-415c-bac7-df5d67417df3@intel.com>

On Thu, 2025-01-09 at 13:18 -0800, Dave Hansen wrote:
> 
> But actually I think INVLPGB is *WAY* better than INVLPG here.
> INVLPG doesn't have ranged invalidation. It will only architecturally
> invalidate multiple 4K entries when the hardware fractured them in
> the first place. I think we should probably take advantage of what
> INVLPGB can do instead of following the INVLPG approach.
> 
> INVLPGB will invalidate a range no matter where the underlying
> entries came from. Its "increment the virtual address at the 2M
> boundary" mode will invalidate entries of any size. That's my
> reading of the docs at least. Is that everyone else's reading too?

Ohhhh, good point! I glossed over that the first half dozen times
I was reading the document, because I was trying to use the ASID,
and working to figure out why things kept crashing (turns out I
can only use the PCID on bare metal).

> 
> So, let's pick a number "Z" which is >= invlpgb_count_max. Z could
> arguably be set to tlb_single_page_flush_ceiling. Then do this:
> 
>     4k -> Z*4k  => use 4k step
>  >Z*4k -> Z*2M  => use 2M step
>  >Z*2M          => invalidate everything
> 
> Invalidations <=Z*4k are exact. They never zap extra TLB entries.
> 
> Invalidations that use the 2M step *might* unnecessarily zap some
> extra 4k mappings in the last 2M, but this is *WAY* better than
> invalidating everything.
> 

This is a great idea.

Then the code in get_flush_tlb_info can adjust start, end, and
stride_shift as needed.

INVLPGB also supports invalidation of an entire 1GB region, so we
can take your idea one step further :)

With up to 8 pages zapped by a single INVLPGB instruction, and
multiple in flight simultaneously, maybe we could set the threshold
to 64, for 8 INVLPGBs in flight at once?

That way we can invalidate up to 1/8th of a 512 entry range with
individual zaps, before just zapping the higher level entry.

> "Invalidate everything" obviously stinks, but it should only be for
> pretty darn big invalidations.

That would only come into play when we get past several GB
worth of invalidation.

-- 
All Rights Reversed.
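
The tiering discussed in this message (4k step, then 2M step, then the
proposed 1GB step, then a full flush) can be sketched as a small
standalone C function. This is only an illustration under assumptions,
not kernel code: plan_flush(), struct flush_plan, Z_THRESHOLD (set to
the 64 floated in this thread) and FLUSH_ALL are hypothetical names, the
1GB tier takes the suggestion in this message at face value, and rounding
the range out to whole steps is a choice made by the sketch. The real
adjustment would presumably live in get_flush_tlb_info.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT_4K   12
#define SHIFT_2M   21
#define SHIFT_1G   30
#define FLUSH_ALL  0u	/* hypothetical sentinel: flush everything */

/* Hypothetical "Z"; 64 is the value floated in the thread, not a tuned number. */
#define Z_THRESHOLD 64ULL

/* Hypothetical result type: the adjusted range plus the stride to step by. */
struct flush_plan {
	uint64_t start;
	uint64_t end;
	unsigned int stride_shift;
};

/* Pick the smallest stride that covers [start, end) in at most Z steps. */
static struct flush_plan plan_flush(uint64_t start, uint64_t end)
{
	static const unsigned int shifts[] = { SHIFT_4K, SHIFT_2M, SHIFT_1G };
	struct flush_plan plan = { start, end, FLUSH_ALL };
	uint64_t len;
	unsigned int i;

	assert(end > start);
	len = end - start;

	for (i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
		uint64_t step = 1ULL << shifts[i];

		if (len > Z_THRESHOLD * step)
			continue;	/* too big for this tier, try a larger step */

		/*
		 * Assumption of this sketch: round the range out to whole
		 * steps. The 4k tier stays exact for page-aligned input;
		 * larger tiers may cover extra mappings at the edges.
		 * Overflow near the top of the address space is ignored.
		 */
		plan.start = start & ~(step - 1);
		plan.end = (end + step - 1) & ~(step - 1);
		plan.stride_shift = shifts[i];
		return plan;
	}

	return plan;	/* larger than Z * 1GB: invalidate everything */
}
```

With Z_THRESHOLD at 64, a 64-page range keeps the exact 4k stride, while
a 65-page range starting at address 0 moves to the 2M tier and is rounded
out to a single 2M step.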