From: Yosry Ahmed
Date: Thu, 9 Jan 2025 13:32:55 -0800
Subject: Re: [PATCH v3 00/12] AMD broadcast TLB invalidation
To: Andrew Cooper
Cc: akpm@linux-foundation.org, bp@alien8.de, dave.hansen@linux.intel.com,
 hpa@zytor.com, jackmanb@google.com, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, luto@kernel.org,
 mingo@redhat.com, nadav.amit@gmail.com, peterz@infradead.org,
 reijiw@google.com, riel@surriel.com, tglx@linutronix.de, x86@kernel.org,
 zhengqi.arch@bytedance.com

On Wed, Jan 8, 2025 at 6:47 PM Andrew Cooper wrote:
>
> >> I suspect AMD wouldn't tell us exactly ;)
> >
> > Well, ideally they would just tell us the conditions under which CPUs
> > respond to the broadcast TLB flush or the expectations around latency.
>
> [Resend, complete this time]
>
> Disclaimer. I'm not at AMD; I don't know how they implement it; I'm
> just a random person on the internet.
> But, here are a few things that
> might be relevant to know.
>
> AMD's SEV-SNP whitepaper [1] states that RMP permissions "are cached in
> the CPU TLB and related structures" and also "When required, hardware
> automatically performs TLB invalidations to ensure that all processors
> in the system see the updated RMP entry information."
>
> That sentence doesn't use "broadcast" or "remote", but "all processors"
> is a pretty clear clue. Broadcast TLB invalidations are a building
> block of all the RMP-manipulation instructions.
>
> Furthermore, to be useful in this context, they need to be ordered with
> memory. Specifically, a new pagewalk mustn't start after an
> invalidation, yet observe the stale RMP entry.
>
>
> x86 CPUs do have reasonable forward-progress guarantees, but in order to
> achieve forward progress, they need to e.g. guarantee that one memory
> access doesn't displace the TLB entry backing a different memory access
> from the same instruction, or you could livelock while trying to
> complete a single instruction.
>
> A consequence is that you can't safely invalidate a TLB entry of an
> in-progress instruction (although this means only the oldest instruction
> in the pipeline, because everything else is speculative and potentially
> transient).
>
>
> INVLPGB invalidations are interrupt-like from the point of view of the
> remote core, but are microarchitectural and can be taken irrespective of
> the architectural Interrupt and Global Interrupt Flags. As a
> consequence, they'll need wait until an instruction boundary to be
> processed. While not AMD, the Intel RAR whitepaper [2] discusses the
> handling of RARs on the remote processor, and they share a number of
> constraints in common with INVLPGB.
>
>
> Overall, I'd expect the INVLPGB instructions to be pretty quick in and
> of themselves; interestingly, they're not identified as architecturally
> serialising.
> The broadcast is probably posted, and will be dealt with
> by remote processors on the subsequent instruction boundary. TLBSYNC is
> the barrier to wait until the invalidations have been processed, and
> this will block for an unspecified length of time, probably bounded by
> the "longest" instruction in progress on a remote CPU. e.g. I expect it
> probably will suck if you have to wait for a WBINVD instruction to
> complete on a remote CPU.
>
> That said, architectural IPIs have the same conditions too, except on
> top of that you've got to run a whole interrupt handler. So, with
> reasonable confidence, however slow TLBSYNC might be in the worst case,
> it's got absolutely nothing on the overhead of doing invalidations the
> old fashioned way.

Generally speaking, I am not arguing that TLB flush IPIs are better than
INVLPGB/TLBSYNC; I think we should expect the latter to perform better
in most cases.

But there is a difference here: the processor executing TLBSYNC cannot
serve interrupts or NMIs while waiting for remote CPUs, because those
have to be served at an instruction boundary, right? Unless TLBSYNC is
an exception to that rule, or its execution is considered complete
before the remote CPUs respond (i.e. the CPU executes it quickly, then
enters a wait doing "nothing").

There are also intriguing corner cases that are not documented. For
example, you mention that it's reasonable to expect that a remote CPU
does not serve TLBSYNC except at an instruction boundary. What if that
CPU is itself executing TLBSYNC? Do we have to wait for its execution to
complete? Is it possible to end up in a deadlock? This goes back to my
previous point about whether TLBSYNC is a special case, or at what point
it is considered to have finished executing.

I am sure people thought about that and I am probably worrying over
nothing, but there are few details here, so one has to speculate. Again,
sorry if I am making a fuss over nothing and it's all in my head.

>
>
> ~Andrew
>
> [1]
> https://www.amd.com/content/dam/amd/en/documents/epyc-business-docs/white-papers/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
> [2]
> https://www.intel.com/content/dam/develop/external/us/en/documents/341431-remote-action-request-white-paper.pdf