Date: Fri, 14 Mar 2025 14:14:19 +0100
From: Borislav Petkov
To: Brendan Jackman
Cc: akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	yosryahmed@google.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterz@infradead.org, seanjc@google.com, tglx@linutronix.de,
	x86@kernel.org
Subject: Re: [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API
Message-ID: <20250314131419.GJZ9Qrq8scAtDyBUcg@fat_crate.local>
References: <20250227120607.GPZ8BVL2762we1j3uE@fat_crate.local>
	<20250228084355.2061899-1-jackmanb@google.com>
In-Reply-To: <20250228084355.2061899-1-jackmanb@google.com>

On Fri, Feb 28, 2025 at 08:43:55AM +0000, Brendan Jackman wrote:
> Yeah I see what you mean. I think the issues are:
>
> 1. We're mixing up two different aspects in the API:
>
>    a. Starting and finishing "critical sections" (i.e. the region
>       between asi_enter() and asi_relax())
>
>    b. Actually triggering address space transitions.
>
> 2. There is a fundamental asymmetry at play here: asi_enter() and
>    asi_exit() can both be NOPs (when we're already in the relevant
>    address space), and asi_enter() being a NOP is really the _whole
>    point of ASI_.

I'm guessing you mean this thing in __asi_enter():

+	if (!target || target == this_cpu_read(curr_asi))
+		return;

The assumption being that curr_asi will be the target most of the time
after having done the expensive switch once...

>    The ideal world is where asi_exit() is very very rare, so
>    asi_enter() is almost always a NOP.

... asi_exit() being the actual switch to the unrestricted CR3.

And asi_relax() being the switch of current task's asi target ptr to
NULL. Comment says

	"Domain to enter when returning to process context."

but I'm none-the-wiser.

So, why are we doing that relaxing thing?

I'm guessing the relaxing is marking the end of the region where we're
running untrusted code. After asi_relax() we are still in the restricted
CR3 but we're not running untrusted code.

> So we could disentangle part 1 by just rejigging things as you suggest,
> and I think the naming would be like:
>
> asi_enter
>   asi_start_critical
>   asi_end_critical
> asi_exit

Yap, that's what I was gonna suggest: asi_enter and asi_exit do the
actual CR3 build and switch and start_critical and end_critical do the
cheaper tracking thing.

> But the issue with that is that asi_start_critical() _must_ imply
> asi_enter()

What does that mean exactly?

asi_start_critical() must never be called before asi_enter()?

If so, I'm sure there are ways to track and check that and warn if not,
right?

> (otherwise if we get an NMI between asi_enter() and
> asi_start_critical(), and that causes a #PF, we will start the
> critical section in the wrong address space and ASI won't do its job).
> So, we are somewhat forced to mix up a. and b. from above.

I don't understand: asi_enter() can be interrupted by an NMI at any
random point.
How is the current, imbalanced interface not vulnerable to this
scenario?

> BTW, there is another thing complicating this picture a little: ASI
> "clients" (really just meaning KVM code at this point) are not
> really supposed to care at all about the actual address space, the fact
> that they currently have to call asi_exit() in part 4b is just a
> temporary thing to simplify the initial implementation. It has a
> performance cost (not enormous, serious KVM platforms try pretty hard

You mean the switch to the unrestricted_cr3? I can imagine...

> to avoid returning to user space, but it does still matter) so
> Google's internal version has already got rid of it and that's where I
> expect this thing to evolve too. But for now it just lets us keep
> things simple since e.g. we never have to think about context
> switching in the restricted address space.
>
> With that in mind, what if it looked like this:
>
> ioctl(KVM_RUN) {
>     enter_from_user_mode()
>     while !need_userspace_handling() {
>         // This implies asi_enter(), but this code "doesn't care"
>         // about that.
>         asi_start_critical();
>         vmenter();
>         asi_end_critical();
>     }
>     // TODO: This is temporary, it should not be needed.
>     asi_exit();
>     exit_to_user_mode()
> }
>
> Once the asi_exit() call disappears, it will be symmetrical from the
> "client API"'s point of view. And while we still mix up address space
> switching with critical section boundaries, the address space
> switching is "just an implementation detail" and not really visible as
> part of the API.

So I'm still unclear on that whole design here so I'm asking silly
questions but I know that:

1. you can do empty calls to keep the interface balanced and easy to use

2. once you can remove asi_exit(), you should be able to replace all
   in-tree users in one atomic change so that they're all switched to
   the new, simplified interface

But I still have the feeling that we could re-jig what
asi_enter/relax/exit do and thus have a balanced interface. We'll see...
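
The split discussed above (asi_enter()/asi_exit() doing the expensive
switch, asi_start_critical()/asi_end_critical() doing the cheap tracking,
plus a warning when start_critical is called without a prior enter) can
be modelled in user space. A minimal sketch, assuming a single CPU and no
NMIs; the struct layout, the counters and model_kvm_run() are
hypothetical and are not taken from the patch set:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * User-space model only. The struct layout, the counters and
 * model_kvm_run() are hypothetical; only the function names and the
 * early-return check come from the thread.
 */
struct asi { const char *name; };

static struct asi *curr_asi;	/* stands in for the per-CPU curr_asi */
static bool in_critical;	/* cheap critical-section tracking */
static int cr3_switches;	/* counts modelled address space switches */
static int misuse_warnings;	/* counts start_critical without enter */

/* Expensive part: switch to the restricted address space if needed. */
static void asi_enter(struct asi *target)
{
	if (!target || target == curr_asi)
		return;		/* already there: NOP */
	curr_asi = target;
	cr3_switches++;
}

/* Expensive part: switch back to the unrestricted address space. */
static void asi_exit(void)
{
	if (!curr_asi)
		return;		/* already unrestricted: NOP */
	curr_asi = NULL;
	cr3_switches++;
}

/* Cheap part: warn and refuse if asi_enter() has not happened. */
static bool asi_start_critical(void)
{
	if (!curr_asi) {
		misuse_warnings++;
		fprintf(stderr, "asi_start_critical() before asi_enter()\n");
		return false;
	}
	in_critical = true;
	return true;
}

/* Cheap part: leave the critical section, stay in the restricted space. */
static void asi_end_critical(void)
{
	in_critical = false;
}

/* Models the KVM_RUN loop; returns the number of switches it caused. */
static int model_kvm_run(struct asi *asi, int iterations)
{
	int before = cr3_switches;

	for (int i = 0; i < iterations; i++) {
		asi_enter(asi);		/* NOP after the first iteration */
		if (asi_start_critical()) {
			/* vmenter() would run here */
			asi_end_critical();
		}
	}
	assert(!in_critical);	/* every start was paired with an end */
	asi_exit();
	return cr3_switches - before;
}
```

In this model a run of many iterations costs two modelled switches (one
enter, one exit), and calling asi_start_critical() after asi_exit() trips
the warning. It does not model the NMI window raised in the quoted text.
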

> I have now set up Mutt.

I did that 20 years ago. Never looked back. I'd say you're on the right
track :-)

> But, for now I am replying with plain vim + git-send-email, because I also
> sent this RFC to a ridiculous CC list (I just blindly used the
> get_maintainer.pl output, I don't know why I thought that was a reasonable
> approach) and it turns out this is the easiest way to trim it in a reply!
> Hopefully I can get the headers right...

Yeah, works. On the next version, you could trim it to the couple of
relevant lists and to whoever reviewed this. The others can always get
the thread from lore and there's really no need anymore to Cc the whole
world :)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette