Date: Wed, 1 Nov 2023 14:55:46 -0700
References: <20231027182217.3615211-1-seanjc@google.com> <20231027182217.3615211-17-seanjc@google.com>
Subject: Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
From: Sean Christopherson
To: Fuad Tabba
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman,
    Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Viro,
    Christian Brauner, "Matthew Wilcox (Oracle)", Andrew Morton,
    kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
    linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li, Xu Yilun,
    Chao Peng, Jarkko Sakkinen, Anish Moorthy, David Matlack, Yu Zhang,
    Isaku Yamahata, Mickaël Salaün, Vlastimil Babka, Vishal Annapurve,
    Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret,
    Michael Roth, Wang, Liam Merwick, Isaku Yamahata, "Kirill A . Shutemov"

On Wed, Nov 01, 2023, Fuad Tabba wrote:
> > > > @@ -1034,6 +1034,9 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
> > > >  /* This does not remove the slot from struct kvm_memslots data structures */
> > > >  static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > > >  {
> > > > +       if (slot->flags & KVM_MEM_PRIVATE)
> > > > +               kvm_gmem_unbind(slot);
> > > > +
> > >
> > > Should this be called after kvm_arch_free_memslot()? Arch-specific code
> > > might need some of the data before the unbinding, something I thought
> > > might be necessary at one point for the pKVM port when deleting a
> > > memslot, but realized later that kvm_invalidate_memslot() ->
> > > kvm_arch_guest_memory_reclaimed() was the more logical place for it.
> > > Also, since that seems to be the pattern for arch-specific handlers in
> > > KVM.
> >
> > Maybe?
> > But only if we care about symmetry between the allocation and free paths.
> > I really don't think kvm_arch_free_memslot() should be doing anything beyond a
> > "pure" free.  E.g. kvm_arch_free_memslot() is also called after moving a memslot,
> > which hopefully we never actually have to allow for guest_memfd, but any code in
> > kvm_arch_free_memslot() would bring about "what if" questions regarding memslot
> > movement.  I.e. the API is intended to be a "free arch metadata associated with
> > the memslot".
> >
> > Out of curiosity, what does pKVM need to do at kvm_arch_guest_memory_reclaimed()?
>
> It's about the host reclaiming ownership of guest memory when tearing
> down a protected guest. In pKVM, we currently tear down the guest and
> reclaim its memory when kvm_arch_destroy_vm() is called. The problem
> with guestmem is that kvm_gmem_unbind() could get called before that
> happens, after which the host might try to access the unbound guest
> memory. Since the host hasn't reclaimed ownership of the guest memory
> from hyp, hilarity ensues (it crashes).
>
> Initially, I hooked guest memory reclaim into kvm_free_memslot(), but
> then I needed to move the unbind later in the function. I realized
> later that kvm_arch_guest_memory_reclaimed() gets called earlier (at
> the right time), and is more aptly named.

Aha!  I suspected that might be the case.

TDX and SNP also need to solve the same problem of "reclaiming" memory before it
can be safely accessed by the host.  The plan is to add an arch hook (or two?)
into guest_memfd that is invoked when memory is freed from guest_memfd.

Hooking kvm_arch_guest_memory_reclaimed() isn't completely correct as deleting a
memslot doesn't *guarantee* that guest memory is actually reclaimed (which reminds
me, we need to figure out a better name for that thing before introducing
kvm_arch_gmem_invalidate()).

The effective false positives aren't fatal for the current usage because the hook
is used only for x86 SEV guests to flush caches.
An unnecessary flush can cause performance issues, but it doesn't affect
correctness.  For TDX and SNP, and IIUC pKVM, false positives are fatal because
KVM could assign memory back to the host that is still owned by guest_memfd.

E.g. a misbehaving userspace could prematurely delete a memslot.  And the more
fun example is intrahost migration, where the plan is to allow pointing multiple
guest_memfd files at a single guest_memfd inode:
https://lore.kernel.org/all/cover.1691446946.git.ackerleytng@google.com

There was a lot of discussion about this, but it's scattered all over the place.
The TL;DR is that the inode will represent physical memory, and a file will
represent a given "struct kvm" instance's view of that memory.  And so the memory
isn't reclaimed until the inode is truncated/punched.

I _think_ this reflects the most recent plan from the guest_memfd side:
https://lore.kernel.org/all/1233d749211c08d51f9ca5d427938d47f008af1f.1689893403.git.isaku.yamahata@intel.com
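
To make the inode-vs-file split concrete, here is a toy user-space sketch.  It
is not kernel code and is not taken from either series linked above.  Every
name in it (gmem_inode, gmem_file, gmem_punch_hole(), host_can_access(), ...)
is made up for illustration, and it assumes, purely to keep the model small,
that all pages start out guest-owned.  The only point it tries to show is that
unbinding a file from a memslot leaves ownership alone, while punching the
inode is what hands pages back to the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in KVM. */
#define GMEM_NR_PAGES 8

enum page_owner { OWNER_HOST, OWNER_GUEST };

/* The inode stands in for the physical memory itself. */
struct gmem_inode {
	enum page_owner owner[GMEM_NR_PAGES];
};

/* A file is one VM's view of an inode; several files may share one inode. */
struct gmem_file {
	struct gmem_inode *inode;
	bool bound;	/* currently bound to a memslot? */
};

/* Assumption for the sketch: every page starts out owned by the guest. */
void gmem_inode_init(struct gmem_inode *inode)
{
	for (size_t i = 0; i < GMEM_NR_PAGES; i++)
		inode->owner[i] = OWNER_GUEST;
}

/* Point a file at an inode and mark it as bound to a memslot. */
void gmem_file_bind(struct gmem_file *file, struct gmem_inode *inode)
{
	assert(file && inode);
	file->inode = inode;
	file->bound = true;
}

/*
 * Dropping the memslot binding only changes the file's state.  Page
 * ownership is untouched, so a premature memslot deletion cannot hand
 * guest-owned memory back to the host.
 */
void gmem_file_unbind(struct gmem_file *file)
{
	file->bound = false;
}

/*
 * Punching a hole is where reclaim happens: each guest-owned page in
 * [start, start + nr) is returned to the host.  Returns the number of
 * pages actually reclaimed; out-of-range pages are ignored.
 */
size_t gmem_punch_hole(struct gmem_inode *inode, size_t start, size_t nr)
{
	size_t reclaimed = 0;

	for (size_t i = start; i < GMEM_NR_PAGES && nr > 0; i++, nr--) {
		if (inode->owner[i] == OWNER_GUEST) {
			inode->owner[i] = OWNER_HOST;
			reclaimed++;
		}
	}
	return reclaimed;
}

/* The host may only touch pages it owns. */
bool host_can_access(const struct gmem_inode *inode, size_t idx)
{
	return idx < GMEM_NR_PAGES && inode->owner[idx] == OWNER_HOST;
}
```

In this model, binding two files to one inode and then unbinding one of them
(roughly the intrahost migration case) leaves host_can_access() false for
every page.  Only gmem_punch_hole() flips ownership, and only for the range
that was punched.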