Message-ID: <761a3982-c7a1-40f1-92d8-5c08dad8383a@arm.com>
Date: Thu, 8 Feb 2024 10:57:21 +0000
Subject: Re: [PATCH RFC gmem v1 3/8] KVM: x86: Add gmem hook for initializing memory
From: Suzuki K Poulose
To: Michael Roth , kvm@vger.kernel.org
Cc: linux-coco@lists.linux.dev, linux-mm@kvack.org, linux-crypto@vger.kernel.org,
    x86@kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    pbonzini@redhat.com, seanjc@google.com, isaku.yamahata@intel.com,
    ackerleytng@google.com, vbabka@suse.cz, ashish.kalra@amd.com,
    nikunj.dadhania@amd.com, jroedel@suse.de, pankaj.gupta@amd.com
References: <20231016115028.996656-1-michael.roth@amd.com> <20231016115028.996656-4-michael.roth@amd.com>
In-Reply-To: <20231016115028.996656-4-michael.roth@amd.com>
Hi,

On 16/10/2023 12:50, Michael Roth wrote:
> guest_memfd pages are generally expected to be in some arch-defined
> initial state prior to using them for guest memory. For SEV-SNP this
> initial state is 'private', or 'guest-owned', and requires additional
> operations to move these pages into a 'private' state by updating the
> corresponding entries in the RMP table.
> 
> Allow for an arch-defined hook to handle updates of this sort, and go
> ahead and implement one for x86 so KVM implementations like AMD SVM can
> register a kvm_x86_ops callback to handle these updates for SEV-SNP
> guests.
> 
> The preparation callback is always called when allocating/grabbing
> folios via gmem, and it is up to the architecture to keep track of
> whether or not the pages are already in the expected state (e.g. the RMP
> table in the case of SEV-SNP).
> 
> In some cases, it is necessary to defer the preparation of the pages to
> handle things like in-place encryption of initial guest memory payloads
> before marking these pages as 'private'/'guest-owned', so also add a
> helper that performs the same function as kvm_gmem_get_pfn(), but allows
> for the preparation callback to be bypassed to allow for pages to be
> accessed beforehand.

This will be useful for Arm CCA, where the pages need to be moved into
"Realm state". Some minor comments below.

> 
> Link: https://lore.kernel.org/lkml/ZLqVdvsF11Ddo7Dq@google.com/
> Signed-off-by: Michael Roth
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  2 ++
>  arch/x86/kvm/x86.c                 |  6 ++++
>  include/linux/kvm_host.h           | 14 ++++++++
>  virt/kvm/Kconfig                   |  4 +++
>  virt/kvm/guest_memfd.c             | 56 +++++++++++++++++++++++++++---
>  6 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index e3054e3e46d5..0c113f42d5c7 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -134,6 +134,7 @@ KVM_X86_OP(msr_filter_changed)
>  KVM_X86_OP(complete_emulated_msr)
>  KVM_X86_OP(vcpu_deliver_sipi_vector)
>  KVM_X86_OP_OPTIONAL_RET0(vcpu_get_apicv_inhibit_reasons);
> +KVM_X86_OP_OPTIONAL_RET0(gmem_prepare)
>  
>  #undef KVM_X86_OP
>  #undef KVM_X86_OP_OPTIONAL
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 95018cc653f5..66fc89d1858f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1752,6 +1752,8 @@ struct kvm_x86_ops {
>  	 * Returns vCPU specific APICv inhibit reasons
>  	 */
>  	unsigned long (*vcpu_get_apicv_inhibit_reasons)(struct kvm_vcpu *vcpu);
> +
> +	int (*gmem_prepare)(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 767236b4d771..33a4cc33d86d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13301,6 +13301,12 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
>  
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order)
> +{
> +	return static_call(kvm_x86_gmem_prepare)(kvm, pfn, gfn, max_order);
> +}
> +#endif
>  
>  int kvm_spec_ctrl_test_value(u64 value)
>  {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8c5c017ab4e9..c7f82c2f1bcf 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2403,9 +2403,19 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>  #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
>  
>  #ifdef CONFIG_KVM_PRIVATE_MEM
> +int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> +		       gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep);
>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
>  #else
> +static inline int __kvm_gmem_get_pfn(struct kvm *kvm,
> +				     struct kvm_memory_slot *slot, gfn_t gfn,
> +				     kvm_pfn_t *pfn, int *max_order)

Missing "bool prep" here?

Minor nit: do we need to export both __kvm_gmem_get_pfn() and
kvm_gmem_get_pfn()? I don't see anyone else using the former.
We could have:

#ifdef CONFIG_KVM_PRIVATE_MEM
int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
		       gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep);
#else
static inline int __kvm_gmem_get_pfn(struct kvm *kvm,
				     struct kvm_memory_slot *slot, gfn_t gfn,
				     kvm_pfn_t *pfn, int *max_order, bool prep)
{
	KVM_BUG_ON(1, kvm);
	return -EIO;
}
#endif

static inline int kvm_gmem_get_pfn(struct kvm *kvm,
				   struct kvm_memory_slot *slot, gfn_t gfn,
				   kvm_pfn_t *pfn, int *max_order)
{
	return __kvm_gmem_get_pfn(kvm, slot, gfn, pfn, max_order, true);
}

Suzuki

> +	KVM_BUG_ON(1, kvm);
> +	return -EIO;
> +}
> +
>  static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>  				    struct kvm_memory_slot *slot, gfn_t gfn,
>  				    kvm_pfn_t *pfn, int *max_order)
> @@ -2415,4 +2425,8 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>  }
>  #endif /* CONFIG_KVM_PRIVATE_MEM */
>  
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order);
> +#endif
> +
>  #endif
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 2c964586aa14..992cf6ed86ef 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -109,3 +109,7 @@ config KVM_GENERIC_PRIVATE_MEM
>  	select KVM_GENERIC_MEMORY_ATTRIBUTES
>  	select KVM_PRIVATE_MEM
>  	bool
> +
> +config HAVE_KVM_GMEM_PREPARE
> +	bool
> +	depends on KVM_PRIVATE_MEM
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index f6f1b17a319c..72ff8b7b31d5 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -44,7 +44,40 @@ static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index)
>  #endif
>  }
>  
> -static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
> +static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct folio *folio)
> +{
> +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE
> +	struct list_head *gmem_list = &inode->i_mapping->private_list;
> +	struct kvm_gmem *gmem;
> +
> +	list_for_each_entry(gmem, gmem_list, entry) {
> +		struct kvm_memory_slot *slot;
> +		struct kvm *kvm = gmem->kvm;
> +		struct page *page;
> +		kvm_pfn_t pfn;
> +		gfn_t gfn;
> +		int rc;
> +
> +		slot = xa_load(&gmem->bindings, index);
> +		if (!slot)
> +			continue;
> +
> +		page = folio_file_page(folio, index);
> +		pfn = page_to_pfn(page);
> +		gfn = slot->base_gfn + index - slot->gmem.pgoff;
> +		rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, compound_order(compound_head(page)));
> +		if (rc) {
> +			pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx, error %d.\n",
> +					    index, rc);
> +			return rc;
> +		}
> +	}
> +
> +#endif
> +	return 0;
> +}
> +
> +static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prep)
>  {
>  	struct folio *folio;
>  
> @@ -74,6 +107,12 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
>  		folio_mark_uptodate(folio);
>  	}
>  
> +	if (prep && kvm_gmem_prepare_folio(inode, index, folio)) {
> +		folio_unlock(folio);
> +		folio_put(folio);
> +		return NULL;
> +	}
> +
>  	/*
>  	 * Ignore accessed, referenced, and dirty flags.  The memory is
>  	 * unevictable and there is no storage to write back to.
> @@ -178,7 +217,7 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
>  			break;
>  		}
>  
> -		folio = kvm_gmem_get_folio(inode, index);
> +		folio = kvm_gmem_get_folio(inode, index, true);
>  		if (!folio) {
>  			r = -ENOMEM;
>  			break;
> @@ -537,8 +576,8 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
>  	fput(file);
>  }
>  
> -int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> -		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
> +int __kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> +		       gfn_t gfn, kvm_pfn_t *pfn, int *max_order, bool prep)
>  {
>  	pgoff_t index, huge_index;
>  	struct kvm_gmem *gmem;
> @@ -559,7 +598,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  		goto out_fput;
>  	}
>  
> -	folio = kvm_gmem_get_folio(file_inode(file), index);
> +	folio = kvm_gmem_get_folio(file_inode(file), index, prep);
>  	if (!folio) {
>  		r = -ENOMEM;
>  		goto out_fput;
> @@ -600,4 +639,11 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>  
>  	return r;
>  }
> +EXPORT_SYMBOL_GPL(__kvm_gmem_get_pfn);
> +
> +int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> +		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order)
> +{
> +	return __kvm_gmem_get_pfn(kvm, slot, gfn, pfn, max_order, true);
> +}
>  EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);