Date: Fri, 21 Jul 2023 15:27:04 -0700
From: Isaku Yamahata <isaku.yamahata@gmail.com>
To: Yuan Yao
Cc: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
	Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley,
	Palmer Dabbelt, Albert Ou, "Matthew Wilcox (Oracle)",
	Andrew Morton, Paul Moore, James Morris, "Serge E. Hallyn",
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
	linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba,
	Jarkko Sakkinen, Yu Zhang, Vishal Annapurve, Ackerley Tng,
	Maciej Szmigiero, Vlastimil Babka, David Hildenbrand,
	Quentin Perret, Michael Roth, Wang, Liam Merwick,
	Isaku Yamahata, "Kirill A. Shutemov"
Subject: Re: [RFC PATCH v11 12/29] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
Message-ID: <20230721222704.GJ25699@ls.amr.corp.intel.com>
References: <20230718234512.1690985-1-seanjc@google.com>
 <20230718234512.1690985-13-seanjc@google.com>
 <20230721061314.3ls6stdawz53drv3@yy-desk-7060>
In-Reply-To: <20230721061314.3ls6stdawz53drv3@yy-desk-7060>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
On Fri, Jul 21, 2023 at 02:13:14PM +0800, Yuan Yao wrote:
> On Tue, Jul 18, 2023 at 04:44:55PM -0700, Sean Christopherson wrote:
> > TODO
> >
> > Cc: Fuad Tabba
> > Cc: Vishal Annapurve
> > Cc: Ackerley Tng
> > Cc: Jarkko Sakkinen
> > Cc: Maciej Szmigiero
> > Cc: Vlastimil Babka
> > Cc: David Hildenbrand
> > Cc: Quentin Perret
> > Cc: Michael Roth
> > Cc: Wang
> > Cc: Liam Merwick
> > Cc: Isaku Yamahata
> > Co-developed-by: Kirill A. Shutemov
> > Signed-off-by: Kirill A. Shutemov
> > Co-developed-by: Yu Zhang
> > Signed-off-by: Yu Zhang
> > Co-developed-by: Chao Peng
> > Signed-off-by: Chao Peng
> > Co-developed-by: Ackerley Tng
> > Signed-off-by: Ackerley Tng
> > Signed-off-by: Sean Christopherson
> > ---
> >  include/linux/kvm_host.h   |  48 +++
> >  include/uapi/linux/kvm.h   |  14 +-
> >  include/uapi/linux/magic.h |   1 +
> >  virt/kvm/Kconfig           |   4 +
> >  virt/kvm/Makefile.kvm      |   1 +
> >  virt/kvm/guest_mem.c       | 591 +++++++++++++++++++++++++++++++++++++
> >  virt/kvm/kvm_main.c        |  58 +++-
> >  virt/kvm/kvm_mm.h          |  38 +++
> >  8 files changed, 750 insertions(+), 5 deletions(-)
> >  create mode 100644 virt/kvm/guest_mem.c
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 97db63da6227..0d1e2ee8ae7a 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -592,8 +592,20 @@ struct kvm_memory_slot {
> >  	u32 flags;
> >  	short id;
> >  	u16 as_id;
> > +
> > +#ifdef CONFIG_KVM_PRIVATE_MEM
> > +	struct {
> > +		struct file __rcu *file;
> > +		pgoff_t pgoff;
> > +	} gmem;
> > +#endif
> >  };
> >
> > +static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
> > +{
> > +	return slot && (slot->flags & KVM_MEM_PRIVATE);
> > +}
> > +
> >  static inline bool kvm_slot_dirty_track_enabled(const struct kvm_memory_slot *slot)
> >  {
> >  	return slot->flags & KVM_MEM_LOG_DIRTY_PAGES;
> > @@ -688,6 +700,17 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
> >  }
> >  #endif
> >
> > +/*
> > + * Arch code must define kvm_arch_has_private_mem if support for private memory
> > + * is enabled.
> > + */
> > +#if !defined(kvm_arch_has_private_mem) && !IS_ENABLED(CONFIG_KVM_PRIVATE_MEM)
> > +static inline bool kvm_arch_has_private_mem(struct kvm *kvm)
> > +{
> > +	return false;
> > +}
> > +#endif
> > +
> >  struct kvm_memslots {
> >  	u64 generation;
> >  	atomic_long_t last_used_slot;
> > @@ -1380,6 +1403,7 @@ void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
> >  void kvm_mmu_invalidate_begin(struct kvm *kvm);
> >  void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
> >  void kvm_mmu_invalidate_end(struct kvm *kvm);
> > +bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
> >
> >  long kvm_arch_dev_ioctl(struct file *filp,
> >  			unsigned int ioctl, unsigned long arg);
> > @@ -2313,6 +2337,30 @@ static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn
> >
> >  bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
> >  					 struct kvm_gfn_range *range);
> > +
> > +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
> > +	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
> > +}
> > +#else
> > +static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	return false;
> > +}
> >  #endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */
> >
> > +#ifdef CONFIG_KVM_PRIVATE_MEM
> > +int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > +		     gfn_t gfn, kvm_pfn_t *pfn, int *max_order);
> > +#else
> > +static inline int kvm_gmem_get_pfn(struct kvm *kvm,
> > +				   struct kvm_memory_slot *slot, gfn_t gfn,
> > +				   kvm_pfn_t *pfn, int *max_order)
> > +{
> > +	KVM_BUG_ON(1, kvm);
> > +	return -EIO;
> > +}
> > +#endif /* CONFIG_KVM_PRIVATE_MEM */
> > +
> >  #endif
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index f065c57db327..9b344fc98598 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -102,7 +102,10 @@ struct kvm_userspace_memory_region2 {
> >  	__u64 guest_phys_addr;
> >  	__u64 memory_size;
> >  	__u64 userspace_addr;
> > -	__u64 pad[16];
> > +	__u64 gmem_offset;
> > +	__u32 gmem_fd;
> > +	__u32 pad1;
> > +	__u64 pad2[14];
> >  };
> >
> >  /*
> > @@ -112,6 +115,7 @@ struct kvm_userspace_memory_region2 {
> >   */
> >  #define KVM_MEM_LOG_DIRTY_PAGES	(1UL << 0)
> >  #define KVM_MEM_READONLY	(1UL << 1)
> > +#define KVM_MEM_PRIVATE		(1UL << 2)
> >
> >  /* for KVM_IRQ_LINE */
> >  struct kvm_irq_level {
> > @@ -2284,4 +2288,12 @@ struct kvm_memory_attributes {
> >
> >  #define KVM_MEMORY_ATTRIBUTE_PRIVATE	(1ULL << 3)
> >
> > +#define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
> > +
> > +struct kvm_create_guest_memfd {
> > +	__u64 size;
> > +	__u64 flags;
> > +	__u64 reserved[6];
> > +};
> > +
> >  #endif /* __LINUX_KVM_H */
> > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
> > index 6325d1d0e90f..15041aa7d9ae 100644
> > --- a/include/uapi/linux/magic.h
> > +++ b/include/uapi/linux/magic.h
> > @@ -101,5 +101,6 @@
> >  #define DMA_BUF_MAGIC		0x444d4142	/* "DMAB" */
> >  #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
> >  #define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
> > +#define GUEST_MEMORY_MAGIC	0x474d454d	/* "GMEM" */
> >
> >  #endif /* __LINUX_MAGIC_H__ */
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index 8375bc49f97d..3ee3205e0b39 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -103,3 +103,7 @@ config KVM_GENERIC_MMU_NOTIFIER
> >  config KVM_GENERIC_MEMORY_ATTRIBUTES
> >         select KVM_GENERIC_MMU_NOTIFIER
> >         bool
> > +
> > +config KVM_PRIVATE_MEM
> > +       select XARRAY_MULTI
> > +       bool
> > diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
> > index 2c27d5d0c367..a5a61bbe7f4c 100644
> > --- a/virt/kvm/Makefile.kvm
> > +++ b/virt/kvm/Makefile.kvm
> > @@ -12,3 +12,4 @@ kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
> >  kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
> >  kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
> >  kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
> > +kvm-$(CONFIG_KVM_PRIVATE_MEM) += $(KVM)/guest_mem.o
> > diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> > new file mode 100644
> > index 000000000000..1b705fd63fa8
> > --- /dev/null
> > +++ b/virt/kvm/guest_mem.c
> > @@ -0,0 +1,591 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +#include
> > +
> > +#include "kvm_mm.h"
> > +
> > +static struct vfsmount *kvm_gmem_mnt;
> > +
> > +struct kvm_gmem {
> > +	struct kvm *kvm;
> > +	struct xarray bindings;
> > +	struct list_head entry;
> > +};
> > +
> > +static struct folio *kvm_gmem_get_folio(struct file *file, pgoff_t index)
> > +{
> > +	struct folio *folio;
> > +
> > +	/* TODO: Support huge pages. */
> > +	folio = filemap_grab_folio(file->f_mapping, index);
> > +	if (!folio)
> > +		return NULL;
> > +
> > +	/*
> > +	 * Use the up-to-date flag to track whether or not the memory has been
> > +	 * zeroed before being handed off to the guest.  There is no backing
> > +	 * storage for the memory, so the folio will remain up-to-date until
> > +	 * it's removed.
> > +	 *
> > +	 * TODO: Skip clearing pages when trusted firmware will do it when
> > +	 * assigning memory to the guest.
> > +	 */
> > +	if (!folio_test_uptodate(folio)) {
> > +		unsigned long nr_pages = folio_nr_pages(folio);
> > +		unsigned long i;
> > +
> > +		for (i = 0; i < nr_pages; i++)
> > +			clear_highpage(folio_page(folio, i));
> > +
> > +		folio_mark_uptodate(folio);
> > +	}
> > +
> > +	/*
> > +	 * Ignore accessed, referenced, and dirty flags.  The memory is
> > +	 * unevictable and there is no storage to write back to.
> > +	 */
> > +	return folio;
> > +}
> > +
> > +static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> > +				      pgoff_t end)
> > +{
> > +	struct kvm_memory_slot *slot;
> > +	struct kvm *kvm = gmem->kvm;
> > +	unsigned long index;
> > +	bool flush = false;
> > +
> > +	KVM_MMU_LOCK(kvm);
> > +
> > +	kvm_mmu_invalidate_begin(kvm);
> > +
> > +	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
> > +		pgoff_t pgoff = slot->gmem.pgoff;
> > +
> > +		struct kvm_gfn_range gfn_range = {
> > +			.start = slot->base_gfn + max(pgoff, start) - pgoff,
> > +			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
> > +			.slot = slot,
> > +			.may_block = true,
> > +		};
> > +
> > +		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
> > +	}
> > +
> > +	if (flush)
> > +		kvm_flush_remote_tlbs(kvm);
> > +
> > +	KVM_MMU_UNLOCK(kvm);
> > +}
> > +
> > +static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
> > +				    pgoff_t end)
> > +{
> > +	struct kvm *kvm = gmem->kvm;
> > +
> > +	KVM_MMU_LOCK(kvm);
> > +	if (xa_find(&gmem->bindings, &start, end - 1, XA_PRESENT))
> > +		kvm_mmu_invalidate_end(kvm);
> > +	KVM_MMU_UNLOCK(kvm);
> > +}
> > +
> > +static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > +{
> > +	struct list_head *gmem_list = &inode->i_mapping->private_list;
> > +	pgoff_t start = offset >> PAGE_SHIFT;
> > +	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > +	struct kvm_gmem *gmem;
> > +
> > +	/*
> > +	 * Bindings must stable across invalidation to ensure the start+end
> > +	 * are balanced.
> > +	 */
> > +	filemap_invalidate_lock(inode->i_mapping);
> > +
> > +	list_for_each_entry(gmem, gmem_list, entry)
> > +		kvm_gmem_invalidate_begin(gmem, start, end);
> > +
> > +	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
> > +
> > +	list_for_each_entry(gmem, gmem_list, entry)
> > +		kvm_gmem_invalidate_end(gmem, start, end);
> > +
> > +	filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +	return 0;
> > +}
> > +
> > +static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
> > +{
> > +	struct address_space *mapping = inode->i_mapping;
> > +	pgoff_t start, index, end;
> > +	int r;
> > +
> > +	/* Dedicated guest is immutable by default. */
> > +	if (offset + len > i_size_read(inode))
> > +		return -EINVAL;
> > +
> > +	filemap_invalidate_lock_shared(mapping);
> > +
> > +	start = offset >> PAGE_SHIFT;
> > +	end = (offset + len) >> PAGE_SHIFT;
> > +
> > +	r = 0;
> > +	for (index = start; index < end; ) {
> > +		struct folio *folio;
> > +
> > +		if (signal_pending(current)) {
> > +			r = -EINTR;
> > +			break;
> > +		}
> > +
> > +		folio = kvm_gmem_get_folio(inode, index);
> > +		if (!folio) {
> > +			r = -ENOMEM;
> > +			break;
> > +		}
> > +
> > +		index = folio_next_index(folio);
> > +
> > +		folio_unlock(folio);
> > +		folio_put(folio);
> > +
> > +		/* 64-bit only, wrapping the index should be impossible. */
> > +		if (WARN_ON_ONCE(!index))
> > +			break;
> > +
> > +		cond_resched();
> > +	}
> > +
> > +	filemap_invalidate_unlock_shared(mapping);
> > +
> > +	return r;
> > +}
> > +
> > +static long kvm_gmem_fallocate(struct file *file, int mode, loff_t offset,
> > +			       loff_t len)
> > +{
> > +	int ret;
> > +
> > +	if (!(mode & FALLOC_FL_KEEP_SIZE))
> > +		return -EOPNOTSUPP;
> > +
> > +	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
> > +		return -EOPNOTSUPP;
> > +
> > +	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> > +		return -EINVAL;
> > +
> > +	if (mode & FALLOC_FL_PUNCH_HOLE)
> > +		ret = kvm_gmem_punch_hole(file_inode(file), offset, len);
> > +	else
> > +		ret = kvm_gmem_allocate(file_inode(file), offset, len);
> > +
> > +	if (!ret)
> > +		file_modified(file);
> > +	return ret;
> > +}
> > +
> > +static int kvm_gmem_release(struct inode *inode, struct file *file)
> > +{
> > +	struct kvm_gmem *gmem = file->private_data;
> > +	struct kvm_memory_slot *slot;
> > +	struct kvm *kvm = gmem->kvm;
> > +	unsigned long index;
> > +
> > +	filemap_invalidate_lock(inode->i_mapping);
> > +
> > +	/*
> > +	 * Prevent concurrent attempts to *unbind* a memslot.  This is the last
> > +	 * reference to the file and thus no new bindings can be created, but
> > +	 * dereferencing the slot for existing bindings needs to be protected
> > +	 * against memslot updates, specifically so that unbind doesn't race
> > +	 * and free the memslot (kvm_gmem_get_file() will return NULL).
> > +	 */
> > +	mutex_lock(&kvm->slots_lock);
> > +
> > +	xa_for_each(&gmem->bindings, index, slot)
> > +		rcu_assign_pointer(slot->gmem.file, NULL);
> > +
> > +	synchronize_rcu();
> > +
> > +	/*
> > +	 * All in-flight operations are gone and new bindings can be created.
> > +	 * Zap all SPTEs pointed at by this file.  Do not free the backing
> > +	 * memory, as its lifetime is associated with the inode, not the file.
> > +	 */
> > +	kvm_gmem_invalidate_begin(gmem, 0, -1ul);
> > +	kvm_gmem_invalidate_end(gmem, 0, -1ul);
> > +
> > +	mutex_unlock(&kvm->slots_lock);
> > +
> > +	list_del(&gmem->entry);
> > +
> > +	filemap_invalidate_unlock(inode->i_mapping);
> > +
> > +	xa_destroy(&gmem->bindings);
> > +	kfree(gmem);
> > +
> > +	kvm_put_kvm(kvm);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
> > +{
> > +	struct file *file;
> > +
> > +	rcu_read_lock();
> > +
> > +	file = rcu_dereference(slot->gmem.file);
> > +	if (file && !get_file_rcu(file))
> > +		file = NULL;
> > +
> > +	rcu_read_unlock();
> > +
> > +	return file;
> > +}
> > +
> > +static const struct file_operations kvm_gmem_fops = {
> > +	.open		= generic_file_open,
> > +	.release	= kvm_gmem_release,
> > +	.fallocate	= kvm_gmem_fallocate,
> > +};
> > +
> > +static int kvm_gmem_migrate_folio(struct address_space *mapping,
> > +				  struct folio *dst, struct folio *src,
> > +				  enum migrate_mode mode)
> > +{
> > +	WARN_ON_ONCE(1);
> > +	return -EINVAL;
> > +}
> > +
> > +static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
> > +{
> > +	struct list_head *gmem_list = &mapping->private_list;
> > +	struct kvm_memory_slot *slot;
> > +	struct kvm_gmem *gmem;
> > +	unsigned long index;
> > +	pgoff_t start, end;
> > +	gfn_t gfn;
> > +
> > +	filemap_invalidate_lock_shared(mapping);
> > +
> > +	start = page->index;
> > +	end = start + thp_nr_pages(page);
> > +
> > +	list_for_each_entry(gmem, gmem_list, entry) {
> > +		xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
> > +			for (gfn = start; gfn < end; gfn++) {
>
> Why the start end range used as gfn here ?
>
> the page->index is offset of inode's page cache mapping and
> gmem address space, IIUC, gfn calculation should follow same
> way as kvm_gmem_invalidate_begin().

Also instead of sending signal multiple times, we can utilize lsb argument.
Something like this?
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index a14eaac9dbad..8072ac901855 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -349,20 +349,35 @@ static int kvm_gmem_error_page(struct address_space *mapping, struct page *page)
 	struct kvm_gmem *gmem;
 	unsigned long index;
 	pgoff_t start, end;
-	gfn_t gfn;
+	unsigned int order;
+	int nr_pages;
+	gfn_t gfn, gfn_end;
 
 	filemap_invalidate_lock_shared(mapping);
 
 	start = page->index;
 	end = start + thp_nr_pages(page);
+	nr_pages = thp_nr_pages(page);
+	order = thp_order(page);
 
 	list_for_each_entry(gmem, gmem_list, entry) {
 		xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
-			for (gfn = start; gfn < end; gfn++) {
-				if (WARN_ON_ONCE(gfn < slot->base_gfn ||
-						 gfn >= slot->base_gfn + slot->npages))
-					continue;
+			gfn = slot->base_gfn + page->index - slot->gmem.pgoff;
+			if (page->index + nr_pages <= slot->gmem.pgoff + slot->npages &&
+			    !(gfn & ~((1ULL << order) - 1))) {
+				/*
+				 * FIXME: Tell userspace that the *private*
+				 * memory encountered an error.
+				 */
+				send_sig_mceerr(BUS_MCEERR_AR,
+						(void __user *)gfn_to_hva_memslot(slot, gfn),
+						order, current);
+				break;
+			}
+
+			gfn_end = min(gfn + nr_pages, slot->base_gfn + slot->npages);
+			for (; gfn < gfn_end; gfn++) {
 				/*
 				 * FIXME: Tell userspace that the *private*
 				 * memory encountered an error.
-- 
Isaku Yamahata