From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 17 Oct 2024 21:53:46 +0000
In-Reply-To: <20241010085930.1546800-6-tabba@google.com> (message from Fuad Tabba on Thu, 10 Oct 2024 09:59:24 +0100)
Mime-Version: 1.0
Subject: Re: [PATCH v3 05/11] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page()
From: Ackerley Tng
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com, tabba@google.com
Content-Type: text/plain; charset="UTF-8"

Fuad Tabba writes:

> Make kvm_(read|/write)_guest_page() capable of accessing guest
> memory for slots that don't have a userspace address, but only if
> the memory is mappable, which also indicates that it is
> accessible by the host.

Fuad explained to me that this patch removes the need for userspace to
mmap a guest_memfd fd just to provide userspace_addr when only a
limited range of shared pages is required, e.g. for kvm_clock.

Questions to anyone who might be more familiar:

1. Should we let userspace save the trouble of providing
   userspace_addr if only KVM (and not userspace) needs to access the
   shared pages?

2. Other than kvm_{read,write}_guest_page, are there any other parts
   of KVM that may require updates so that guest_memfd can be used
   directly from the kernel?

Patrick, does this help to answer the question of "how does KVM
internally access guest_memfd for non-CoCo VMs" that you brought up in
this other thread [*]?

[*] https://lore.kernel.org/all/6bca3ad4-3eca-4a75-a775-5f8b0467d7a3@amazon.co.uk/
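For concreteness, here is a rough userspace sketch of the flow I
understand this patch enables: binding a guest_memfd-backed memslot
with no userspace_addr at all, so that only KVM (and not userspace)
reaches the shared range. This is just my sketch against the existing
guest_memfd UAPI (KVM_CREATE_GUEST_MEMFD and
KVM_SET_USER_MEMORY_REGION2); error handling is elided, and treating
userspace_addr == 0 as "no HVA" is exactly the behavior this patch
proposes, so it may well change:

/*
 * Sketch only: create a guest_memfd and bind it to a memslot without
 * ever mmap()ing it or providing a userspace_addr.  With this patch,
 * in-kernel users such as kvm_clock can still reach the (mappable)
 * range via kvm_{read,write}_guest_page() even though there is no HVA.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

static int bind_gmem_only_slot(int vm_fd, __u64 gpa, __u64 size)
{
	struct kvm_create_guest_memfd gmem = { .size = size };
	struct kvm_userspace_memory_region2 region = { 0 };
	int gmem_fd;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	if (gmem_fd < 0)
		return -1;

	region.slot = 0;
	region.flags = KVM_MEM_GUEST_MEMFD;
	region.guest_phys_addr = gpa;
	region.memory_size = size;
	region.userspace_addr = 0;	/* no mmap(), no HVA */
	region.guest_memfd = gmem_fd;
	region.guest_memfd_offset = 0;

	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}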
>
> Signed-off-by: Fuad Tabba
> ---
>  virt/kvm/kvm_main.c | 137 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 118 insertions(+), 19 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index aed9cf2f1685..77e6412034b9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3399,23 +3399,114 @@ int kvm_gmem_clear_mappable(struct kvm *kvm, gfn_t start, gfn_t end)
>  	return kvm_gmem_toggle_mappable(kvm, start, end, false);
>  }
>  
> +static int __kvm_read_guest_memfd_page(struct kvm *kvm,
> +				       struct kvm_memory_slot *slot,
> +				       gfn_t gfn, void *data, int offset,
> +				       int len)
> +{
> +	struct page *page;
> +	u64 pfn;
> +	int r;
> +
> +	/*
> +	 * Holds the folio lock until after checking whether it can be faulted
> +	 * in, to avoid races with paths that change a folio's mappability.
> +	 */
> +	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, NULL);
> +	if (r)
> +		return r;
> +
> +	page = pfn_to_page(pfn);
> +
> +	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
> +		r = -EPERM;
> +		goto unlock;
> +	}
> +	memcpy(data, page_address(page) + offset, len);
> +unlock:
> +	if (r)
> +		put_page(page);
> +	else
> +		kvm_release_pfn_clean(pfn);
> +	unlock_page(page);
> +
> +	return r;
> +}
> +
> +static int __kvm_write_guest_memfd_page(struct kvm *kvm,
> +					struct kvm_memory_slot *slot,
> +					gfn_t gfn, const void *data,
> +					int offset, int len)
> +{
> +	struct page *page;
> +	u64 pfn;
> +	int r;
> +
> +	/*
> +	 * Holds the folio lock until after checking whether it can be faulted
> +	 * in, to avoid races with paths that change a folio's mappability.
> +	 */
> +	r = kvm_gmem_get_pfn_locked(kvm, slot, gfn, &pfn, NULL);
> +	if (r)
> +		return r;
> +
> +	page = pfn_to_page(pfn);
> +
> +	if (!kvm_gmem_is_mappable(kvm, gfn, gfn + 1)) {
> +		r = -EPERM;
> +		goto unlock;
> +	}
> +	memcpy(page_address(page) + offset, data, len);
> +unlock:
> +	if (r)
> +		put_page(page);
> +	else
> +		kvm_release_pfn_dirty(pfn);
> +	unlock_page(page);
> +
> +	return r;
> +}
> +#else
> +static int __kvm_read_guest_memfd_page(struct kvm *kvm,
> +				       struct kvm_memory_slot *slot,
> +				       gfn_t gfn, void *data, int offset,
> +				       int len)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EIO;
> +}
> +
> +static int __kvm_write_guest_memfd_page(struct kvm *kvm,
> +					struct kvm_memory_slot *slot,
> +					gfn_t gfn, const void *data,
> +					int offset, int len)
> +{
> +	WARN_ON_ONCE(1);
> +	return -EIO;
> +}
>  #endif /* CONFIG_KVM_GMEM_MAPPABLE */
>  
>  /* Copy @len bytes from guest memory at '(@gfn * PAGE_SIZE) + @offset' to @data */
> -static int __kvm_read_guest_page(struct kvm_memory_slot *slot, gfn_t gfn,
> -				 void *data, int offset, int len)
> +
> +static int __kvm_read_guest_page(struct kvm *kvm, struct kvm_memory_slot *slot,
> +				 gfn_t gfn, void *data, int offset, int len)
>  {
> -	int r;
>  	unsigned long addr;
>  
>  	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
>  		return -EFAULT;
>  
> +	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
> +	    kvm_slot_can_be_private(slot) &&
> +	    !slot->userspace_addr) {
> +		return __kvm_read_guest_memfd_page(kvm, slot, gfn, data,
> +						   offset, len);
> +	}
> +
>  	addr = gfn_to_hva_memslot_prot(slot, gfn, NULL);
>  	if (kvm_is_error_hva(addr))
>  		return -EFAULT;
> -	r = __copy_from_user(data, (void __user *)addr + offset, len);
> -	if (r)
> +	if (__copy_from_user(data, (void __user *)addr + offset, len))
>  		return -EFAULT;
>  	return 0;
>  }
> @@ -3425,7 +3516,7 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
>  {
>  	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
>  
> -	return __kvm_read_guest_page(slot, gfn, data, offset, len);
> +	return __kvm_read_guest_page(kvm, slot, gfn, data, offset, len);
>  }
>  EXPORT_SYMBOL_GPL(kvm_read_guest_page);
>  
> @@ -3434,7 +3525,7 @@ int kvm_vcpu_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data,
>  {
>  	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
>  
> -	return __kvm_read_guest_page(slot, gfn, data, offset, len);
> +	return __kvm_read_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
>  }
>  EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_page);
>  
> @@ -3511,22 +3602,30 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
>  
>  /* Copy @len bytes from @data into guest memory at '(@gfn * PAGE_SIZE) + @offset' */
>  static int __kvm_write_guest_page(struct kvm *kvm,
> -				  struct kvm_memory_slot *memslot, gfn_t gfn,
> -				  const void *data, int offset, int len)
> +				  struct kvm_memory_slot *slot, gfn_t gfn,
> +				  const void *data, int offset, int len)
>  {
> -	int r;
> -	unsigned long addr;
> -
>  	if (WARN_ON_ONCE(offset + len > PAGE_SIZE))
>  		return -EFAULT;
>  
> -	addr = gfn_to_hva_memslot(memslot, gfn);
> -	if (kvm_is_error_hva(addr))
> -		return -EFAULT;
> -	r = __copy_to_user((void __user *)addr + offset, data, len);
> -	if (r)
> -		return -EFAULT;
> -	mark_page_dirty_in_slot(kvm, memslot, gfn);
> +	if (IS_ENABLED(CONFIG_KVM_GMEM_MAPPABLE) &&
> +	    kvm_slot_can_be_private(slot) &&
> +	    !slot->userspace_addr) {
> +		int r = __kvm_write_guest_memfd_page(kvm, slot, gfn, data,
> +						     offset, len);
> +
> +		if (r)
> +			return r;
> +	} else {
> +		unsigned long addr = gfn_to_hva_memslot(slot, gfn);
> +
> +		if (kvm_is_error_hva(addr))
> +			return -EFAULT;
> +		if (__copy_to_user((void __user *)addr + offset, data, len))
> +			return -EFAULT;
> +	}
> +
> +	mark_page_dirty_in_slot(kvm, slot, gfn);
>  	return 0;
>  }
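To tie the write path back to the kvm_clock example: with the hunks
above applied, an in-kernel write along the lines of the sketch below
(loosely modeled on x86's wall-clock update; the helper name and gpa
argument are illustrative, not from this series) would be served by
__kvm_write_guest_memfd_page() for a guest_memfd slot with
userspace_addr == 0, instead of failing the gfn_to_hva_memslot()
lookup:

#include <linux/kvm_host.h>

/*
 * Illustrative only: kvm_write_guest() fans out page by page to
 * __kvm_write_guest_page(), so a mappable guest_memfd range is now
 * written via page_address() + memcpy() rather than __copy_to_user()
 * on a nonexistent HVA.
 */
static int write_wall_clock(struct kvm *kvm, gpa_t gpa,
			    struct pvclock_wall_clock *wc)
{
	return kvm_write_guest(kvm, gpa, wc, sizeof(*wc));
}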