From: Fuad Tabba <tabba@google.com>
Date: Thu, 16 Jan 2025 15:16:41 +0000
Subject: Re: [RFC PATCH v4 13/14] KVM: arm64: Handle guest_memfd()-backed guest page faults
To: Patrick Roy
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
	pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
	anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
	brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
	vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
	mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
	wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
	quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
	quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
	yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
	will@kernel.org, qperret@google.com, keirf@google.com, shuah@kernel.org,
	hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
	jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
	jthoughton@google.com, "Kalyazin, Nikita", "Manwaring, Derek",
	"Cali, Marco", James Gowans
In-Reply-To: <9b5a7efa-1a65-4b84-af60-e8658b18bad0@amazon.co.uk>
References: <20241213164811.2006197-1-tabba@google.com>
	<20241213164811.2006197-14-tabba@google.com>
	<9b5a7efa-1a65-4b84-af60-e8658b18bad0@amazon.co.uk>
Hi Patrick,

On Thu, 16 Jan 2025 at 14:48, Patrick Roy wrote:
>
> On Fri, 2024-12-13 at 16:48 +0000, Fuad Tabba wrote:
> > Add arm64 support for resolving guest page faults on
> > guest_memfd() backed memslots. This support is not contingent on
> > pKVM, or other confidential computing support, and works in both
> > VHE and nVHE modes.
> >
> > Without confidential computing, this support is useful for
> > testing and debugging. In the future, it might also be useful
> > should a user want to use guest_memfd() for all code, whether
> > it's for a protected guest or not.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba
> > ---
> >  arch/arm64/kvm/mmu.c | 111 ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 109 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 342a9bd3848f..1c4b3871967c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1434,6 +1434,107 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >  	return vma->vm_flags & VM_MTE_ALLOWED;
> >  }
> >
> > +static int guest_memfd_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +			     struct kvm_memory_slot *memslot, bool fault_is_perm)
> > +{
> > +	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> > +	bool exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > +	bool logging_active = memslot_is_logging(memslot);
> > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +	bool write_fault = kvm_is_write_fault(vcpu);
> > +	struct mm_struct *mm = current->mm;
> > +	gfn_t gfn = gpa_to_gfn(fault_ipa);
> > +	struct kvm *kvm = vcpu->kvm;
> > +	struct page *page;
> > +	kvm_pfn_t pfn;
> > +	int ret;
> > +
> > +	/* For now, guest_memfd() only supports PAGE_SIZE granules. */
> > +	if (WARN_ON_ONCE(fault_is_perm &&
> > +			 kvm_vcpu_trap_get_perm_fault_granule(vcpu) != PAGE_SIZE)) {
> > +		return -EFAULT;
> > +	}
> > +
> > +	VM_BUG_ON(write_fault && exec_fault);
> > +
> > +	if (fault_is_perm && !write_fault && !exec_fault) {
> > +		kvm_err("Unexpected L2 read permission error\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	/*
> > +	 * Permission faults just need to update the existing leaf entry,
> > +	 * and so normally don't require allocations from the memcache. The
> > +	 * only exception to this is when dirty logging is enabled at runtime
> > +	 * and a write fault needs to collapse a block entry into a table.
> > +	 */
> > +	if (!fault_is_perm || (logging_active && write_fault)) {
> > +		ret = kvm_mmu_topup_memory_cache(memcache,
> > +				kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	/*
> > +	 * Holds the folio lock until mapped in the guest and its refcount is
> > +	 * stable, to avoid races with paths that check if the folio is mapped
> > +	 * by the host.
> > +	 */
> > +	ret = kvm_gmem_get_pfn_locked(kvm, memslot, gfn, &pfn, &page, NULL);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!kvm_slot_gmem_is_guest_mappable(memslot, gfn)) {
> > +		ret = -EAGAIN;
> > +		goto unlock_page;
> > +	}
> > +
> > +	/*
> > +	 * Once it's faulted in, a guest_memfd() page will stay in memory.
> > +	 * Therefore, count it as locked.
> > +	 */
> > +	if (!fault_is_perm) {
> > +		ret = account_locked_vm(mm, 1, true);
> > +		if (ret)
> > +			goto unlock_page;
> > +	}
> > +
> > +	read_lock(&kvm->mmu_lock);
> > +	if (write_fault)
> > +		prot |= KVM_PGTABLE_PROT_W;
> > +
> > +	if (exec_fault)
> > +		prot |= KVM_PGTABLE_PROT_X;
> > +
> > +	if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
> > +		prot |= KVM_PGTABLE_PROT_X;
> > +
> > +	/*
> > +	 * Under the premise of getting a FSC_PERM fault, we just need to relax
> > +	 * permissions.
> > +	 */
> > +	if (fault_is_perm)
> > +		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
> > +	else
> > +		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
> > +					     __pfn_to_phys(pfn), prot,
> > +					     memcache,
> > +					     KVM_PGTABLE_WALK_HANDLE_FAULT |
> > +					     KVM_PGTABLE_WALK_SHARED);
> > +
> > +	kvm_release_faultin_page(kvm, page, !!ret, write_fault);
> > +	read_unlock(&kvm->mmu_lock);
> > +
> > +	if (ret && !fault_is_perm)
> > +		account_locked_vm(mm, 1, false);
> > +unlock_page:
> > +	unlock_page(page);
> > +	put_page(page);
>
> There's a double-free of `page` here, as kvm_release_faultin_page
> already calls put_page. I fixed it up locally with
>
> +	unlock_page(page);
> 	kvm_release_faultin_page(kvm, page, !!ret, write_fault);
> 	read_unlock(&kvm->mmu_lock);
>
> 	if (ret && !fault_is_perm)
> 		account_locked_vm(mm, 1, false);
> +	goto out;
> +
> unlock_page:
> 	unlock_page(page);
> 	put_page(page);
> -
> +out:
> 	return ret != -EAGAIN ? ret : 0;
> }
>
> which I'm admittedly not sure is correct either, because now the locks
> don't get released in reverse order of acquisition, but with this I
> was able to boot simple VMs.

Thanks for that. You're right, I broke this code right before sending
out the series while fixing a merge conflict. I have prepared a new
patch series (rebased on Linux 6.13-rc7), with this redone to be part
of user_mem_abort(), as opposed to being in its own function. That
makes the code cleaner and more maintainable.
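Consolidated, the exit path with your fix applied reads roughly like
this (a sketch only, keeping your ordering caveat; in the respin this
logic moves into user_mem_abort(), so the final code will differ):

	/*
	 * The page was locked by kvm_gmem_get_pfn_locked(); drop the folio
	 * lock here, and let kvm_release_faultin_page() drop the refcount
	 * taken when the pfn was resolved.
	 */
	unlock_page(page);
	kvm_release_faultin_page(kvm, page, !!ret, write_fault);
	read_unlock(&kvm->mmu_lock);

	if (ret && !fault_is_perm)
		account_locked_vm(mm, 1, false);
	goto out;

unlock_page:
	/*
	 * Early-error path: the page was never mapped, so drop both the
	 * folio lock and our reference here.
	 */
	unlock_page(page);
	put_page(page);
out:
	return ret != -EAGAIN ? ret : 0;
}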
> > +
> > +	return ret != -EAGAIN ? ret : 0;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_s2_trans *nested,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1900,8 +2001,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  		goto out_unlock;
> >  	}
> >
> > -	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> > -			     esr_fsc_is_permission_fault(esr));
> > +	if (kvm_slot_can_be_private(memslot)) {
>
> For my setup, I needed
>
> 	if (kvm_mem_is_private(vcpu->kvm, gfn))
>
> here instead, because I am making use of KVM_GENERIC_MEMORY_ATTRIBUTES,
> and had a memslot with the `KVM_MEM_GUEST_MEMFD` flag set, but whose
> gfn range wasn't actually set to KVM_MEMORY_ATTRIBUTE_PRIVATE.
>
> If I'm reading patch 12 correctly, your memslots always set only one of
> userspace_addr or guest_memfd, and the stage 2 table setup simply checks
> which one is the case to decide what to fault in, so maybe to support
> both cases, this check should be
>
> 	if (kvm_mem_is_private(vcpu->kvm, gfn) ||
> 	    (kvm_slot_can_be_private(memslot) && !memslot->userspace_addr))
>
> ?

I've actually missed supporting both cases, and I think your suggestion
is the right way to do it. I'll fix it in the respin (see the sketch at
the end of this mail).

Cheers,
/fuad

> [1]: https://lore.kernel.org/all/20240801090117.3841080-1-tabba@google.com/
>
> > +		ret = guest_memfd_abort(vcpu, fault_ipa, memslot,
> > +					esr_fsc_is_permission_fault(esr));
> > +	} else {
> > +		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> > +				     esr_fsc_is_permission_fault(esr));
> > +	}
> > +
> >  	if (ret == 0)
> >  		ret = 1;
> >  out:
> > --
> > 2.47.1.613.gc27f4b7a9f-goog
>
> Best,
> Patrick
>
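P.S. For completeness, here is roughly what I have in mind for the
dispatch in kvm_handle_guest_abort() with your combined check (a sketch
only, assuming guest_memfd_abort() and kvm_slot_can_be_private() from
this series plus the generic kvm_mem_is_private(); the respin may end
up structured differently):

	/*
	 * Route the abort to guest_memfd_abort() when the gfn is backed by
	 * guest_memfd: either the memory attributes mark it private, or the
	 * memslot is guest_memfd-only and has no userspace_addr to fall
	 * back to.
	 */
	if (kvm_mem_is_private(vcpu->kvm, gfn) ||
	    (kvm_slot_can_be_private(memslot) && !memslot->userspace_addr))
		ret = guest_memfd_abort(vcpu, fault_ipa, memslot,
					esr_fsc_is_permission_fault(esr));
	else
		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
				     esr_fsc_is_permission_fault(esr));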