Message-ID: <9b5a7efa-1a65-4b84-af60-e8658b18bad0@amazon.co.uk>
Date: Thu, 16 Jan 2025 14:48:27 +0000
From: Patrick Roy
Subject: Re: [RFC PATCH v4 13/14] KVM: arm64: Handle guest_memfd()-backed guest page faults
To: Fuad Tabba
CC: "Kalyazin, Nikita", "Manwaring, Derek", "Cali, Marco", James Gowans
References: <20241213164811.2006197-1-tabba@google.com> <20241213164811.2006197-14-tabba@google.com>
In-Reply-To: <20241213164811.2006197-14-tabba@google.com>

On Fri, 2024-12-13 at 16:48 +0000, Fuad Tabba wrote:
> Add arm64 support for resolving guest page faults on
> guest_memfd() backed memslots. This support is not contingent on
> pKVM, or other confidential computing support, and works in both
> VHE and nVHE modes.
>
> Without confidential computing, this support is useful for
> testing and debugging. In the future, it might also be useful
> should a user want to use guest_memfd() for all code, whether
> it's for a protected guest or not.
>
> For now, the fault granule is restricted to PAGE_SIZE.
>
> Signed-off-by: Fuad Tabba
> ---
>  arch/arm64/kvm/mmu.c | 111 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 109 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 342a9bd3848f..1c4b3871967c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1434,6 +1434,107 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>
> +static int guest_memfd_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +			      struct kvm_memory_slot *memslot, bool fault_is_perm)
> +{
> +	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> +	bool exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +	bool logging_active = memslot_is_logging(memslot);
> +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +	bool write_fault = kvm_is_write_fault(vcpu);
> +	struct mm_struct *mm = current->mm;
> +	gfn_t gfn = gpa_to_gfn(fault_ipa);
> +	struct kvm *kvm = vcpu->kvm;
> +	struct page *page;
> +	kvm_pfn_t pfn;
> +	int ret;
> +
> +	/* For now, guest_memfd() only supports PAGE_SIZE granules. */
> +	if (WARN_ON_ONCE(fault_is_perm &&
> +			 kvm_vcpu_trap_get_perm_fault_granule(vcpu) != PAGE_SIZE)) {
> +		return -EFAULT;
> +	}
> +
> +	VM_BUG_ON(write_fault && exec_fault);
> +
> +	if (fault_is_perm && !write_fault && !exec_fault) {
> +		kvm_err("Unexpected L2 read permission error\n");
> +		return -EFAULT;
> +	}
> +
> +	/*
> +	 * Permission faults just need to update the existing leaf entry,
> +	 * and so normally don't require allocations from the memcache. The
> +	 * only exception to this is when dirty logging is enabled at runtime
> +	 * and a write fault needs to collapse a block entry into a table.
> +	 */
> +	if (!fault_is_perm || (logging_active && write_fault)) {
> +		ret = kvm_mmu_topup_memory_cache(memcache,
> +						 kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/*
> +	 * Holds the folio lock until mapped in the guest and its refcount is
> +	 * stable, to avoid races with paths that check if the folio is mapped
> +	 * by the host.
> +	 */
> +	ret = kvm_gmem_get_pfn_locked(kvm, memslot, gfn, &pfn, &page, NULL);
> +	if (ret)
> +		return ret;
> +
> +	if (!kvm_slot_gmem_is_guest_mappable(memslot, gfn)) {
> +		ret = -EAGAIN;
> +		goto unlock_page;
> +	}
> +
> +	/*
> +	 * Once it's faulted in, a guest_memfd() page will stay in memory.
> +	 * Therefore, count it as locked.
> +	 */
> +	if (!fault_is_perm) {
> +		ret = account_locked_vm(mm, 1, true);
> +		if (ret)
> +			goto unlock_page;
> +	}
> +
> +	read_lock(&kvm->mmu_lock);
> +	if (write_fault)
> +		prot |= KVM_PGTABLE_PROT_W;
> +
> +	if (exec_fault)
> +		prot |= KVM_PGTABLE_PROT_X;
> +
> +	if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
> +		prot |= KVM_PGTABLE_PROT_X;
> +
> +	/*
> +	 * Under the premise of getting a FSC_PERM fault, we just need to relax
> +	 * permissions.
> +	 */
> +	if (fault_is_perm)
> +		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
> +	else
> +		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
> +					__pfn_to_phys(pfn), prot,
> +					memcache,
> +					KVM_PGTABLE_WALK_HANDLE_FAULT |
> +					KVM_PGTABLE_WALK_SHARED);
> +
> +	kvm_release_faultin_page(kvm, page, !!ret, write_fault);
> +	read_unlock(&kvm->mmu_lock);
> +
> +	if (ret && !fault_is_perm)
> +		account_locked_vm(mm, 1, false);
> +unlock_page:
> +	unlock_page(page);
> +	put_page(page);

There's a double-free of `page` here, as kvm_release_faultin_page already calls
put_page.
I fixed it up locally with

+	unlock_page(page);
 	kvm_release_faultin_page(kvm, page, !!ret, write_fault);
 	read_unlock(&kvm->mmu_lock);

 	if (ret && !fault_is_perm)
 		account_locked_vm(mm, 1, false);
+	goto out;
+
 unlock_page:
 	unlock_page(page);
 	put_page(page);
-
+out:
 	return ret != -EAGAIN ? ret : 0;
 }

which I'm admittedly not sure is correct either, because now the locks don't
get released in reverse order of acquisition, but with this I was able to boot
simple VMs.

> +
> +	return ret != -EAGAIN ? ret : 0;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_s2_trans *nested,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1900,8 +2001,14 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		goto out_unlock;
>  	}
>
> -	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -			     esr_fsc_is_permission_fault(esr));
> +	if (kvm_slot_can_be_private(memslot)) {

For my setup, I needed

	if (kvm_mem_is_private(vcpu->kvm, gfn))

here instead, because I am making use of KVM_GENERIC_MEMORY_ATTRIBUTES, and had
a memslot with the `KVM_MEM_GUEST_MEMFD` flag set, but whose gfn range wasn't
actually set to KVM_MEMORY_ATTRIBUTE_PRIVATE.

If I'm reading patch 12 correctly, your memslots always set only one of
userspace_addr or guest_memfd, and the stage 2 table setup simply checks which
one is the case to decide what to fault in, so maybe to support both cases,
this check should be

	if (kvm_mem_is_private(vcpu->kvm, gfn) ||
	    (kvm_slot_can_be_private(memslot) && !memslot->userspace_addr))

?

[1]: https://lore.kernel.org/all/20240801090117.3841080-1-tabba@google.com/

> +		ret = guest_memfd_abort(vcpu, fault_ipa, memslot,
> +					esr_fsc_is_permission_fault(esr));
> +	} else {
> +		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> +				     esr_fsc_is_permission_fault(esr));
> +	}
> +
>  	if (ret == 0)
>  		ret = 1;
>  out:
> --
> 2.47.1.613.gc27f4b7a9f-goog

Best,
Patrick
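
To illustrate the reference-count issue raised in the review above, here is a
minimal userspace sketch in C. It is a toy model, not kernel code: struct
toy_page and the toy_* helpers are hypothetical stand-ins, and it assumes (as
the review states) that kvm_release_faultin_page() drops the page reference
itself. exit_as_posted() mirrors the tail of guest_memfd_abort() as posted, and
exit_with_fix() mirrors the local fixup.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the refcount and lock bit matter here. */
struct toy_page {
	int refcount;
	bool locked;
};

/* Models kvm_gmem_get_pfn_locked(): page comes back locked, one reference taken. */
static void toy_get_locked(struct toy_page *p)
{
	p->refcount++;
	p->locked = true;
}

static void toy_unlock(struct toy_page *p)
{
	p->locked = false;
}

static void toy_put(struct toy_page *p)
{
	p->refcount--;
}

/* Assumption: models kvm_release_faultin_page(), which already puts the page. */
static void toy_release_faultin(struct toy_page *p)
{
	toy_put(p);
}

/*
 * Exit path as posted: the mapped path falls through into unlock_page:
 * after the release helper has already dropped the reference.
 * Returns the refcount left on the page.
 */
static int exit_as_posted(struct toy_page *p, bool early_error)
{
	toy_get_locked(p);
	if (early_error)
		goto unlock_page;
	toy_release_faultin(p);
unlock_page:
	toy_unlock(p);
	toy_put(p);
	return p->refcount;
}

/*
 * Exit path with the local fixup: unlock, release, then skip the second put.
 * Returns the refcount left on the page.
 */
static int exit_with_fix(struct toy_page *p, bool early_error)
{
	toy_get_locked(p);
	if (early_error)
		goto unlock_page;
	toy_unlock(p);
	toy_release_faultin(p);
	goto out;
unlock_page:
	toy_unlock(p);
	toy_put(p);
out:
	return p->refcount;
}
```

Starting from a page with a baseline refcount of 1, exit_as_posted(&p, false)
ends one reference below the baseline, while exit_with_fix() returns to the
baseline on both the mapped path and the early-error path. The model does not
capture the lock-ordering concern (the page lock being dropped before
mmu_lock), which would still need a separate look.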