Date: Tue, 11 Feb 2025 15:57:48 +0000
From: Quentin Perret <qperret@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
    anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
    vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
    mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
    wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
    quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
    quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
    yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
    will@kernel.org, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
    hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
    jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
    jthoughton@google.com
Subject: Re: [PATCH v3 08/11] KVM: arm64: Handle guest_memfd()-backed guest page faults
References: <20250211121128.703390-1-tabba@google.com>
    <20250211121128.703390-9-tabba@google.com>
In-Reply-To: <20250211121128.703390-9-tabba@google.com>

Hey Fuad,

On Tuesday 11 Feb 2025 at 12:11:24 (+0000), Fuad Tabba wrote:
> Add arm64 support for handling guest page faults on guest_memfd
> backed memslots.
> 
> For now, the fault granule is restricted to PAGE_SIZE.
> 
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/kvm/mmu.c     | 84 ++++++++++++++++++++++++++--------------
>  include/linux/kvm_host.h |  5 +++
>  virt/kvm/kvm_main.c      |  5 ---
>  3 files changed, 61 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index b6c0acb2311c..305060518766 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1454,6 +1454,33 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>  
> +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> +			     gfn_t gfn, bool write_fault, bool *writable,
> +			     struct page **page, bool is_private)
> +{
> +	kvm_pfn_t pfn;
> +	int ret;
> +
> +	if (!is_private)
> +		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> +
> +	*writable = false;
> +
> +	if (WARN_ON_ONCE(write_fault && memslot_is_readonly(slot)))
> +		return KVM_PFN_ERR_NOSLOT_MASK;

I believe this check is superfluous; we should decide to report an MMIO
exit to userspace for write faults to RO memslots and not get anywhere
near user_mem_abort(). And nit, but the error code should probably be
KVM_PFN_ERR_RO_FAULT or something instead?
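
To illustrate (untested sketch, and only if the check is kept at all as
a pure sanity check; KVM_PFN_ERR_RO_FAULT already exists in
include/linux/kvm_host.h, so no new definition would be needed):

	/* Private write faults to RO memslots should never get here */
	if (WARN_ON_ONCE(write_fault && memslot_is_readonly(slot)))
		return KVM_PFN_ERR_RO_FAULT;

That at least makes the failure mode self-describing rather than
reusing the no-slot mask value.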
> +
> +	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> +	if (!ret) {
> +		*writable = write_fault;

In normal KVM, if we're not dirty logging we'll actively map the page as
writable if both the memslot and the userspace mappings are writable.
With gmem, the latter doesn't make much sense, but essentially the
underlying page should really be writable (e.g. no CoW getting in the
way and such?). If so, then perhaps make this

	*writable = !memslot_is_readonly(slot);

Wdyt?
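
Concretely, the success path would then look something like this
(untested sketch, assuming there really is no gmem-side reason the page
couldn't be mapped writable):

	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
	if (!ret) {
		/* Assumed: no CoW etc. on the gmem side, so only the
		 * memslot can veto write access. */
		*writable = !memslot_is_readonly(slot);
		return pfn;
	}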
> +		return pfn;
> +	}
> +
> +	if (ret == -EHWPOISON)
> +		return KVM_PFN_ERR_HWPOISON;
> +
> +	return KVM_PFN_ERR_NOSLOT_MASK;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_s2_trans *nested,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1461,25 +1488,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  {
>  	int ret = 0;
>  	bool write_fault, writable;
> -	bool exec_fault, mte_allowed;
> +	bool exec_fault, mte_allowed = false;
>  	bool device = false, vfio_allow_any_uc = false;
>  	unsigned long mmu_seq;
>  	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
> -	struct vm_area_struct *vma;
> +	struct vm_area_struct *vma = NULL;
>  	short vma_shift;
>  	void *memcache;
> -	gfn_t gfn;
> +	gfn_t gfn = ipa >> PAGE_SHIFT;
>  	kvm_pfn_t pfn;
>  	bool logging_active = memslot_is_logging(memslot);
> -	bool force_pte = logging_active || is_protected_kvm_enabled();
> -	long vma_pagesize, fault_granule;
> +	bool is_private = kvm_mem_is_private(kvm, gfn);

Just trying to understand the locking rule for the xarray behind this.
Is it kvm->srcu that protects it for reads here? Something else?

> +	bool force_pte = logging_active || is_private || is_protected_kvm_enabled();
> +	long vma_pagesize, fault_granule = PAGE_SIZE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
>  	struct page *page;
>  	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
>  
> -	if (fault_is_perm)
> +	if (fault_is_perm && !is_private)

Nit: not strictly necessary I think.

>  		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
>  	write_fault = kvm_is_write_fault(vcpu);
>  	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> @@ -1510,24 +1538,30 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return ret;
>  	}
>  
> +	mmap_read_lock(current->mm);
> +
>  	/*
>  	 * Let's check if we will get back a huge page backed by hugetlbfs, or
>  	 * get block mapping for device MMIO region.
>  	 */
> -	mmap_read_lock(current->mm);
> -	vma = vma_lookup(current->mm, hva);
> -	if (unlikely(!vma)) {
> -		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> -		mmap_read_unlock(current->mm);
> -		return -EFAULT;
> -	}
> +	if (!is_private) {
> +		vma = vma_lookup(current->mm, hva);
> +		if (unlikely(!vma)) {
> +			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> +			mmap_read_unlock(current->mm);
> +			return -EFAULT;
> +		}
>  
> -	/*
> -	 * logging_active is guaranteed to never be true for VM_PFNMAP
> -	 * memslots.
> -	 */
> -	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
> -		return -EFAULT;
> +		/*
> +		 * logging_active is guaranteed to never be true for VM_PFNMAP
> +		 * memslots.
> +		 */
> +		if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
> +			return -EFAULT;
> +
> +		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> +		mte_allowed = kvm_vma_mte_allowed(vma);
> +	}
>  
>  	if (force_pte)
>  		vma_shift = PAGE_SHIFT;
> @@ -1597,18 +1631,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		ipa &= ~(vma_pagesize - 1);
>  	}
>  
> -	gfn = ipa >> PAGE_SHIFT;
> -	mte_allowed = kvm_vma_mte_allowed(vma);
> -
> -	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> -
>  	/* Don't use the VMA after the unlock -- it may have vanished */
>  	vma = NULL;
>  
>  	/*
>  	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> -	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> -	 * acquiring kvm->mmu_lock.
> +	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> +	 * kvm->mmu_lock.
>  	 *
>  	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
>  	 * with the smp_wmb() in kvm_mmu_invalidate_end().
> @@ -1616,8 +1645,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
>  	mmap_read_unlock(current->mm);
>  
> -	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> -				&writable, &page);
> +	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_private);
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 0;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 39fd6e35c723..415c6274aede 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
>  	return gfn_to_memslot(kvm, gfn)->id;
>  }
>  
> +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> +{
> +	return slot->flags & KVM_MEM_READONLY;
> +}
> +
>  static inline gfn_t
>  hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
>  {
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 38f0f402ea46..3e40acb9f5c0 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2624,11 +2624,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
>  	return size;
>  }
>  
> -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> -{
> -	return slot->flags & KVM_MEM_READONLY;
> -}
> -
>  static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
>  				       gfn_t *nr_pages, bool write)
>  {
> -- 
> 2.48.1.502.g6dc24dfdaf-goog
> 