From: Fuad Tabba <tabba@google.com>
Date: Tue, 11 Feb 2025 16:13:27 +0000
Subject: Re: [PATCH v3 08/11] KVM: arm64: Handle guest_memfd()-backed guest page faults
To: Quentin Perret
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
    anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
    aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
    brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
    xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
    jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
    yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
    vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com,
    mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
    wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
    kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
    steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
    quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
    quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
    quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
    yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
    will@kernel.org, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
    hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
    jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
    jthoughton@google.com
References: <20250211121128.703390-1-tabba@google.com> <20250211121128.703390-9-tabba@google.com>
Content-Type: text/plain; charset="UTF-8"

Hi Quentin,

On Tue, 11 Feb 2025 at 15:57, Quentin Perret wrote:
>
> Hey Fuad,
>
> On Tuesday 11 Feb 2025 at 12:11:24 (+0000), Fuad Tabba wrote:
> > Add arm64 support for handling guest page faults on guest_memfd
> > backed memslots.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba
> > ---
> >  arch/arm64/kvm/mmu.c     | 84 ++++++++++++++++++++++++++--------------
> >  include/linux/kvm_host.h |  5 +++
> >  virt/kvm/kvm_main.c      |  5 ---
> >  3 files changed, 61 insertions(+), 33 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index b6c0acb2311c..305060518766 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1454,6 +1454,33 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >  	return vma->vm_flags & VM_MTE_ALLOWED;
> >  }
> >
> > +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > +			     gfn_t gfn, bool write_fault, bool *writable,
> > +			     struct page **page, bool is_private)
> > +{
> > +	kvm_pfn_t pfn;
> > +	int ret;
> > +
> > +	if (!is_private)
> > +		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> > +
> > +	*writable = false;
> > +
> > +	if (WARN_ON_ONCE(write_fault && memslot_is_readonly(slot)))
> > +		return KVM_PFN_ERR_NOSLOT_MASK;
>
> I believe this check is superfluous, we should decide to report an MMIO
> exit to userspace for write faults to RO memslots and not get anywhere
> near user_mem_abort(). And nit but the error code should probably be
> KVM_PFN_ERR_RO_FAULT or something instead?

I tried to replicate the behavior of __kvm_faultin_pfn() here (but got
the wrong error!). I think you're right though that in the arm64 case,
this check isn't needed. Should I fix the return error and keep the
warning though?
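Concretely, if we do keep the sanity check, I would only swap the return
value for the one you suggested, i.e. something along these lines
(untested sketch on top of this patch):

	if (WARN_ON_ONCE(write_fault && memslot_is_readonly(slot)))
		return KVM_PFN_ERR_RO_FAULT;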
> > +
> > +	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> > +	if (!ret) {
> > +		*writable = write_fault;
>
> In normal KVM, if we're not dirty logging we'll actively map the page as
> writable if both the memslot and the userspace mappings are writable.
> With gmem, the latter doesn't make much sense, but essentially the
> underlying page should really be writable (e.g. no CoW getting in the
> way and such?). If so, then perhaps make this
>
>	*writable = !memslot_is_readonly(slot);
>
> Wdyt?

Ack.
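To make sure I got your suggestion right, the success path would then
look roughly like this (sketch, untested):

	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
	if (!ret) {
		/* No CoW for gmem, so only the memslot flag matters. */
		*writable = !memslot_is_readonly(slot);
		return pfn;
	}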
> > +		return pfn;
> > +	}
> > +
> > +	if (ret == -EHWPOISON)
> > +		return KVM_PFN_ERR_HWPOISON;
> > +
> > +	return KVM_PFN_ERR_NOSLOT_MASK;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_s2_trans *nested,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1461,25 +1488,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  {
> >  	int ret = 0;
> >  	bool write_fault, writable;
> > -	bool exec_fault, mte_allowed;
> > +	bool exec_fault, mte_allowed = false;
> >  	bool device = false, vfio_allow_any_uc = false;
> >  	unsigned long mmu_seq;
> >  	phys_addr_t ipa = fault_ipa;
> >  	struct kvm *kvm = vcpu->kvm;
> > -	struct vm_area_struct *vma;
> > +	struct vm_area_struct *vma = NULL;
> >  	short vma_shift;
> >  	void *memcache;
> > -	gfn_t gfn;
> > +	gfn_t gfn = ipa >> PAGE_SHIFT;
> >  	kvm_pfn_t pfn;
> >  	bool logging_active = memslot_is_logging(memslot);
> > -	bool force_pte = logging_active || is_protected_kvm_enabled();
> > -	long vma_pagesize, fault_granule;
> > +	bool is_private = kvm_mem_is_private(kvm, gfn);
>
> Just trying to understand the locking rule for the xarray behind this.
> Is it kvm->srcu that protects it for reads here? Something else?

I'm not sure I follow. Which xarray are you referring to?

> > +	bool force_pte = logging_active || is_private || is_protected_kvm_enabled();
> > +	long vma_pagesize, fault_granule = PAGE_SIZE;
> >  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >  	struct kvm_pgtable *pgt;
> >  	struct page *page;
> >  	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> >
> > -	if (fault_is_perm)
> > +	if (fault_is_perm && !is_private)
>
> Nit: not strictly necessary I think.

You're right.

Thanks,
/fuad

> >  		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
> >  	write_fault = kvm_is_write_fault(vcpu);
> >  	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > @@ -1510,24 +1538,30 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return ret;
> >  	}
> >
> > +	mmap_read_lock(current->mm);
> > +
> >  	/*
> >  	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> >  	 * get block mapping for device MMIO region.
> >  	 */
> > -	mmap_read_lock(current->mm);
> > -	vma = vma_lookup(current->mm, hva);
> > -	if (unlikely(!vma)) {
> > -		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > -		mmap_read_unlock(current->mm);
> > -		return -EFAULT;
> > -	}
> > +	if (!is_private) {
> > +		vma = vma_lookup(current->mm, hva);
> > +		if (unlikely(!vma)) {
> > +			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > +			mmap_read_unlock(current->mm);
> > +			return -EFAULT;
> > +		}
> >
> > -	/*
> > -	 * logging_active is guaranteed to never be true for VM_PFNMAP
> > -	 * memslots.
> > -	 */
> > -	if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
> > -		return -EFAULT;
> > +		/*
> > +		 * logging_active is guaranteed to never be true for VM_PFNMAP
> > +		 * memslots.
> > +		 */
> > +		if (WARN_ON_ONCE(logging_active && (vma->vm_flags & VM_PFNMAP)))
> > +			return -EFAULT;
> > +
> > +		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > +		mte_allowed = kvm_vma_mte_allowed(vma);
> > +	}
> >
> >  	if (force_pte)
> >  		vma_shift = PAGE_SHIFT;
> > @@ -1597,18 +1631,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		ipa &= ~(vma_pagesize - 1);
> >  	}
> >
> > -	gfn = ipa >> PAGE_SHIFT;
> > -	mte_allowed = kvm_vma_mte_allowed(vma);
> > -
> > -	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > -
> >  	/* Don't use the VMA after the unlock -- it may have vanished */
> >  	vma = NULL;
> >
> >  	/*
> >  	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> > -	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> > -	 * acquiring kvm->mmu_lock.
> > +	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> > +	 * kvm->mmu_lock.
> >  	 *
> >  	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> >  	 * with the smp_wmb() in kvm_mmu_invalidate_end().
> > @@ -1616,8 +1645,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> >  	mmap_read_unlock(current->mm);
> >
> > -	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> > -				&writable, &page);
> > +	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_private);
> >  	if (pfn == KVM_PFN_ERR_HWPOISON) {
> >  		kvm_send_hwpoison_signal(hva, vma_shift);
> >  		return 0;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 39fd6e35c723..415c6274aede 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> >  	return gfn_to_memslot(kvm, gfn)->id;
> >  }
> >
> > +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > +{
> > +	return slot->flags & KVM_MEM_READONLY;
> > +}
> > +
> >  static inline gfn_t
> >  hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 38f0f402ea46..3e40acb9f5c0 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2624,11 +2624,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
> >  	return size;
> >  }
> >
> > -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > -{
> > -	return slot->flags & KVM_MEM_READONLY;
> > -}
> > -
> >  static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
> >  					gfn_t *nr_pages, bool write)
> >  {
> > --
> > 2.48.1.502.g6dc24dfdaf-goog
> >