From: Fuad Tabba <tabba@google.com>
Date: Mon, 12 May 2025 08:07:18 +0100
Subject: Re: [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults
To: James Houghton <jthoughton@google.com>
Cc: ackerleytng@google.com, akpm@linux-foundation.org, amoorthy@google.com,
 anup@brainfault.org, aou@eecs.berkeley.edu, brauner@kernel.org,
 catalin.marinas@arm.com, chao.p.peng@linux.intel.com, chenhuacai@kernel.org,
 david@redhat.com, dmatlack@google.com, fvdl@google.com, hch@infradead.org,
 hughd@google.com, isaku.yamahata@gmail.com, isaku.yamahata@intel.com,
 james.morse@arm.com, jarkko@kernel.org, jgg@nvidia.com, jhubbard@nvidia.com,
 keirf@google.com, kirill.shutemov@linux.intel.com, kvm@vger.kernel.org,
 liam.merwick@oracle.com, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
 michael.roth@amd.com, mpe@ellerman.id.au, oliver.upton@linux.dev,
 palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
 pbonzini@redhat.com, peterx@redhat.com, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, rientjes@google.com, roypat@amazon.co.uk,
 seanjc@google.com, shuah@kernel.org, steven.price@arm.com,
 suzuki.poulose@arm.com, vannapurve@google.com, vbabka@suse.cz,
 viro@zeniv.linux.org.uk, wei.w.wang@intel.com, will@kernel.org,
 willy@infradead.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 yuzenghui@huawei.com
In-Reply-To: <20250509201529.3160064-1-jthoughton@google.com>
References: <20250430165655.605595-11-tabba@google.com>
 <20250509201529.3160064-1-jthoughton@google.com>

Hi James,

On Fri, 9 May 2025 at 21:15, James Houghton <jthoughton@google.com> wrote:
>
> On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Add arm64 support for handling guest page faults on guest_memfd
> > backed memslots.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c     | 65 +++++++++++++++++++++++++++-------------
> >  include/linux/kvm_host.h |  5 ++++
> >  virt/kvm/kvm_main.c      |  5 ----
> >  3 files changed, 50 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 148a97c129de..d1044c7f78bb 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >  	return vma->vm_flags & VM_MTE_ALLOWED;
> >  }
> >
> > +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > +			     gfn_t gfn, bool write_fault, bool *writable,
> > +			     struct page **page, bool is_gmem)
> > +{
> > +	kvm_pfn_t pfn;
> > +	int ret;
> > +
> > +	if (!is_gmem)
> > +		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> > +
> > +	*writable = false;
> > +
> > +	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> > +	if (!ret) {
> > +		*writable = !memslot_is_readonly(slot);
> > +		return pfn;
> > +	}
> > +
> > +	if (ret == -EHWPOISON)
> > +		return KVM_PFN_ERR_HWPOISON;
> > +
> > +	return KVM_PFN_ERR_NOSLOT_MASK;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_s2_trans *nested,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  {
> >  	int ret = 0;
> >  	bool write_fault, writable;
> > -	bool exec_fault, mte_allowed;
> > +	bool exec_fault, mte_allowed = false;
> >  	bool device = false, vfio_allow_any_uc = false;
> >  	unsigned long mmu_seq;
> >  	phys_addr_t ipa = fault_ipa;
> >  	struct kvm *kvm = vcpu->kvm;
> > -	struct vm_area_struct *vma;
> > +	struct vm_area_struct *vma = NULL;
> >  	short vma_shift;
> >  	void *memcache;
> > -	gfn_t gfn;
> > +	gfn_t gfn = ipa >> PAGE_SHIFT;
> >  	kvm_pfn_t pfn;
> >  	bool logging_active = memslot_is_logging(memslot);
> > -	bool force_pte = logging_active || is_protected_kvm_enabled();
> > -	long vma_pagesize, fault_granule;
> > +	bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
> > +	bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> > +	long vma_pagesize, fault_granule = PAGE_SIZE;
> >  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >  	struct kvm_pgtable *pgt;
> >  	struct page *page;
> > @@ -1522,16 +1547,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return ret;
> >  	}
> >
> > +	mmap_read_lock(current->mm);
>
> We don't have to take the mmap_lock for gmem faults, right?
>
> I think we should reorganize user_mem_abort() a bit (and I think vma_pagesize
> and maybe vma_shift should be renamed) given the changes we're making here.

Good point.

> Below is a diff that I think might be a little cleaner. Let me know what you
> think.
>
> > +
> >  	/*
> >  	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> >  	 * get block mapping for device MMIO region.
> >  	 */
> > -	mmap_read_lock(current->mm);
> > -	vma = vma_lookup(current->mm, hva);
> > -	if (unlikely(!vma)) {
> > -		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > -		mmap_read_unlock(current->mm);
> > -		return -EFAULT;
> > +	if (!is_gmem) {
> > +		vma = vma_lookup(current->mm, hva);
> > +		if (unlikely(!vma)) {
> > +			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > +			mmap_read_unlock(current->mm);
> > +			return -EFAULT;
> > +		}
> > +
> > +		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > +		mte_allowed = kvm_vma_mte_allowed(vma);
> >  	}
> >
> >  	if (force_pte)
> > @@ -1602,18 +1633,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		ipa &= ~(vma_pagesize - 1);
> >  	}
> >
> > -	gfn = ipa >> PAGE_SHIFT;
> > -	mte_allowed = kvm_vma_mte_allowed(vma);
> > -
> > -	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > -
> >  	/* Don't use the VMA after the unlock -- it may have vanished */
> >  	vma = NULL;
> >
> >  	/*
> >  	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> > -	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> > -	 * acquiring kvm->mmu_lock.
> > +	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> > +	 * kvm->mmu_lock.
> >  	 *
> >  	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> >  	 * with the smp_wmb() in kvm_mmu_invalidate_end().
> > @@ -1621,8 +1647,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> >  	mmap_read_unlock(current->mm);
> >
> > -	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> > -				&writable, &page);
> > +	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> >  	if (pfn == KVM_PFN_ERR_HWPOISON) {
>
> I think we need to take care to handle HWPOISON properly. I know that it is
> (or will most likely be) the case that GUP(hva) --> pfn, but with gmem,
> it *might* not be the case. So the following line isn't right.
>
> I think we need to handle HWPOISON for gmem using memory fault exits instead of
> sending a SIGBUS to userspace. This would be consistent with how KVM/x86
> today handles getting a HWPOISON page back from kvm_gmem_get_pfn(). I'm not
> entirely sure how KVM/x86 is meant to handle HWPOISON on shared gmem pages yet;
> I need to keep reading your series.

You're right. In the next respin (coming soon), Ackerley has added a patch
that performs a best-effort check to ensure that hva matches the gfn.

> The reorganization diff below leaves this unfixed.
>
> >  		kvm_send_hwpoison_signal(hva, vma_shift);
> >  		return 0;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f3af6bff3232..1b2e4e9a7802 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> >  	return gfn_to_memslot(kvm, gfn)->id;
> >  }
> >
> > +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > +{
> > +	return slot->flags & KVM_MEM_READONLY;
> > +}
> > +
> >  static inline gfn_t
> >  hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index c75d8e188eb7..d9bca5ba19dc 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
> >  	return size;
> >  }
> >
> > -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > -{
> > -	return slot->flags & KVM_MEM_READONLY;
> > -}
> > -
> >  static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
> >  				       gfn_t *nr_pages, bool write)
> >  {
> > --
> > 2.49.0.901.g37484f566f-goog
>
> Thanks, Fuad! Here's the reorganization/rename diff:

Thank you James. This is very helpful.
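
Coming back to the HWPOISON point: for the gmem case I was thinking of
something along the lines of the sketch below. This is rough and untested;
it assumes kvm_prepare_memory_fault_exit() keeps its current
(vcpu, gpa, size, is_write, is_exec, is_private) signature and can be used
from user_mem_abort(), where fault_ipa, write_fault and exec_fault are
already in scope, and the private flag here is only a placeholder:

	if (pfn == KVM_PFN_ERR_HWPOISON) {
		if (is_gmem) {
			/*
			 * There may be no (matching) userspace mapping to
			 * send SIGBUS against, so report the poisoned gpa
			 * to userspace via a memory fault exit instead.
			 */
			kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
						      write_fault, exec_fault,
						      /* is_private: placeholder */ false);
			return -EFAULT;
		}
		kvm_send_hwpoison_signal(hva, vma_shift);
		return 0;
	}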
Cheers,
/fuad

>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d1044c7f78bba..c9eb72fe9013b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1502,7 +1502,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long mmu_seq;
>  	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
> -	struct vm_area_struct *vma = NULL;
>  	short vma_shift;
>  	void *memcache;
>  	gfn_t gfn = ipa >> PAGE_SHIFT;
> @@ -1510,7 +1509,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	bool logging_active = memslot_is_logging(memslot);
>  	bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
>  	bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> -	long vma_pagesize, fault_granule = PAGE_SIZE;
> +	long target_size = PAGE_SIZE, fault_granule = PAGE_SIZE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
>  	struct page *page;
> @@ -1547,13 +1546,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return ret;
>  	}
>
> -	mmap_read_lock(current->mm);
> -
>  	/*
>  	 * Let's check if we will get back a huge page backed by hugetlbfs, or
>  	 * get block mapping for device MMIO region.
>  	 */
>  	if (!is_gmem) {
> +		struct vm_area_struct *vma = NULL;
> +
> +		mmap_read_lock(current->mm);
> +
>  		vma = vma_lookup(current->mm, hva);
>  		if (unlikely(!vma)) {
>  			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> @@ -1563,38 +1564,45 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>  		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
>  		mte_allowed = kvm_vma_mte_allowed(vma);
> -	}
> -
> -	if (force_pte)
> -		vma_shift = PAGE_SHIFT;
> -	else
> -		vma_shift = get_vma_page_shift(vma, hva);
> +		vma_shift = force_pte ? PAGE_SHIFT : get_vma_page_shift(vma, hva);
>
> -	switch (vma_shift) {
> +		switch (vma_shift) {
>  #ifndef __PAGETABLE_PMD_FOLDED
> -	case PUD_SHIFT:
> -		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> -			break;
> -		fallthrough;
> +		case PUD_SHIFT:
> +			if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +				break;
> +			fallthrough;
>  #endif
> -	case CONT_PMD_SHIFT:
> -		vma_shift = PMD_SHIFT;
> -		fallthrough;
> -	case PMD_SHIFT:
> -		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +		case CONT_PMD_SHIFT:
> +			vma_shift = PMD_SHIFT;
> +			fallthrough;
> +		case PMD_SHIFT:
> +			if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +				break;
> +			fallthrough;
> +		case CONT_PTE_SHIFT:
> +			vma_shift = PAGE_SHIFT;
> +			force_pte = true;
> +			fallthrough;
> +		case PAGE_SHIFT:
>  			break;
> -		fallthrough;
> -	case CONT_PTE_SHIFT:
> -		vma_shift = PAGE_SHIFT;
> -		force_pte = true;
> -		fallthrough;
> -	case PAGE_SHIFT:
> -		break;
> -	default:
> -		WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
> -	}
> +		default:
> +			WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
> +		}
>
> -	vma_pagesize = 1UL << vma_shift;
> +		/*
> +		 * Read mmu_invalidate_seq so that KVM can detect if the results of
> +		 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> +		 * kvm->mmu_lock.
> +		 *
> +		 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> +		 * with the smp_wmb() in kvm_mmu_invalidate_end().
> +		 */
> +		mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> +		mmap_read_unlock(current->mm);
> +
> +		target_size = 1UL << vma_shift;
> +	}
>
>  	if (nested) {
>  		unsigned long max_map_size;
> @@ -1620,7 +1628,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			max_map_size = PAGE_SIZE;
>
>  		force_pte = (max_map_size == PAGE_SIZE);
> -		vma_pagesize = min(vma_pagesize, (long)max_map_size);
> +		target_size = min(target_size, (long)max_map_size);
>  	}
>
>  	/*
> @@ -1628,27 +1636,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * ensure we find the right PFN and lay down the mapping in the right
>  	 * place.
>  	 */
> -	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> -		fault_ipa &= ~(vma_pagesize - 1);
> -		ipa &= ~(vma_pagesize - 1);
> +	if (target_size == PMD_SIZE || target_size == PUD_SIZE) {
> +		fault_ipa &= ~(target_size - 1);
> +		ipa &= ~(target_size - 1);
>  	}
>
> -	/* Don't use the VMA after the unlock -- it may have vanished */
> -	vma = NULL;
> -
> -	/*
> -	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> -	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> -	 * kvm->mmu_lock.
> -	 *
> -	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> -	 * with the smp_wmb() in kvm_mmu_invalidate_end().
> -	 */
> -	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> -	mmap_read_unlock(current->mm);
> -
>  	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
> +		// TODO: Handle gmem properly. vma_shift
> +		// intentionally left uninitialized.
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 0;
>  	}
> @@ -1658,9 +1654,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (kvm_is_device_pfn(pfn)) {
>  		/*
>  		 * If the page was identified as device early by looking at
> -		 * the VMA flags, vma_pagesize is already representing the
> +		 * the VMA flags, target_size is already representing the
>  		 * largest quantity we can map. If instead it was mapped
> -		 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> +		 * via __kvm_faultin_pfn(), target_size is set to PAGE_SIZE
>  		 * and must not be upgraded.
>  		 *
>  		 * In both cases, we don't let transparent_hugepage_adjust()
> @@ -1699,7 +1695,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>  	kvm_fault_lock(kvm);
>  	pgt = vcpu->arch.hw_mmu->pgt;
> -	if (mmu_invalidate_retry(kvm, mmu_seq)) {
> +	if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
>  		ret = -EAGAIN;
>  		goto out_unlock;
>  	}
> @@ -1708,16 +1704,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * If we are not forced to use page mapping, check if we are
>  	 * backed by a THP and thus use block mapping if possible.
>  	 */
> -	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> +	if (target_size == PAGE_SIZE && !(force_pte || device)) {
>  		if (fault_is_perm && fault_granule > PAGE_SIZE)
> -			vma_pagesize = fault_granule;
> -		else
> -			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
> +			target_size = fault_granule;
> +		else if (!is_gmem)
> +			target_size = transparent_hugepage_adjust(kvm, memslot,
>  								   hva, &pfn,
>  								   &fault_ipa);
>
> -		if (vma_pagesize < 0) {
> -			ret = vma_pagesize;
> +		if (target_size < 0) {
> +			ret = target_size;
>  			goto out_unlock;
>  		}
>  	}
> @@ -1725,7 +1721,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
>  		/* Check the VMM hasn't introduced a new disallowed VMA */
>  		if (mte_allowed) {
> -			sanitise_mte_tags(kvm, pfn, vma_pagesize);
> +			sanitise_mte_tags(kvm, pfn, target_size);
>  		} else {
>  			ret = -EFAULT;
>  			goto out_unlock;
> @@ -1750,10 +1746,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>  	/*
>  	 * Under the premise of getting a FSC_PERM fault, we just need to relax
> -	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
> +	 * permissions only if target_size equals fault_granule. Otherwise,
>  	 * kvm_pgtable_stage2_map() should be called to change block size.
>  	 */
> -	if (fault_is_perm && vma_pagesize == fault_granule) {
> +	if (fault_is_perm && target_size == fault_granule) {
>  		/*
>  		 * Drop the SW bits in favour of those stored in the
>  		 * PTE, which will be preserved.
> @@ -1761,7 +1757,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		prot &= ~KVM_NV_GUEST_MAP_SZ;
>  		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
>  	} else {
> -		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, vma_pagesize,
> +		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, target_size,
>  							 __pfn_to_phys(pfn), prot,
>  							 memcache, flags);
>  	}
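
P.S. For completeness, the best-effort hva/gfn check I mentioned above has
roughly the shape sketched below. This is purely illustrative and is not
Ackerley's actual patch: the helper name is made up, the gmem file is passed
in by the caller to sidestep how it is looked up, and the slot->gmem.pgoff
field name follows my reading of the guest_memfd series.

/*
 * Best effort: with the mmap lock held for read, check that the VMA
 * covering hva maps the same guest_memfd inode that backs the memslot,
 * at the offset corresponding to gfn, before trusting hva for SIGBUS
 * delivery.
 */
static bool gmem_hva_matches_gfn(struct kvm_memory_slot *slot,
				 struct file *gmem_file,
				 unsigned long hva, gfn_t gfn)
{
	struct vm_area_struct *vma = vma_lookup(current->mm, hva);
	pgoff_t gfn_pgoff, hva_pgoff;

	if (!vma || !vma->vm_file ||
	    file_inode(vma->vm_file) != file_inode(gmem_file))
		return false;

	gfn_pgoff = slot->gmem.pgoff + (gfn - slot->base_gfn);
	hva_pgoff = vma->vm_pgoff + ((hva - vma->vm_start) >> PAGE_SHIFT);

	return gfn_pgoff == hva_pgoff;
}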