From: Fuad Tabba <tabba@google.com>
Date: Mon, 12 May 2025 08:07:18 +0100
Subject: Re: [PATCH v8 10/13] KVM: arm64: Handle guest_memfd()-backed guest page faults
To: James Houghton <jthoughton@google.com>
Cc: ackerleytng@google.com, akpm@linux-foundation.org, amoorthy@google.com,
 anup@brainfault.org, aou@eecs.berkeley.edu, brauner@kernel.org,
 catalin.marinas@arm.com, chao.p.peng@linux.intel.com, chenhuacai@kernel.org,
 david@redhat.com, dmatlack@google.com, fvdl@google.com, hch@infradead.org,
 hughd@google.com, isaku.yamahata@gmail.com, isaku.yamahata@intel.com,
 james.morse@arm.com, jarkko@kernel.org, jgg@nvidia.com, jhubbard@nvidia.com,
 keirf@google.com, kirill.shutemov@linux.intel.com, kvm@vger.kernel.org,
 liam.merwick@oracle.com, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
 michael.roth@amd.com, mpe@ellerman.id.au, oliver.upton@linux.dev,
 palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
 pbonzini@redhat.com, peterx@redhat.com, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, rientjes@google.com, roypat@amazon.co.uk,
 seanjc@google.com, shuah@kernel.org, steven.price@arm.com,
 suzuki.poulose@arm.com, vannapurve@google.com, vbabka@suse.cz,
 viro@zeniv.linux.org.uk, wei.w.wang@intel.com, will@kernel.org,
 willy@infradead.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 yuzenghui@huawei.com
In-Reply-To: <20250509201529.3160064-1-jthoughton@google.com>
References: <20250430165655.605595-11-tabba@google.com>
 <20250509201529.3160064-1-jthoughton@google.com>

Hi James,

On Fri, 9 May 2025 at 21:15, James Houghton <jthoughton@google.com> wrote:
>
> On Wed, Apr 30, 2025 at 9:57 AM Fuad Tabba <tabba@google.com> wrote:
> >
> > Add arm64 support for handling guest page faults on guest_memfd
> > backed memslots.
> >
> > For now, the fault granule is restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c     | 65 +++++++++++++++++++++++++++-------------
> >  include/linux/kvm_host.h |  5 ++++
> >  virt/kvm/kvm_main.c      |  5 ----
> >  3 files changed, 50 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 148a97c129de..d1044c7f78bb 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1466,6 +1466,30 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >  	return vma->vm_flags & VM_MTE_ALLOWED;
> >  }
> >
> > +static kvm_pfn_t faultin_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
> > +			     gfn_t gfn, bool write_fault, bool *writable,
> > +			     struct page **page, bool is_gmem)
> > +{
> > +	kvm_pfn_t pfn;
> > +	int ret;
> > +
> > +	if (!is_gmem)
> > +		return __kvm_faultin_pfn(slot, gfn, write_fault ? FOLL_WRITE : 0, writable, page);
> > +
> > +	*writable = false;
> > +
> > +	ret = kvm_gmem_get_pfn(kvm, slot, gfn, &pfn, page, NULL);
> > +	if (!ret) {
> > +		*writable = !memslot_is_readonly(slot);
> > +		return pfn;
> > +	}
> > +
> > +	if (ret == -EHWPOISON)
> > +		return KVM_PFN_ERR_HWPOISON;
> > +
> > +	return KVM_PFN_ERR_NOSLOT_MASK;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_s2_trans *nested,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> > @@ -1473,19 +1497,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  {
> >  	int ret = 0;
> >  	bool write_fault, writable;
> > -	bool exec_fault, mte_allowed;
> > +	bool exec_fault, mte_allowed = false;
> >  	bool device = false, vfio_allow_any_uc = false;
> >  	unsigned long mmu_seq;
> >  	phys_addr_t ipa = fault_ipa;
> >  	struct kvm *kvm = vcpu->kvm;
> > -	struct vm_area_struct *vma;
> > +	struct vm_area_struct *vma = NULL;
> >  	short vma_shift;
> >  	void *memcache;
> > -	gfn_t gfn;
> > +	gfn_t gfn = ipa >> PAGE_SHIFT;
> >  	kvm_pfn_t pfn;
> >  	bool logging_active = memslot_is_logging(memslot);
> > -	bool force_pte = logging_active || is_protected_kvm_enabled();
> > -	long vma_pagesize, fault_granule;
> > +	bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
> > +	bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> > +	long vma_pagesize, fault_granule = PAGE_SIZE;
> >  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> >  	struct kvm_pgtable *pgt;
> >  	struct page *page;
> > @@ -1522,16 +1547,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return ret;
> >  	}
> >
> > +	mmap_read_lock(current->mm);
>
> We don't have to take the mmap_lock for gmem faults, right?
>
> I think we should reorganize user_mem_abort() a bit (and I think vma_pagesize
> and maybe vma_shift should be renamed) given the changes we're making here.

Good point.

> Below is a diff that I think might be a little cleaner. Let me know what you
> think.
>
> > +
> >  	/*
> >  	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> >  	 * get block mapping for device MMIO region.
> >  	 */
> > -	mmap_read_lock(current->mm);
> > -	vma = vma_lookup(current->mm, hva);
> > -	if (unlikely(!vma)) {
> > -		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > -		mmap_read_unlock(current->mm);
> > -		return -EFAULT;
> > +	if (!is_gmem) {
> > +		vma = vma_lookup(current->mm, hva);
> > +		if (unlikely(!vma)) {
> > +			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> > +			mmap_read_unlock(current->mm);
> > +			return -EFAULT;
> > +		}
> > +
> > +		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > +		mte_allowed = kvm_vma_mte_allowed(vma);
> >  	}
> >
> >  	if (force_pte)
> > @@ -1602,18 +1633,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		ipa &= ~(vma_pagesize - 1);
> >  	}
> >
> > -	gfn = ipa >> PAGE_SHIFT;
> > -	mte_allowed = kvm_vma_mte_allowed(vma);
> > -
> > -	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
> > -
> >  	/* Don't use the VMA after the unlock -- it may have vanished */
> >  	vma = NULL;
> >
> >  	/*
> >  	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> > -	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
> > -	 * acquiring kvm->mmu_lock.
> > +	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> > +	 * kvm->mmu_lock.
> >  	 *
> >  	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> >  	 * with the smp_wmb() in kvm_mmu_invalidate_end().
> > @@ -1621,8 +1647,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> >  	mmap_read_unlock(current->mm);
> >
> > -	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
> > -				&writable, &page);
> > +	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
> >  	if (pfn == KVM_PFN_ERR_HWPOISON) {
>
> I think we need to take care to handle HWPOISON properly. I know that it is
> (or will most likely be) the case that GUP(hva) --> pfn, but with gmem,
> it *might* not be the case. So the following line isn't right.
>
> I think we need to handle HWPOISON for gmem using memory fault exits instead of
> sending a SIGBUS to userspace. This would be consistent with how KVM/x86
> today handles getting a HWPOISON page back from kvm_gmem_get_pfn(). I'm not
> entirely sure how KVM/x86 is meant to handle HWPOISON on shared gmem pages yet;
> I need to keep reading your series.

You're right. In the next respin (coming soon), Ackerley has added a patch
that performs a best-effort check to ensure that hva matches the gfn.

> The reorganization diff below leaves this unfixed.
>
> >  		kvm_send_hwpoison_signal(hva, vma_shift);
> >  		return 0;
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f3af6bff3232..1b2e4e9a7802 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1882,6 +1882,11 @@ static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> >  	return gfn_to_memslot(kvm, gfn)->id;
> >  }
> >
> > +static inline bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > +{
> > +	return slot->flags & KVM_MEM_READONLY;
> > +}
> > +
> >  static inline gfn_t
> >  hva_to_gfn_memslot(unsigned long hva, struct kvm_memory_slot *slot)
> >  {
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index c75d8e188eb7..d9bca5ba19dc 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2640,11 +2640,6 @@ unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn)
> >  	return size;
> >  }
> >
> > -static bool memslot_is_readonly(const struct kvm_memory_slot *slot)
> > -{
> > -	return slot->flags & KVM_MEM_READONLY;
> > -}
> > -
> >  static unsigned long __gfn_to_hva_many(const struct kvm_memory_slot *slot, gfn_t gfn,
> >  				       gfn_t *nr_pages, bool write)
> >  {
> > --
> > 2.49.0.901.g37484f566f-goog
>
> Thanks, Fuad! Here's the reorganization/rename diff:

Thank you James. This is very helpful.
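
Coming back to the HWPOISON point: for the gmem case I was thinking of
something along the lines of the sketch below. This is rough and untested;
it assumes kvm_prepare_memory_fault_exit() keeps its current
(vcpu, gpa, size, is_write, is_exec, is_private) signature and can be used
from user_mem_abort(), where fault_ipa, write_fault and exec_fault are
already in scope, and the private flag here is only a placeholder:

	if (pfn == KVM_PFN_ERR_HWPOISON) {
		if (is_gmem) {
			/*
			 * There may be no (matching) userspace mapping to
			 * send SIGBUS against, so report the poisoned gpa
			 * to userspace via a memory fault exit instead.
			 */
			kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
						      write_fault, exec_fault,
						      /* is_private: placeholder */ false);
			return -EFAULT;
		}
		kvm_send_hwpoison_signal(hva, vma_shift);
		return 0;
	}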
Cheers,
/fuad

>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d1044c7f78bba..c9eb72fe9013b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1502,7 +1502,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	unsigned long mmu_seq;
>  	phys_addr_t ipa = fault_ipa;
>  	struct kvm *kvm = vcpu->kvm;
> -	struct vm_area_struct *vma = NULL;
>  	short vma_shift;
>  	void *memcache;
>  	gfn_t gfn = ipa >> PAGE_SHIFT;
> @@ -1510,7 +1509,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	bool logging_active = memslot_is_logging(memslot);
>  	bool is_gmem = kvm_slot_has_gmem(memslot) && kvm_mem_from_gmem(kvm, gfn);
>  	bool force_pte = logging_active || is_gmem || is_protected_kvm_enabled();
> -	long vma_pagesize, fault_granule = PAGE_SIZE;
> +	long target_size = PAGE_SIZE, fault_granule = PAGE_SIZE;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
>  	struct page *page;
> @@ -1547,13 +1546,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		return ret;
>  	}
>
> -	mmap_read_lock(current->mm);
> -
>  	/*
>  	 * Let's check if we will get back a huge page backed by hugetlbfs, or
>  	 * get block mapping for device MMIO region.
>  	 */
>  	if (!is_gmem) {
> +		struct vm_area_struct *vma = NULL;
> +
> +		mmap_read_lock(current->mm);
> +
>  		vma = vma_lookup(current->mm, hva);
>  		if (unlikely(!vma)) {
>  			kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
> @@ -1563,38 +1564,45 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>  		vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
>  		mte_allowed = kvm_vma_mte_allowed(vma);
> -	}
> -
> -	if (force_pte)
> -		vma_shift = PAGE_SHIFT;
> -	else
> -		vma_shift = get_vma_page_shift(vma, hva);
> +		vma_shift = force_pte ? PAGE_SHIFT : get_vma_page_shift(vma, hva);
>
> -	switch (vma_shift) {
> +		switch (vma_shift) {
>  #ifndef __PAGETABLE_PMD_FOLDED
> -	case PUD_SHIFT:
> -		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> -			break;
> -		fallthrough;
> +		case PUD_SHIFT:
> +			if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +				break;
> +			fallthrough;
>  #endif
> -	case CONT_PMD_SHIFT:
> -		vma_shift = PMD_SHIFT;
> -		fallthrough;
> -	case PMD_SHIFT:
> -		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +		case CONT_PMD_SHIFT:
> +			vma_shift = PMD_SHIFT;
> +			fallthrough;
> +		case PMD_SHIFT:
> +			if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +				break;
> +			fallthrough;
> +		case CONT_PTE_SHIFT:
> +			vma_shift = PAGE_SHIFT;
> +			force_pte = true;
> +			fallthrough;
> +		case PAGE_SHIFT:
>  			break;
> -		fallthrough;
> -	case CONT_PTE_SHIFT:
> -		vma_shift = PAGE_SHIFT;
> -		force_pte = true;
> -		fallthrough;
> -	case PAGE_SHIFT:
> -		break;
> -	default:
> -		WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
> -	}
> +		default:
> +			WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
> +		}
>
> -	vma_pagesize = 1UL << vma_shift;
> +		/*
> +		 * Read mmu_invalidate_seq so that KVM can detect if the results of
> +		 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> +		 * kvm->mmu_lock.
> +		 *
> +		 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> +		 * with the smp_wmb() in kvm_mmu_invalidate_end().
> +		 */
> +		mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> +		mmap_read_unlock(current->mm);
> +
> +		target_size = 1UL << vma_shift;
> +	}
>
>  	if (nested) {
>  		unsigned long max_map_size;
> @@ -1620,7 +1628,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			max_map_size = PAGE_SIZE;
>
>  		force_pte = (max_map_size == PAGE_SIZE);
> -		vma_pagesize = min(vma_pagesize, (long)max_map_size);
> +		target_size = min(target_size, (long)max_map_size);
>  	}
>
>  	/*
> @@ -1628,27 +1636,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * ensure we find the right PFN and lay down the mapping in the right
>  	 * place.
>  	 */
> -	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> -		fault_ipa &= ~(vma_pagesize - 1);
> -		ipa &= ~(vma_pagesize - 1);
> +	if (target_size == PMD_SIZE || target_size == PUD_SIZE) {
> +		fault_ipa &= ~(target_size - 1);
> +		ipa &= ~(target_size - 1);
>  	}
>
> -	/* Don't use the VMA after the unlock -- it may have vanished */
> -	vma = NULL;
> -
> -	/*
> -	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> -	 * vma_lookup() or faultin_pfn() become stale prior to acquiring
> -	 * kvm->mmu_lock.
> -	 *
> -	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
> -	 * with the smp_wmb() in kvm_mmu_invalidate_end().
> -	 */
> -	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
> -	mmap_read_unlock(current->mm);
> -
>  	pfn = faultin_pfn(kvm, memslot, gfn, write_fault, &writable, &page, is_gmem);
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
> +		// TODO: Handle gmem properly. vma_shift
> +		// intentionally left uninitialized.
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 0;
>  	}
> @@ -1658,9 +1654,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (kvm_is_device_pfn(pfn)) {
>  		/*
>  		 * If the page was identified as device early by looking at
> -		 * the VMA flags, vma_pagesize is already representing the
> +		 * the VMA flags, target_size is already representing the
>  		 * largest quantity we can map. If instead it was mapped
> -		 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> +		 * via __kvm_faultin_pfn(), target_size is set to PAGE_SIZE
>  		 * and must not be upgraded.
>  		 *
>  		 * In both cases, we don't let transparent_hugepage_adjust()
> @@ -1699,7 +1695,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>  	kvm_fault_lock(kvm);
>  	pgt = vcpu->arch.hw_mmu->pgt;
> -	if (mmu_invalidate_retry(kvm, mmu_seq)) {
> +	if (!is_gmem && mmu_invalidate_retry(kvm, mmu_seq)) {
>  		ret = -EAGAIN;
>  		goto out_unlock;
>  	}
> @@ -1708,16 +1704,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * If we are not forced to use page mapping, check if we are
>  	 * backed by a THP and thus use block mapping if possible.
>  	 */
> -	if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
> +	if (target_size == PAGE_SIZE && !(force_pte || device)) {
>  		if (fault_is_perm && fault_granule > PAGE_SIZE)
> -			vma_pagesize = fault_granule;
> -		else
> -			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
> +			target_size = fault_granule;
> +		else if (!is_gmem)
> +			target_size = transparent_hugepage_adjust(kvm, memslot,
>  								   hva, &pfn,
>  								   &fault_ipa);
>
> -		if (vma_pagesize < 0) {
> -			ret = vma_pagesize;
> +		if (target_size < 0) {
> +			ret = target_size;
>  			goto out_unlock;
>  		}
>  	}
> @@ -1725,7 +1721,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	if (!fault_is_perm && !device && kvm_has_mte(kvm)) {
>  		/* Check the VMM hasn't introduced a new disallowed VMA */
>  		if (mte_allowed) {
> -			sanitise_mte_tags(kvm, pfn, vma_pagesize);
> +			sanitise_mte_tags(kvm, pfn, target_size);
>  		} else {
>  			ret = -EFAULT;
>  			goto out_unlock;
> @@ -1750,10 +1746,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
>  	/*
>  	 * Under the premise of getting a FSC_PERM fault, we just need to relax
> -	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
> +	 * permissions only if target_size equals fault_granule. Otherwise,
>  	 * kvm_pgtable_stage2_map() should be called to change block size.
>  	 */
> -	if (fault_is_perm && vma_pagesize == fault_granule) {
> +	if (fault_is_perm && target_size == fault_granule) {
>  		/*
>  		 * Drop the SW bits in favour of those stored in the
>  		 * PTE, which will be preserved.
> @@ -1761,7 +1757,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  		prot &= ~KVM_NV_GUEST_MAP_SZ;
>  		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
>  	} else {
> -		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, vma_pagesize,
> +		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, target_size,
>  							 __pfn_to_phys(pfn), prot,
>  							 memcache, flags);
>  	}
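
P.S. For completeness, the best-effort hva/gfn check I mentioned above has
roughly the shape sketched below. This is purely illustrative and is not
Ackerley's actual patch: the helper name is made up, the gmem file is passed
in by the caller to sidestep how it is looked up, and the slot->gmem.pgoff
field name follows my reading of the guest_memfd series.

/*
 * Best effort: with the mmap lock held for read, check that the VMA
 * covering hva maps the same guest_memfd inode that backs the memslot,
 * at the offset corresponding to gfn, before trusting hva for SIGBUS
 * delivery.
 */
static bool gmem_hva_matches_gfn(struct kvm_memory_slot *slot,
				 struct file *gmem_file,
				 unsigned long hva, gfn_t gfn)
{
	struct vm_area_struct *vma = vma_lookup(current->mm, hva);
	pgoff_t gfn_pgoff, hva_pgoff;

	if (!vma || !vma->vm_file ||
	    file_inode(vma->vm_file) != file_inode(gmem_file))
		return false;

	gfn_pgoff = slot->gmem.pgoff + (gfn - slot->base_gfn);
	hva_pgoff = vma->vm_pgoff + ((hva - vma->vm_start) >> PAGE_SHIFT);

	return gfn_pgoff == hva_pgoff;
}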