From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 06 May 2025 13:46:58 -0700
In-Reply-To: <39ea3946-6683-462e-af5d-fe7d28ab7d00@redhat.com>
References: <386c1169-8292-43d1-846b-c50cbdc1bc65@redhat.com>
 <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>
 <39ea3946-6683-462e-af5d-fe7d28ab7d00@redhat.com>
Subject: Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to
 guest_memfd fault lookups
From: Ackerley Tng
To: David Hildenbrand, Sean Christopherson, Vishal Annapurve
Cc: Fuad Tabba, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 mail@maciej.szmigiero.name, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com,
 pankaj.gupta@amd.com
Content-Type: text/plain; charset="utf-8"

David
Hildenbrand writes:

> On 06.05.25 15:58, Sean Christopherson wrote:
>> On Mon, May 05, 2025, Vishal Annapurve wrote:
>>> On Mon, May 5, 2025 at 10:17 PM Vishal Annapurve wrote:
>>>>
>>>> On Mon, May 5, 2025 at 3:57 PM Sean Christopherson wrote:
>>>>>> ...
>>>>>> And not worry about lpage_info for the time being, until we
>>>>>> actually do support larger pages.
>>>>>
>>>>> I don't want to completely punt on this, because if it gets messy,
>>>>> then I want to know now and have a solution in hand, not find out N
>>>>> months from now.
>>>>>
>>>>> That said, I don't expect it to be difficult. What we could punt on
>>>>> is performance of the lookups, which is the real reason KVM
>>>>> maintains the rather expensive disallow_lpage array.
>>>>>
>>>>> And that said, memslots can only bind to one guest_memfd instance,
>>>>> so I don't immediately see any reason why the guest_memfd ioctl()
>>>>> couldn't process the slots that are bound to it. I.e. why not
>>>>> update KVM_LPAGE_MIXED_FLAG from the guest_memfd ioctl() instead of
>>>>> from KVM_SET_MEMORY_ATTRIBUTES?
>>>>
>>>> I am missing the point here of updating KVM_LPAGE_MIXED_FLAG for the
>>>> scenarios where in-place memory conversion will be supported with
>>>> guest_memfd, as guest_memfd support for hugepages comes with the
>>>> design that hugepages can't have mixed attributes, i.e. max_order
>>>> returned by get_pfn will always have the same attributes for the
>>>> folio range.
>>
>> Oh, if this will naturally be handled by guest_memfd, then do that. I
>> was purely reacting to David's suggestion to "not worry about
>> lpage_info for the time being, until we actually do support larger
>> pages".
>>
>>>> Is your suggestion around using guest_memfd ioctl() to also toggle
>>>> memory attributes for the scenarios where the guest_memfd instance
>>>> doesn't have the in-place memory conversion feature enabled?
>>>
>>> Reading more into your response, I guess your suggestion is about
>>> covering different use cases present today and new use cases which
>>> may land in the future that rely on kvm_lpage_info for faster lookup.
>>> If so, then it should be easy to modify the guest_memfd ioctl to
>>> update kvm_lpage_info as you suggested.
>>
>> Nah, I just missed/forgot that using a single guest_memfd for private
>> and shared would naturally need to split the folio and thus this would
>> Just Work.

Sean, David,

I'm circling back to make sure I'm following the discussion correctly
before Fuad sends out the next revision of this series.

> Yeah, I ignored that fact as well. So essentially, this patch should be
> mostly good for now.

From here [1], these changes will make it into v9:

+ kvm_max_private_mapping_level renamed to kvm_max_gmem_mapping_level
+ kvm_mmu_faultin_pfn_private renamed to kvm_mmu_faultin_pfn_gmem

> Only kvm_mmu_hugepage_adjust() must be taught to not rely on
> fault->is_private.

I think fault->is_private should contribute to determining the max
mapping level. By the time kvm_mmu_hugepage_adjust() is called:

* For CoCo VMs using guest_memfd only for private memory,
  fault->is_private would have been checked to align with
  kvm->mem_attr_array.
* For CoCo VMs using guest_memfd for both private and shared memory,
  fault->is_private would have been checked to align with guest_memfd's
  shareability.
* For non-CoCo VMs using guest_memfd, fault->is_private would be false.

Hence fault->is_private can be relied on when calling
kvm_mmu_hugepage_adjust().

If fault->is_private, there will be no host userspace mapping to check,
hence in __kvm_mmu_max_mapping_level() we should skip querying host page
tables.

If !fault->is_private, for shared memory ranges, if the VM uses
guest_memfd only for shared memory, we should query host page tables.
If !fault->is_private, for shared memory ranges, if the VM uses
guest_memfd for both shared and private memory, we should not query host
page tables.

If !fault->is_private, for non-CoCo VMs, we should not query host page
tables.

I propose to rename the parameter is_private to skip_host_page_tables,
so

-	if (is_private)
+	if (skip_host_page_tables)
 		return max_level;

and pass

	skip_host_page_tables = fault->is_private ||
				kvm_gmem_memslot_supports_shared(fault->slot);

where kvm_gmem_memslot_supports_shared() checks the inode in the memslot
for GUEST_MEMFD_FLAG_SUPPORT_SHARED.

For recover_huge_pages_range(), the other user of
__kvm_mmu_max_mapping_level(), there is currently no prior call to
kvm_gmem_get_pfn() to get max_order or max_level, so I propose to call
__kvm_mmu_max_mapping_level() with

	if (kvm_gmem_memslot_supports_shared(slot)) {
		max_level = kvm_gmem_max_mapping_level(slot, gfn);
		skip_host_page_tables = true;
	} else {
		max_level = PG_LEVEL_NUM;
		skip_host_page_tables = kvm_slot_has_gmem(slot) &&
					kvm_mem_is_private(kvm, gfn);
	}

Without 1G support, kvm_gmem_max_mapping_level(slot, gfn) would always
return 4K. With 1G support, kvm_gmem_max_mapping_level(slot, gfn) would
return the level for the page's order at the offset corresponding to the
gfn.

> Once we support large folios in guest_memfd, only the "alignment"
> consideration might have to be taken into account.

I'll be handling this alignment as part of the 1G page support series
(it won't be part of Fuad's first-stage series) [2].

> Anything else?
>
> --
> Cheers,
>
> David / dhildenb

[1] https://lore.kernel.org/all/20250430165655.605595-7-tabba@google.com/
[2] https://lore.kernel.org/all/diqz1pt1sfw8.fsf@ackerleytng-ctop.c.googlers.com/