From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 05 May 2025 16:09:58 -0700
In-Reply-To: <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>
Mime-Version: 1.0
References: <386c1169-8292-43d1-846b-c50cbdc1bc65@redhat.com>
 <7e32aabe-c170-4cfc-99aa-f257d2a69364@redhat.com>
Subject: Re: [PATCH v8 06/13] KVM: x86: Generalize private fault lookups to guest_memfd fault lookups
From: Ackerley Tng
To: David Hildenbrand, Sean Christopherson
Cc: Fuad Tabba, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, mail@maciej.szmigiero.name, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com,
 pankaj.gupta@amd.com
Content-Type: text/plain; charset="UTF-8"
David Hildenbrand writes:

> On 03.05.25 00:00, Ackerley Tng wrote:
>> Sean Christopherson writes:
>>
>>> On Fri, May 02, 2025, David Hildenbrand wrote:
>>>> On 30.04.25 20:58, Ackerley Tng wrote:
>>>>>> -	if (is_private)
>>>>>> +	if (is_gmem)
>>>>>> 		return max_level;
>>>>>
>>>>> I think this renaming isn't quite accurate.
>>>>
>>>> After our discussion yesterday, does that still hold true?
>>>
>>> No.
>>>
>>>>> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
>>>>> host_pfn_mapping_level() if the gfn is private because private memory
>>>>> will not be mapped to userspace, so there's no need to query userspace
>>>>> page tables in host_pfn_mapping_level().
>>>>
>>>> I think the reason was that: for private we won't be walking the user
>>>> space page tables.
>>>>
>>>> Once guest_memfd is also responsible for the shared part, why should this
>>>> here still be private-only, and why should we consider querying a user
>>>> space mapping that might not even exist?
>>>
>>> +1, one of the big selling points for guest_memfd beyond CoCo is that it
>>> provides guest-first memory. It is very explicitly an intended feature
>>> that the guest mappings KVM creates can be a superset of the host
>>> userspace mappings. E.g. the guest can use larger page sizes, have RW
>>> while the host has RO, etc.
>>
>> Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
>> the parameter renaming from is_private to is_gmem, do something like
>>
>> 	if (is_gmem)
>> 		return kvm_gmem_get_max_mapping_level(slot, gfn);
>
> I assume you mean, not looking at lpage_info at all?
>

My bad. I actually meant just to take input from guest_memfd and stop
there without checking with host page tables, perhaps something like

	min(kvm_gmem_get_max_mapping_level(slot, gfn), max_level);

> I have limited understanding what lpage_info is or what it does. I
> believe all it adds is a mechanism to *disable* large page mappings.
>

This is my understanding too.

> We want to disable large pages if (using a 2M region as an example)
>
> (a) Mixed memory attributes.
> If a PFN falls into a 2M region, and parts
> of that region are shared vs. private (mixed memory attributes ->
> KVM_LPAGE_MIXED_FLAG)
>
> -> With gmem-shared we could have mixed memory attributes, not a PFN
> fracturing. (PFNs don't depend on memory attributes)
>
> (b) page track: intercepting (mostly write) access to GFNs
>

Could you explain more about the page track case?

> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so]. Regarding (a) I am not sure: once
> memory attributes are handled by gmem in the gmem-shared case. IIRC,
> with AMD SEV we might still have to honor it? But gmem itself could
> handle that.
>

For AMD SEV, I believe kvm_max_private_mapping_level() already takes
care of that, at least for the MMU faulting path [1], where guest_memfd
gives input using max_order, then the arch-specific callback contributes
its input.

>
> What we could definitely do here for now is:
>
> 	if (is_gmem)
> 		/* gmem only supports 4k pages for now. */
> 		return PG_LEVEL_4K;
>
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.
>

Perhaps this is better explained as an RFC in code. I'll put in a patch
as part of Fuad's series if Fuad doesn't mind.

>>
>> and basically defer to gmem as long as gmem should be used for this gfn?
>>
>> There is another call to __kvm_mmu_max_mapping_level() via
>> kvm_mmu_max_mapping_level() beginning from recover_huge_pages_range(),
>> and IIUC that doesn't go through guest_memfd.
>>
>> Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
>> MMU fault path, guest_memfd didn't get a chance to provide its input in
>> the form of returning max_order from kvm_gmem_get_pfn().
>
> Right, we essentially say that "this is a private fault", likely
> assuming that we already verified earlier that the memory is also private.
>
> [I can see that happening when the function is called through
> direct_page_fault()]
>
> We could simply call kvm_mmu_max_mapping_level() from
> kvm_mmu_hugepage_adjust() I guess. (could possibly be optimized later)
>
> --
> Cheers,
>
> David / dhildenb

[1] https://github.com/torvalds/linux/blob/01f95500a162fca88cefab9ed64ceded5afabc12/arch/x86/kvm/mmu/mmu.c#L4480