From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id BEBACC3ABDD
	for <linux-mm@archiver.kernel.org>; Tue, 20 May 2025 16:03:07 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id DF25E6B009A; Tue, 20 May 2025 12:03:06 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id DA2766B009B; Tue, 20 May 2025 12:03:06 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id C6A4A6B009C; Tue, 20 May 2025 12:03:06 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13])
	by kanga.kvack.org (Postfix) with ESMTP id 9F6106B009A
	for <linux-mm@kvack.org>; Tue, 20 May 2025 12:03:06 -0400 (EDT)
Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay04.hostedemail.com (Postfix) with ESMTP id 45EDF1A087B
	for <linux-mm@kvack.org>; Tue, 20 May 2025 16:03:06 +0000 (UTC)
X-FDA: 83463755172.25.A190736
Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181])
	by imf24.hostedemail.com (Postfix) with ESMTP id 23280180026
	for <linux-mm@kvack.org>; Tue, 20 May 2025 16:03:03 +0000 (UTC)
Authentication-Results: imf24.hostedemail.com;
	dkim=pass header.d=google.com header.s=20230601 header.b=2IqhvboT;
	spf=pass (imf24.hostedemail.com: domain of vannapurve@google.com designates 209.85.214.181 as permitted sender) smtp.mailfrom=vannapurve@google.com;
	dmarc=pass (policy=reject) header.from=google.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1747756984;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=LP+IgKh5WuP7/61GoXCSgoWlxqk6ttGptinEY/thQH0=;
	b=U+gHyVsIF/tb4BoImoWHyVyw6Ch6j/xKkeNgzGxG2SauUJ3Y+e2I+pWrHRWnakuyeVPClS
	8DfJ7cNOYVE18OurcFd2q44ijYhBNO6scLr/rz7SZlbQ8NRm1qBljJEk7oCk41/iFI0Z2A
	L7p6xVUU6+ZprumrYgkymzcXzRUqy4A=
ARC-Authentication-Results: i=1;
	imf24.hostedemail.com;
	dkim=pass header.d=google.com header.s=20230601 header.b=2IqhvboT;
	spf=pass (imf24.hostedemail.com: domain of vannapurve@google.com designates 209.85.214.181 as permitted sender) smtp.mailfrom=vannapurve@google.com;
	dmarc=pass (policy=reject) header.from=google.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747756984; a=rsa-sha256;
	cv=none;
	b=lYQYRdTG/XvuR9X/UzcZhbG6gUup/rKbbnl0BrGgOzUm7kxKOKEAYHhyR/mW9QX5e5UD8s
	jrG1TxxtM77X2+caQoN55R8ven3SBOTAJiNzn8QExLMfPWRCVrB9pR77WoypcHRlhv+FhV
	cSKhoEUqZYH4ltdVclfwK+bE+zKkdgs=
Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-231f61dc510so785205ad.0
        for <linux-mm@kvack.org>; Tue, 20 May 2025 09:03:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1747756983; x=1748361783; darn=kvack.org;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=LP+IgKh5WuP7/61GoXCSgoWlxqk6ttGptinEY/thQH0=;
        b=2IqhvboTAfckHZM6pjeOFjennW7leujd0M04U+6zqTmncr/BhtNJr2ZUZYWDg05b/r
         4dyLHWVQgw3mk2U4G2YDlcKm5BGjO8NkBZP4sLVTvrfsXfRG1D4LHUyxRPZgwPMJS1NF
         Snh3sAh4s/hK4SHQubtJf6/ZxPfyP/Ohs8eOwVk/GpYAjRXC4f1cWwElmUwq0WEIyS3V
         McYglSBNoWbYfTXde28WaUFEh3+VRUtl+cXZ3+YwfdtN53lb4+kChrRJzt9+SXW6I+yx
         y9wXN7IZcmgVcYSqxy+UJ78j2mb/TjS4ggBBRgCn1jRjAJNfO+D/3VC2RFnNUdeDPU0O
         AR9w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1747756983; x=1748361783;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=LP+IgKh5WuP7/61GoXCSgoWlxqk6ttGptinEY/thQH0=;
        b=mmSiCvNkaZ1kM3c6IXBbHtXVmL7Xd/qJ/oxdY5JQXxzIO8EB2IkyK8eVT1G9eF6SMm
         ObCZ+J4xVpAwtJThSlwCQ7JBhptu+ZRCAjvJhxoguGlEOGuRFoz1h1h6yk4LVtbyjefH
         BDGuU0E1Xm55b/1DkPtWjP1t+SW1hHTaehZrpglDE4Rx2gnZCdDR3HfTOua126xPQGkY
         IsTZYZ6QlbcW4sL9nLBLWNh0TKJDybjnAbq7ES0UBSk1v9nGE7lYzahyt9gjToi+KM7S
         DixIWr/nFaaZYFm9dl4q0Rz+9lYMkxC1AMGIfa9Nss1TGZBuwZBN3Ym/bgz9VuC11mTg
         8RMQ==
X-Forwarded-Encrypted: i=1; AJvYcCVE8uxfXEWmmoNRUODCF3dcyGZXWUZavY5UTA8PBgf704UP38b5vT5HBYlamzYFhb8n/j2d3c9ckA==@kvack.org
X-Gm-Message-State: AOJu0YzhNDtikDoMLNWncdBc0HZDr6gAEYqyjiZyeldLKqc79CAU02D6
	WkNsUmp4Q8SMyYhHv9U8piSU4anyomHHy3NrG5cYMsn/9MFy+uBlvwLEgJlJ9UVNzHXCi36Q2nC
	Yv8x1BYCsCkiZ1+/ky6z629ZfEUMCfeNAaK45nUJ+
X-Gm-Gg: ASbGncs5G3LqQ/KalgTwyXEYApHNIjNsSPZKPowcF+OAv+aOtPlLjgSjpbqNfY8OGt9
	jcUDerrJHsDOcVj0aCu5CWPZjWo0ntV8YLyMQkE7Nhcpjka7hZ5ex4S2OdQeF0moXbY8dlF0sd4
	5SorL9AuvTJPpKtMZhGiRT8Sp24WNlQ6Sm/OPtXXIf5o2BNoqnilDYQqORqz46DTfBvgoNvgszY
	YVU
X-Google-Smtp-Source: AGHT+IHIqS2Dqz8t9Kspjt12t3UanzbX0TZpQnWe3Ht48Uo1xURTRGgrxqIt9Z9aa1RVZDHLDUDlIsXJxb1nUq6AGcg=
X-Received: by 2002:a17:903:4b4f:b0:231:fb83:9c3d with SMTP id
 d9443c01a7336-231ffdc5e60mr8624145ad.20.1747756982425; Tue, 20 May 2025
 09:03:02 -0700 (PDT)
MIME-Version: 1.0
References: <cover.1747264138.git.ackerleytng@google.com> <d3832fd95a03aad562705872cbda5b3d248ca321.1747264138.git.ackerleytng@google.com>
 <CA+EHjTxtHOgichL=UvAzczoqS1608RSUNn5HbmBw2NceO941ng@mail.gmail.com>
 <CAGtprH8eR_S50xDnnMLHNCuXrN2Lv_0mBRzA_pcTtNbnVvdv2A@mail.gmail.com>
 <CA+EHjTwjKVkw2_AK0Y0-eth1dVW7ZW2Sk=73LL9NeQYAPpxPiw@mail.gmail.com>
 <CAGtprH_Evyc7tLhDB0t0fN+BUx5qeqWq8A2yZ5-ijbJ5UJ5f-g@mail.gmail.com> <CA+EHjTy7iBNBb9DRdtgq8oYmvgykhSNvZL3FrRV4XF90t3XgBg@mail.gmail.com>
In-Reply-To: <CA+EHjTy7iBNBb9DRdtgq8oYmvgykhSNvZL3FrRV4XF90t3XgBg@mail.gmail.com>
From: Vishal Annapurve <vannapurve@google.com>
Date: Tue, 20 May 2025 09:02:50 -0700
X-Gm-Features: AX0GCFsR8m63CkFf_0gOE0SYuixW7eqLNNVHBhmGF8G-ihHN6Drh2TEMsr5J4cg
Message-ID: <CAGtprH_7jSpwF77j1GW8rjSrbtZZ2OW2iGck5=Wk67+VnF9vjQ@mail.gmail.com>
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce
 KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
To: Fuad Tabba <tabba@google.com>
Cc: Ackerley Tng <ackerleytng@google.com>, kvm@vger.kernel.org, linux-mm@kvack.org, 
	linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org, 
	aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org, 
	amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org, 
	aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com, 
	brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com, 
	chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com, 
	dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com, 
	fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com, 
	hch@infradead.org, hughd@google.com, ira.weiny@intel.com, 
	isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com, 
	jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com, 
	jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com, 
	kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev, 
	kirill.shutemov@intel.com, liam.merwick@oracle.com, 
	maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org, 
	mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au, 
	muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es, 
	oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com, 
	paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk, 
	peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com, 
	quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, 
	quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, 
	quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, richard.weiyang@gmail.com, 
	rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk, 
	rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com, 
	steven.sistare@oracle.com, suzuki.poulose@arm.com, thomas.lendacky@amd.com, 
	usama.arif@bytedance.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, 
	vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, 
	willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com, 
	yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Server: rspam10
X-Stat-Signature: sbbe5jq6diaczbsttdkkfrhsa7eqrmxq
X-Rspamd-Queue-Id: 23280180026
X-Rspam-User: 
X-HE-Tag: 1747756983-246199
X-HE-Meta: U2FsdGVkX1/gBpL+6zPlamPjflaQcgg+xuVTgT8rwmLghH1S7a6hKFnzJX1KiRXHZluKOt/HJhWh2NeWHYlrLLkh+t5JuueEY3lHHePR0Ih5ULpVQOsGEIHxhM1jAZvP+c7rQnSKxEK/iUO5XyKZZpHqjyjZuxI5ADL2EQcyQEVdICckY1pIvlR4XE+GYWtWOMmeaPV+F5bfJ+upaAUPbPsL0vP+TiNTQZx4kiiXeG77HYU7JZr5w7c8NWnKw7A7Fym2GiwJnDantYGlKJrvsb+Eg245vmIpz4UsM/HDnWIJ1HsXT1Xall1o2V7dXIYEwBhOWh7Jvqidcbp/yYajWMXVK5/MpBaxumPmdogIrsVu3hpz4mVriOlJxmyJ3bL17b0aVViPu3O37OCOj2uQ6Uv8DlS/4yw2754m9eM4d+8Gj+nq/Xyy976DXm0TGtjfabr/HbcHgJGZxit+bXRaEFq2gb8ytKdyzLWDZGOKrGx9IIOvnFk4dLCdoS5xSz4E2T4zGs66N4Wd4CjjHZIyJpGRda6reqe9AjMEXZ/S+Xs1/1YcSwuqhckPontvzH505Kf2S1PWIca5Nzo2hTshI+mQ0GDj3Wvzx+WUCnQFTAg52yA4hq9kpKJuGe5C7jK+1gJbXKKAfkpKd+I7w9sqqzLJJtOTAIJjQn4WWNScaeQ86iR7mci5yCbEzz4XYUMRb2naEMR1zlao7p1fZZhbTC9GYjTPojOhFrKLNvTmn4Vt0H8yr5LhbV+tEjiVv+KCoydsN5T1iIoNAF4e3MBItk0/vLBmfIzJrLPONeGYqxJDB0475rwISUghjo9iNRXhPEd71TVCjUAQBXeRmmo+6/W07lGd1xsjGY8tDk7Kn4PmFi1SDamq4gWlMve3WHOXGASvmjBa5V+OeF9NImJBe6Q66UmO7G5BKYvjrXFo/riXoW6mPL91IyvRgIBCYPTRLtKvUPH/zVPi08Q3SGK
 RIQAXk5a
 4f4bUYmKtjDPntlLxjQ7/0YZ8fGPmzhqP6lep5arxzRUFqoB+pJ2K6rV6C67+ogWVEWdESCoAWwLskLi+hV2br1mVKIqESsnhj0T5bTh3ieKtFHv0kRDaKaA0Gi+FA94pxGUXzYQK7+i9iwf2L1lML6z0TLEMSaYYrjmsuZRJqhaWOs1dnvUmpuhvdsYa9NZNr5C9V+4eiBE9VlFqdsED8Khtf6+ch4LfGmx2AWxzs5l5RMJxrfHEXEmEVDNAoLg41O1i22SrPtNC41oid4G63r+NJf2dI6rECpl5wgfksHo4z0VcN4Ti9XxQ/bQSgKn69q1RmA9pyep1BdbQVPKYXc254Q==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Tue, May 20, 2025 at 7:34=E2=80=AFAM Fuad Tabba <tabba@google.com> wrote=
:
>
> Hi Vishal,
>
> On Tue, 20 May 2025 at 15:11, Vishal Annapurve <vannapurve@google.com> wr=
ote:
> >
> > On Tue, May 20, 2025 at 6:44=E2=80=AFAM Fuad Tabba <tabba@google.com> w=
rote:
> > >
> > > Hi Vishal,
> > >
> > > On Tue, 20 May 2025 at 14:02, Vishal Annapurve <vannapurve@google.com=
> wrote:
> > > >
> > > > On Tue, May 20, 2025 at 2:23=E2=80=AFAM Fuad Tabba <tabba@google.co=
m> wrote:
> > > > >
> > > > > Hi Ackerley,
> > > > >
> > > > > On Thu, 15 May 2025 at 00:43, Ackerley Tng <ackerleytng@google.co=
m> wrote:
> > > > > >
> > > > > > The two new guest_memfd ioctls KVM_GMEM_CONVERT_SHARED and
> > > > > > KVM_GMEM_CONVERT_PRIVATE convert the requested memory ranges to=
 shared
> > > > > > and private respectively.
> > > > >
> > > > > I have a high level question about this particular patch and this
> > > > > approach for conversion: why do we need IOCTLs to manage conversi=
on
> > > > > between private and shared?
> > > > >
> > > > > In the presentations I gave at LPC [1, 2], and in my latest patch
> > > > > series that performs in-place conversion [3] and the associated (=
by
> > > > > now outdated) state diagram [4], I didn't see the need to have a
> > > > > userspace-facing interface to manage that. KVM has all the inform=
ation
> > > > > it needs to handle conversions, which are triggered by the guest.=
 To
> > > > > me this seems like it adds additional complexity, as well as a us=
er
> > > > > facing interface that we would need to maintain.
> > > > >
> > > > > There are various ways we could handle conversion without explici=
t
> > > > > interference from userspace. What I had in mind is the following =
(as
> > > > > an example, details can vary according to VM type). I will use us=
e the
> > > > > case of conversion from shared to private because that is the mor=
e
> > > > > complicated (interesting) case:
> > > > >
> > > > > - Guest issues a hypercall to request that a shared folio become =
private.
> > > > >
> > > > > - The hypervisor receives the call, and passes it to KVM.
> > > > >
> > > > > - KVM unmaps the folio from the guest stage-2 (EPT I think in x86
> > > > > parlance), and unmaps it from the host. The host however, could s=
till
> > > > > have references (e.g., GUP).
> > > > >
> > > > > - KVM exits to the host (hypervisor call exit), with the informat=
ion
> > > > > that the folio has been unshared from it.
> > > > >
> > > > > - A well behaving host would now get rid of all of its references
> > > > > (e.g., release GUPs), perform a VCPU run, and the guest continues
> > > > > running as normal. I expect this to be the common case.
> > > > >
> > > > > But to handle the more interesting situation, let's say that the =
host
> > > > > doesn't do it immediately, and for some reason it holds on to som=
e
> > > > > references to that folio.
> > > > >
> > > > > - Even if that's the case, the guest can still run *. If the gues=
t
> > > > > tries to access the folio, KVM detects that access when it tries =
to
> > > > > fault it into the guest, sees that the host still has references =
to
> > > > > that folio, and exits back to the host with a memory fault exit. =
At
> > > > > this point, the VCPU that has tried to fault in that particular f=
olio
> > > > > cannot continue running as long as it cannot fault in that folio.
> > > >
> > > > Are you talking about the following scheme?
> > > > 1) guest_memfd checks shareability on each get pfn and if there is =
a
> > > > mismatch exit to the host.
> > >
> > > I think we are not really on the same page here (no pun intended :) )=
.
> > > I'll try to answer your questions anyway...
> > >
> > > Which get_pfn? Are you referring to get_pfn when faulting the page
> > > into the guest or into the host?
> >
> > I am referring to guest fault handling in KVM.
> >
> > >
> > > > 2) host user space has to guess whether it's a pending refcount or
> > > > whether it's an actual mismatch.
> > >
> > > No need to guess. VCPU run will let it know exactly why it's exiting.
> > >
> > > > 3) guest_memfd will maintain a third state
> > > > "pending_private_conversion" or equivalent which will transition to
> > > > private upon the last refcount drop of each page.
> > > >
> > > > If conversion is triggered by userspace (in case of pKVM, it will b=
e
> > > > triggered from within the KVM (?)):
> > >
> > > Why would conversion be triggered by userspace? As far as I know, it'=
s
> > > the guest that triggers the conversion.
> > >
> > > > * Conversion will just fail if there are extra refcounts and usersp=
ace
> > > > can try to get rid of extra refcounts on the range while it has eno=
ugh
> > > > context without hitting any ambiguity with memory fault exit.
> > > > * guest_memfd will not have to deal with this extra state from 3 ab=
ove
> > > > and overall guest_memfd conversion handling becomes relatively
> > > > simpler.
> > >
> > > That's not really related. The extra state isn't necessary any more
> > > once we agreed in the previous discussion that we will retry instead.
> >
> > Who is *we* here? Which entity will retry conversion?
>
> Userspace will re-attempt the VCPU run.

Then KVM will have to keep track of the ranges that need conversion
across exits. I think it's cleaner to let userspace make the decision
and invoke conversion without carrying additional state in KVM about
guest request.

>
> > >
> > > > Note that for x86 CoCo cases, memory conversion is already triggere=
d
> > > > by userspace using KVM ioctl, this series is proposing to use
> > > > guest_memfd ioctl to do the same.
> > >
> > > The reason why for x86 CoCo cases conversion is already triggered by
> > > userspace using KVM ioctl is that it has to, since shared memory and
> > > private memory are two separate pages, and userspace needs to manage
> > > that. Sharing memory in place removes the need for that.
> >
> > Userspace still needs to clean up memory usage before conversion is
> > successful. e.g. remove IOMMU mappings for shared to private
> > conversion. I would think that memory conversion should not succeed
> > before all existing users let go of the guest_memfd pages for the
> > range being converted.
>
> Yes. Userspace will know that it needs to do that on the VCPU exit,
> which informs it of the guest's hypervisor request to unshare (convert
> from shared to private) the page.
>
> > In x86 CoCo usecases, userspace can also decide to not allow
> > conversion for scenarios where ranges are still under active use by
> > the host and guest is erroneously trying to take away memory. Both
> > SNP/TDX spec allow failure of conversion due to in use memory.
>
> How can the guest erroneously try to take away memory? If the guest
> sends a hypervisor request asking for a conversion of memory that
> doesn't belong to it, then I would expect the hypervisor to prevent
> that.

Making a range as private is effectively disallowing host from
accessing those ranges -> so taking away memory.

>
> I don't see how having an IOCTL to trigger the conversion is needed to
> allow conversion failure. How is that different from userspace
> ignoring or delaying releasing all references it has for the
> conversion request?
>
> > >
> > > This series isn't using the same ioctl, it's introducing new ones to
> > > perform a task that as far as I can tell so far, KVM can handle by
> > > itself.
> >
> > I would like to understand this better. How will KVM handle the
> > conversion process for guest_memfd pages? Can you help walk an example
> > sequence for shared to private conversion specifically around
> > guest_memfd offset states?
>
> To make sure that we are discussing the same scenario: can you do the
> same as well please --- walk me through an example sequence for shared
> to private conversion specifically around guest_memfd offset states
> With the IOCTLs involved?
>
> Here is an example that I have implemented and tested with pKVM. Note
> that there are alternatives, the flow below is architecture or even
> vm-type dependent. None of this code is code KVM code and the
> behaviour could vary.
>
>
> Assuming the folio is shared with the host:
>
> Guest sends unshare hypercall to the hypervisor
> Hypervisor forwards request to KVM (gmem) (having done due diligence)
> KVM (gmem) performs an unmap_folio(), exits to userspace with

For x86 CoCo VM usecases I was talking about, userspace would like to
avoid unmap_mapping_range() on the range before it's safe to unshare
the range.

> KVM_EXIT_UNSHARE and all the information about the folio being
> unshared
>
> Case 1:
> Userspace removes any remaining references (GUPs, IOMMU Mappings etc...)
> Userspace calls vcpu_run(): KVM (gmem) sees that there aren't any
> references, sets state to PRIVATE
>
> Case 2 (alternative 1):
> Userspace doesn't release its references
> Userspace calls vcpu_run(): KVM (gmem) sees that there are still
> references, exits back to userspace with KVM_EXIT_UNSHARE
>
> Case 2 (alternative 2):
> Userspace doesn't release its references
> Userspace calls vcpu_run(): KVM (gmem) sees that there are still
> references, unmaps folio from guest, but allows it to run (until it
> tries to fault in the folio)
> Guest tries to fault in folio that still has reference, KVM does not
> allow that (it sees that the folio is shared, and it doesn't fault in
> shared folios to confidential guests)
> KVM exits back to userspace with KVM_EXIT_UNSHARE
>
> As I mentioned, the alternatives above are _not_ set in core KVM code.
> They can vary by architecture of VM type, depending on the policy,
> support, etc..
>
> Now for your example please on how this would work with IOCTLs :)
>
> Thanks,
> /fuad
>
> > >
> > > >  - Allows not having to keep track of separate shared/private range
> > > > information in KVM.
> > >
> > > This patch series is already tracking shared/private range informatio=
n in KVM.
> > >
> > > >  - Simpler handling of the conversion process done per guest_memfd
> > > > rather than for full range.
> > > >      - Userspace can handle the rollback as needed, simplifying err=
or
> > > > handling in guest_memfd.
> > > >  - guest_memfd is single source of truth and notifies the users of
> > > > shareability change.
> > > >      - e.g. IOMMU, userspace, KVM MMU all can be registered for
> > > > getting notifications from guest_memfd directly and will get notifi=
ed
> > > > for invalidation upon shareability attribute updates.
> > >
> > > All of these can still be done without introducing a new ioctl.
> > >
> > > Cheers,
> > > /fuad