From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 22 May 2025 09:26:38 -0700
In-Reply-To: <CA+EHjTxjt-mb_WbtVymaBvCb1EdJAVMV_uGb4xDs_ewg4k0C4g@mail.gmail.com>
Mime-Version: 1.0
References: <CAGtprH_7jSpwF77j1GW8rjSrbtZZ2OW2iGck5=Wk67+VnF9vjQ@mail.gmail.com>
 <CA+EHjTzMhKCoftfJUuL0WUZW4DdqOHgVDcn0Cmf-0r--8rBdbg@mail.gmail.com>
 <diqzecwjnk95.fsf@ackerleytng-ctop.c.googlers.com> <CA+EHjTyY5C1QgkoAqvJ0kHM4nUvKc1e1nQ0Uq+BANtVEnZH90w@mail.gmail.com>
 <CAGtprH-fE=G923ctBAcq5zFna+2WULhmHDSbXUsZKUrin29b4g@mail.gmail.com>
 <CA+EHjTxvufYVA8LQWRKEX7zA0gWLQUHVO2LvwKc5JXVu-XAEEA@mail.gmail.com>
 <CAGtprH_TfKT3oRPCLbh-ojLGXSfOQ2XA39pVhr47gb3ikPtUkw@mail.gmail.com>
 <CA+EHjTxJZ_pb7+chRoZxvkxuib2YjbiHg=_+f4bpRt2xDFNCzQ@mail.gmail.com>
 <aC86OsU2HSFZkJP6@google.com> <CA+EHjTxjt-mb_WbtVymaBvCb1EdJAVMV_uGb4xDs_ewg4k0C4g@mail.gmail.com>
Message-ID: <aC9QPoEUw_nLHhV4@google.com>
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce
 KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
From: Sean Christopherson <seanjc@google.com>
To: Fuad Tabba <tabba@google.com>
Cc: Vishal Annapurve <vannapurve@google.com>, Ackerley Tng <ackerleytng@google.com>, kvm@vger.kernel.org, 
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, 
	linux-fsdevel@vger.kernel.org, aik@amd.com, ajones@ventanamicro.com, 
	akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com, 
	anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com, 
	binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com, 
	chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com, 
	david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk, 
	erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com, 
	haibo1.xu@intel.com, hch@infradead.org, hughd@google.com, ira.weiny@intel.com, 
	isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com, 
	jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com, 
	jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com, 
	kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev, 
	kirill.shutemov@intel.com, liam.merwick@oracle.com, 
	maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org, 
	mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au, 
	muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es, 
	oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com, 
	paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk, 
	peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com, 
	quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, 
	quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, 
	quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, richard.weiyang@gmail.com, 
	rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk, 
	rppt@kernel.org, shuah@kernel.org, steven.price@arm.com, 
	steven.sistare@oracle.com, suzuki.poulose@arm.com, thomas.lendacky@amd.com, 
	usama.arif@bytedance.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, 
	vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, 
	willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com, 
	yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
Content-Type: text/plain; charset="us-ascii"
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On Thu, May 22, 2025, Fuad Tabba wrote:
> On Thu, 22 May 2025 at 15:52, Sean Christopherson <seanjc@google.com> wrote:
> > On Wed, May 21, 2025, Fuad Tabba wrote:
> > > How does the host userspace find that out? If the host userspace is capable
> > > of finding that out, then surely KVM is also capable of finding out the same.
> >
> > Nope, not on x86.  Well, not without userspace invoking a new ioctl, which would
> > defeat the purpose of adding these ioctls.
> >
> > KVM is only responsible for emulating/virtualizing the "CPU".  The chipset, e.g.
> > the PCI config space, is fully owned by userspace.  KVM doesn't even know whether
> > or not PCI exists for the VM.  And reboot may be emulated by simply creating a
> > new KVM instance, i.e. even if KVM was somehow aware of the reboot request, the
> > change in state would happen in an entirely new struct kvm.
> >
> > That said, Vishal and Ackerley, this patch is a bit lacking on the documentation
> > front.  The changelog asserts that:
> >
> >   A guest_memfd ioctl is used because shareability is a property of the memory,
> >   and this property should be modifiable independently of the attached struct kvm
> >
> > but then follows with a very weak and IMO largely irrelevant justification of:
> >
> >   This allows shareability to be modified even if the memory is not yet bound
> >   using memslots.
> >
> > Allowing userspace to change shareability without memslots is one relatively minor
> > flow in one very specific use case.
> >
> > The real justification for these ioctls is that fundamentally, shareability for
> > in-place conversions is a property of a guest_memfd instance and not a struct kvm
> > instance, and so needs to be owned by guest_memfd.
> 
> Thanks for the clarification Sean. I have a couple of followup
> questions/comments that you might be able to help with:
> 
> From a conceptual point of view, I understand that the in-place conversion is
> a property of guest_memfd. But that doesn't necessarily mean that the
> interface between kvm <-> guest_memfd is a userspace IOCTL.

kvm and guest_memfd aren't the communication endpoints for in-place conversions,
and more importantly, kvm isn't part of the control plane.  kvm's primary role
(for guest_memfd with in-place conversions) is to manage the page tables to map
memory into the guest.

kvm *may* also explicitly provide a communication channel between the guest and
host, e.g. when conversions are initiated via hypercalls, but in some cases the
communication channel may be created through pre-existing mechanisms, e.g. a
shared memory buffer or emulated I/O (such as the PCI reset case).

  guest => kvm (dumb pipe) => userspace => guest_memfd => kvm (invalidate)

And in other cases, kvm might not be in that part of the picture at all, e.g. if
the userspace VMM provides an interface to the VM owner (which could also be the
user running the VM) to reset the VM, then the flow would look like:

  userspace => guest_memfd => kvm (invalidate)
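
For illustration only, here is a toy C model of the two flows above.  Every
name in it (toy_gmem, toy_vm, toy_gmem_convert()) is made up for this sketch
and is not the proposed uAPI or actual KVM code.  The point is only that the
shareability state lives in the file, and that kvm's job is limited to reacting
to the invalidation, assuming it is bound to the file at all.

```c
#include <stddef.h>

#define TOY_NR_PAGES 16

enum toy_state { TOY_PRIVATE, TOY_SHARED };

/* Hypothetical stand-in for a VM whose page tables map the file. */
struct toy_vm {
	unsigned int invalidations;	/* times the guest mappings were zapped */
};

/* Hypothetical stand-in for a single guest_memfd instance. */
struct toy_gmem {
	enum toy_state state[TOY_NR_PAGES];	/* owned by the file, not the VM */
	struct toy_vm *bound;			/* NULL if nothing is bound yet */
};

/*
 * Control plane: "userspace" asks the file to convert a range.  The VM is
 * only told to drop its mappings; it never decides anything.
 */
int toy_gmem_convert(struct toy_gmem *g, size_t start, size_t nr,
		     enum toy_state to)
{
	size_t i;

	/* Reject ranges that fall outside the file. */
	if (start > TOY_NR_PAGES || nr > TOY_NR_PAGES - start)
		return -1;

	for (i = start; i < start + nr; i++)
		g->state[i] = to;

	/* kvm (invalidate): only relevant if a VM is bound to the file. */
	if (g->bound)
		g->bound->invalidations++;

	return 0;
}
```

In this model a conversion succeeds whether or not g->bound is set, which is
the property the real design is after: the state change does not depend on a
VM being attached.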

A decent comparison is vCPUs.  KVM _could_ route all ioctls through the VM, but
that's unpleasant for all parties, as it'd be cumbersome for userspace, and
unnecessarily complex and messy for KVM.  Similarly, routing guest_memfd state
changes through KVM_SET_MEMORY_ATTRIBUTES is awkward from both design and mechanical
perspectives.

Even if we disagree on how ugly/pretty routing conversions through kvm would be,
which I'll allow is subjective, the bigger problem is that bouncing through
KVM_SET_MEMORY_ATTRIBUTES would create an unholy mess of an ABI.

Today, KVM_SET_MEMORY_ATTRIBUTES is handled entirely within kvm, and any changes
take effect irrespective of any memslot bindings.  And that didn't happen by
chance; preserving and enforcing attribute changes independently of memslots was
a key design requirement, precisely because memslots are ephemeral to a certain
extent.

Adding support for in-place guest_memfd conversion will require new ABI, and so
will be a "breaking" change for KVM_SET_MEMORY_ATTRIBUTES no matter what.  E.g.
KVM will need to reject KVM_MEMORY_ATTRIBUTE_PRIVATE for VMs that elect to use
in-place guest_memfd conversions.  But very critically, KVM can crisply enumerate
the lack of KVM_MEMORY_ATTRIBUTE_PRIVATE via KVM_CAP_MEMORY_ATTRIBUTES, the
behavior will be very straightforward to document (e.g. CAP X is mutually exclusive
with KVM_MEMORY_ATTRIBUTE_PRIVATE), and it will be opt-in, i.e. won't truly be a
breaking change.
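
A rough sketch of what "crisply enumerate" could look like, as a standalone toy
in C rather than real KVM code.  The struct, the helpers, the opt-in flag that
stands in for "CAP X", and the TOY_ATTR_PRIVATE bit are all assumptions made up
for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the PRIVATE attribute bit; not taken from the uapi headers. */
#define TOY_ATTR_PRIVATE	(1ULL << 3)

struct toy_vm {
	bool inplace_conversion;	/* hypothetical opt-in, i.e. "CAP X" */
};

/* What a capability query would report as settable for this VM. */
uint64_t toy_supported_attrs(const struct toy_vm *vm)
{
	return vm->inplace_conversion ? 0 : TOY_ATTR_PRIVATE;
}

/* Reject anything that was not enumerated as supported. */
int toy_set_attrs(const struct toy_vm *vm, uint64_t attrs)
{
	if (attrs & ~toy_supported_attrs(vm))
		return -EINVAL;

	return 0;
}
```

A VM that never opts in sees exactly the behavior it sees today; a VM that opts
in is told up front that the PRIVATE bit is gone, and gets a plain error if it
tries to use it anyway.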

If/when we move shareability to guest_memfd, routing state changes through
KVM_SET_MEMORY_ATTRIBUTES will gain a subtle dependency on userspace having to
create memslots in order for state changes to take effect.  That wrinkle would be
weird and annoying to document, e.g. "if CAP X is enabled, the ioctl ordering is
A => B => C, otherwise the ordering doesn't matter", and would create many more
conundrums:

  - If a memslot needs to exist in order for KVM_SET_MEMORY_ATTRIBUTES to take effect,
    what should happen if that memslot is deleted?
  - If a memslot isn't found, should KVM_SET_MEMORY_ATTRIBUTES fail and report
    an error, or silently do nothing?
  - If KVM_SET_MEMORY_ATTRIBUTES affects multiple memslots that are bound to
    multiple guest_memfd, how does KVM guarantee atomicity?  What happens if one
    guest_memfd conversion succeeds, but a later one fails?
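
To make the last question concrete, here is a toy C model of a VM-scoped
request that fans out to several backing files.  All of the names are
hypothetical and the failure is injected by hand; this is not how KVM or
guest_memfd is implemented, just a sketch of the partial-failure shape.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one guest_memfd file. */
struct toy_file {
	bool shared;		/* current state of the (single) range */
	bool fail_next;		/* injected failure, e.g. a pinned page */
};

int toy_file_convert(struct toy_file *f, bool to_shared)
{
	if (f->fail_next)
		return -1;

	f->shared = to_shared;
	return 0;
}

/*
 * A VM-scoped request that spans several files.  On failure, *nr_done says
 * how many files were already converted; nothing here rolls them back.
 */
int toy_vm_set_attr(struct toy_file **files, size_t n, bool to_shared,
		    size_t *nr_done)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (toy_file_convert(files[i], to_shared)) {
			*nr_done = i;
			return -1;
		}
	}

	*nr_done = n;
	return 0;
}
```

With two files where the second one fails, the call returns an error with the
first file already converted.  The caller then has to either undo the first
conversion (which can itself fail) or report a half-applied state to userspace,
and neither is pleasant to specify as ABI.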

> We already communicate directly between the two. Other, even less related
> subsystems within the kernel also interact without going through userspace.
> Why can't we do the same here? I'm not suggesting it not be owned by
> guest_memfd, but that we communicate directly.

I'm not concerned about kvm communicating with guest_memfd, as you note it's all
KVM.  As above, my concerns are all about KVM's ABI and who owns/controls what.

> From a performance point of view, I would expect the common case to be that
> when KVM gets an unshare request from the guest, it would be able to unmap
> those pages from the (cooperative) host userspace, and return back to the
> guest. In this scenario, the host userspace wouldn't even need to be
> involved.

Hard NAK, at least from an x86 perspective.  Userspace is the sole decision maker
with respect to which memory is shared vs. private, full stop.  The guest
can make *requests* to convert memory, but ultimately it's host userspace that
decides whether or not to honor the request.

We've litigated this exact issue multiple times.  All state changes must be
controlled by userspace, because userspace is the only entity that can gracefully
handle exceptions and edge cases, and is the only entity with (almost) full
knowledge of the system.  We can discuss this again if necessary, but I'd much
prefer to not rehash all of those conversations.
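
As a minimal sketch of "the guest requests, userspace decides", assuming a
made-up VMM with a single flat RAM region: the request layout, the policy
fields, and toy_vmm_should_honor() are all hypothetical and exist only to show
where the decision sits.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096ULL

/* What the guest asked for; how the request arrived doesn't matter here. */
struct toy_request {
	uint64_t gpa;
	uint64_t npages;
	bool to_private;
};

/* Hypothetical VMM-side view of the VM. */
struct toy_vmm {
	uint64_t ram_size;		/* bytes of guest RAM, starting at GPA 0 */
	bool conversions_allowed;	/* e.g. cleared while tearing down */
};

/* The decision lives in the VMM; the kernel only executes the outcome. */
bool toy_vmm_should_honor(const struct toy_vmm *vmm,
			  const struct toy_request *req)
{
	if (!vmm->conversions_allowed)
		return false;

	/* Malformed requests are an edge case only userspace can sanely handle. */
	if (req->npages == 0 || req->gpa % TOY_PAGE_SIZE)
		return false;

	if (req->gpa >= vmm->ram_size)
		return false;

	/* Written to avoid overflow in gpa + npages * TOY_PAGE_SIZE. */
	if (req->npages > (vmm->ram_size - req->gpa) / TOY_PAGE_SIZE)
		return false;

	return true;
}
```

Only after a check along these lines passes would the VMM go on to issue the
actual conversion; a refused request is reported back to the guest however the
VMM sees fit.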

> Having a userspace IOCTL as part of this makes that trip unnecessarily longer
> for the common case.

I'm very skeptical that an exit to userspace is going to even be measurable in
terms of the cost to convert memory.  Conversion is going to require multiple
locks, modifications to multiple sets of page tables with all the associated TLB
maintenance, possibly cache maintenance, and probably a few other things I'm
forgetting.  The cost of a few user<=>kernel transitions is likely going to be a
drop in the bucket.

If I'm wrong, and there are flows where the user<=>kernel transitions are the
long pole, then we could certainly explore adding a way for userspace to opt
into a "fast path" conversion.  But it would need to be exactly that, an optional
fast path that can fall back to the "slow" userspace-driven conversion as needed.
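
Purely as a sketch of the shape I have in mind, with every name hypothetical
and no claim that this is how it would be implemented: userspace pre-approves
a window, the fast path handles only requests that sit entirely inside it, and
everything else takes the normal exit to userspace.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_result { TOY_DONE_FAST, TOY_DONE_SLOW };

struct toy_vm {
	bool fast_path_opt_in;		/* set by userspace, off by default */
	uint64_t fast_start, fast_end;	/* pre-approved GPA window [start, end) */
	unsigned int exits_to_userspace;
};

/* Returns 0 if the request can be handled without userspace, -1 otherwise. */
int toy_try_fast(const struct toy_vm *vm, uint64_t gpa, uint64_t len)
{
	if (!vm->fast_path_opt_in)
		return -1;

	/* Only requests fully inside the pre-approved window qualify. */
	if (len == 0 || gpa < vm->fast_start || gpa > vm->fast_end ||
	    len > vm->fast_end - gpa)
		return -1;

	return 0;
}

enum toy_result toy_convert(struct toy_vm *vm, uint64_t gpa, uint64_t len)
{
	if (!toy_try_fast(vm, gpa, len))
		return TOY_DONE_FAST;

	/* Fall back: userspace stays the decision maker for everything else. */
	vm->exits_to_userspace++;
	return TOY_DONE_SLOW;
}
```

Note that even in this toy, the fast path exists only because userspace set it
up, so userspace remains the one making the policy decision.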