From: Vishal Annapurve
Date: Mon, 30 Jun 2025 07:14:07 -0700
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
To: Yan Zhao
Cc: Xiaoyao Li, Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org,
 aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org,
 amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org,
 aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com,
 brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com,
 chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com,
 dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com,
 fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
 hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
 isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
 jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com,
 jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com,
 kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev,
 kirill.shutemov@intel.com, liam.merwick@oracle.com,
 maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org,
 mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
 muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es,
 oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
 paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk,
 peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
 richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com,
 roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
 tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com,
 vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com,
 wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
 yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com

On Sun, Jun 29, 2025 at 8:17 PM Yan Zhao wrote:
>
> On Sun, Jun 29, 2025 at 11:28:22AM -0700, Vishal Annapurve wrote:
> > On Thu, Jun 19, 2025 at 1:59 AM Xiaoyao Li wrote:
> > >
> > > On 6/19/2025 4:13 PM, Yan Zhao wrote:
> > > > On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
> > > >> Hello,
> > > >>
> > > >> This patchset builds upon discussion at LPC 2024 and many guest_memfd
> > > >> upstream calls to
> > > >> provide 1G page support for guest_memfd by taking
> > > >> pages from HugeTLB.
> > > >>
> > > >> This patchset is based on Linux v6.15-rc6, and requires the mmap support
> > > >> for guest_memfd patchset (Thanks Fuad!) [1].
> > > >>
> > > >> For ease of testing, this series is also available, stitched together,
> > > >> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> > > >
> > > > Just to record a found issue -- not one that must be fixed.
> > > >
> > > > In TDX, the initial memory region is added as private memory during TD's build
> > > > time, with its initial content copied from source pages in shared memory.
> > > > The copy operation requires simultaneous access to both shared source memory
> > > > and private target memory.
> > > >
> > > > Therefore, userspace cannot store the initial content in shared memory at the
> > > > mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> > > > private memory. This is because the guest_memfd will first unmap a PFN in shared
> > > > page tables and then check for any extra refcount held for the shared PFN before
> > > > converting it to private.
> > >
> > > I have an idea.
> > >
> > > If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
> > > conversion unmap the PFN in shared page tables while keeping the content
> > > of the page unchanged, right?
> >
> > That's correct.
> >
> > >
> > > So KVM_GMEM_CONVERT_PRIVATE can be used to initialize the private memory
> > > actually for non-CoCo case actually, that userspace first mmap() it and
> > > ensure it's shared and writes the initial content to it, after it
> > > userspace convert it to private with KVM_GMEM_CONVERT_PRIVATE.
> >
> > I think you mean pKVM by non-coco VMs that care about private memory.
> > Yes, initial memory regions can start as shared which userspace can
> > populate and then convert the ranges to private.
> >
> > >
> > > For CoCo case, like TDX, it can hook to KVM_GMEM_CONVERT_PRIVATE if it
> > > wants the private memory to be initialized with initial content, and
> > > just do in-place TDH.PAGE.ADD in the hook.
> >
> > I think this scheme will be cleaner:
> > 1) Userspace marks the guest_memfd ranges corresponding to initial
> > payload as shared.
> > 2) Userspace mmaps and populates the ranges.
> > 3) Userspace converts those guest_memfd ranges to private.
> > 4) For both SNP and TDX, userspace continues to invoke corresponding
> > initial payload preparation operations via existing KVM ioctls e.g.
> > KVM_SEV_SNP_LAUNCH_UPDATE/KVM_TDX_INIT_MEM_REGION.
> >    - SNP/TDX KVM logic fetches the right pfns for the target gfns
> > using the normal paths supported by KVM and passes those pfns directly
> > to the right trusted module to initialize the "encrypted" memory
> > contents.
> >    - Avoiding any GUP or memcpy from source addresses.
> One caveat:
>
> when TDX populates the mirror root, kvm_gmem_get_pfn() is invoked.
> Then kvm_gmem_prepare_folio() is further invoked to zero the folio.

Given that confidential VMs have their own way of initializing private
memory, I think zeroing makes sense for only shared memory ranges.
i.e. something like below:
1) Don't zero at allocation time.
2) If faulting in a shared page and it's not uptodate, then zero the
page and set the page as uptodate.
3) Clear uptodate flag on private to shared conversion.
4) For faults on private ranges, don't zero the memory.

There might be some other considerations here e.g. pKVM needs
non-destructive conversion operation, which might need a way to enable
zeroing at allocation time only.

On a TDX specific note, IIUC, KVM TDX logic doesn't need to clear
pages on future platforms [1].

[1] https://lore.kernel.org/lkml/6de76911-5007-4170-bf74-e1d045c68465@intel.com/

>
> > i.e. for TDX VMs, KVM_TDX_INIT_MEM_REGION still does the in-place TDH.PAGE.ADD.
> So, upon here, the pages should not contain the original content?
>

Pages should contain the original content. Michael is already
experimenting with similar logic [2] for SNP.

[2] https://lore.kernel.org/lkml/20250613005400.3694904-6-michael.roth@amd.com/
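
For illustration only, the four zeroing rules proposed in this reply
(no zeroing at allocation, zero on a shared fault when not uptodate,
clear uptodate on private to shared conversion, no zeroing on private
faults) can be sketched as a small userspace model. This is not
guest_memfd or KVM code: struct toy_page and every toy_* helper are
hypothetical names, and the 0xAA fill is an assumption standing in for
whatever stale data an unzeroed allocation might hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Toy stand-in for one guest_memfd page; not a kernel structure. */
struct toy_page {
    bool shared;    /* true: shared with the host, false: private */
    bool uptodate;  /* contents are known to be initialized for shared use */
    unsigned char data[TOY_PAGE_SIZE];
};

/* Rule 1: allocation does not zero. 0xAA models stale contents. */
static void toy_alloc(struct toy_page *p, bool shared)
{
    memset(p->data, 0xAA, sizeof(p->data));
    p->shared = shared;
    p->uptodate = false;
}

/* Rule 2: a shared fault zeroes a page that is not yet uptodate. */
static void toy_fault_shared(struct toy_page *p)
{
    assert(p->shared);
    if (!p->uptodate) {
        memset(p->data, 0, sizeof(p->data));
        p->uptodate = true;
    }
}

/* Rule 4: a private fault leaves the contents untouched. */
static void toy_fault_private(struct toy_page *p)
{
    assert(!p->shared);
    (void)p;
}

/* In-place shared -> private conversion keeps the contents. */
static void toy_convert_to_private(struct toy_page *p)
{
    p->shared = false;
}

/* Rule 3: private -> shared clears uptodate, so the next shared fault zeroes. */
static void toy_convert_to_shared(struct toy_page *p)
{
    p->shared = true;
    p->uptodate = false;
}

/* Helper: true if every byte of the page equals v. */
static bool toy_all_bytes(const struct toy_page *p, unsigned char v)
{
    for (size_t i = 0; i < sizeof(p->data); i++)
        if (p->data[i] != v)
            return false;
    return true;
}
```

In this model, a payload written while the page is shared survives
toy_convert_to_private() and a later toy_fault_private(), which matches
the "pages should contain the original content" expectation, while a
page that comes back via toy_convert_to_shared() is zeroed again on its
next shared fault.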