From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 19 Jun 2025 17:49:40 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
From: Xiaoyao Li
To: Yan Zhao
Cc: Ackerley Tng, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org,
 aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org,
 amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org,
 aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com,
 brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com,
 chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com,
 dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com,
 fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com,
 hch@infradead.org, hughd@google.com, ira.weiny@intel.com,
 isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com,
 jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com,
 jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com,
 kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev,
 kirill.shutemov@intel.com, liam.merwick@oracle.com,
 maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org,
 mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au,
 muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es,
 oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com,
 paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk,
 peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com,
 richard.weiyang@gmail.com, rick.p.edgecombe@intel.com,
 rientjes@google.com, roypat@amazon.co.uk, rppt@kernel.org,
 seanjc@google.com, shuah@kernel.org, steven.price@arm.com,
 steven.sistare@oracle.com, suzuki.poulose@arm.com, tabba@google.com,
 thomas.lendacky@amd.com, usama.arif@bytedance.com, vannapurve@google.com,
 vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com,
 wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
 yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com
References: <9b55acfa-688e-49da-9599-f35aee351e3d@intel.com>
 <30965147-24af-4dc8-aec4-781ea401a3a9@intel.com>
Content-Language: en-US
In-Reply-To: <30965147-24af-4dc8-aec4-781ea401a3a9@intel.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 6/19/2025 5:45 PM, Xiaoyao Li wrote:
> On 6/19/2025 5:28 PM, Yan Zhao wrote:
>> On Thu, Jun 19, 2025 at 05:18:44PM +0800, Xiaoyao Li wrote:
>>> On 6/19/2025 4:59 PM, Xiaoyao Li wrote:
>>>> On 6/19/2025 4:13 PM, Yan Zhao wrote:
>>>>> On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patchset builds upon discussion at LPC 2024 and many guest_memfd
>>>>>> upstream calls to provide 1G page support for guest_memfd by taking
>>>>>> pages from HugeTLB.
>>>>>>
>>>>>> This patchset is based on Linux v6.15-rc6, and requires the mmap support
>>>>>> for guest_memfd patchset (Thanks Fuad!) [1].
>>>>>>
>>>>>> For ease of testing, this series is also available, stitched together, at
>>>>>> https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
>>>>> Just to record a found issue -- not one that must be fixed.
>>>>>
>>>>> In TDX, the initial memory region is added as private memory during TD's build
>>>>> time, with its initial content copied from source pages in shared memory.
>>>>> The copy operation requires simultaneous access to both shared source memory
>>>>> and private target memory.
>>>>>
>>>>> Therefore, userspace cannot store the initial content in shared memory at the
>>>>> mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
>>>>> private memory.
>>>>> This is because the guest_memfd will first unmap a PFN in shared
>>>>> page tables and then check for any extra refcount held for the shared PFN before
>>>>> converting it to private.
>>>>
>>>> I have an idea.
>>>>
>>>> If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
>>>> conversion unmaps the PFN in shared page tables while keeping the content
>>>> of the page unchanged, right?
>> However, whenever there's a GUP in TDX to get the source page, there will be an
>> extra page refcount.
>
> The GUP in TDX happens after the gmem converts the page to private.

Maybe it's not GUP, since the page has already been unmapped from userspace?
(Sorry, I'm not familiar with the terminology.)

> In the view of TDX, the physical page is converted to private already
> and it contains the initial content. But the content is not usable for
> TDX until TDX calls in-place PAGE.ADD.
>
>>>> So KVM_GMEM_CONVERT_PRIVATE can be used to initialize the private memory
>>>> for the non-CoCo case: userspace first mmap()s it, ensures it's shared,
>>>> and writes the initial content to it; after that, userspace
>>>> converts it to private with KVM_GMEM_CONVERT_PRIVATE.
>> The conversion request here will be declined therefore.
>>
>>
>>>> For CoCo case, like TDX, it can hook to KVM_GMEM_CONVERT_PRIVATE if it
>>>> wants the private memory to be initialized with initial content, and
>>>> just do in-place TDH.PAGE.ADD in the hook.
>>>
>>> And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
>>> explicitly request that the page range is converted to private and the
>>> content needs to be retained. So that TDX can identify which case needs to
>>> call in-place TDH.PAGE.ADD.
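
To make the ordering discussed in this thread easier to reason about, here is
a toy Python model. It is not kernel code and not taken from the patchset.
Apart from KVM_GMEM_CONVERT_PRIVATE and TDH.PAGE.ADD, every name in it is
hypothetical, including the flag GMEM_CONVERT_RETAIN_CONTENT and the hook
function. It only assumes what was stated above: the PFN is unmapped first,
the request is declined if an extra refcount is held, and otherwise the page
is converted in place with its content kept.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class State(Enum):
    SHARED = auto()
    PRIVATE = auto()


# Hypothetical flag name; the thread only proposes "a new flag".
GMEM_CONVERT_RETAIN_CONTENT = 1 << 0


@dataclass
class Page:
    content: bytes = b""
    state: State = State.SHARED
    mapped: bool = True        # present in shared (userspace) page tables
    extra_refs: int = 0        # refs beyond guest_memfd's own, e.g. a GUP pin
    added_to_td: bool = False  # set once the modeled in-place PAGE.ADD ran


def tdx_page_add_in_place(page: Page) -> None:
    """Stand-in for an in-place TDH.PAGE.ADD hook; purely illustrative."""
    page.added_to_td = True


def convert_private(page: Page, flags: int = 0,
                    hook: Optional[Callable[[Page], None]] = None) -> bool:
    """Toy model of KVM_GMEM_CONVERT_PRIVATE; returns False if declined."""
    page.mapped = False              # step 1: unmap from shared page tables
    if page.extra_refs > 0:          # step 2: extra refcount held, decline
        return False
    page.state = State.PRIVATE       # step 3: convert; content is untouched
    if hook is not None and flags & GMEM_CONVERT_RETAIN_CONTENT:
        hook(page)                   # CoCo case: make retained content usable
    return True


# Userspace wrote the initial content while the page was shared.
page = Page(content=b"initial image")
ok = convert_private(page, GMEM_CONVERT_RETAIN_CONTENT, tdx_page_add_in_place)

# A page that still has an extra reference is declined and stays shared.
pinned = Page(content=b"initial image", extra_refs=1)
declined = not convert_private(pinned, GMEM_CONVERT_RETAIN_CONTENT,
                               tdx_page_add_in_place)
```

In this model the hook runs only after the refcount check has passed, which
matches the point that the access by TDX happens after gmem has converted the
page to private. Whether the real implementation can be structured this way
is exactly the open question in this thread.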