From: Patrick Roy
To: "Vlastimil Babka (SUSE)"
Subject: Re: [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map
Date: Fri, 26 Jul 2024 07:55:16 +0100
Message-ID: <7e175521-38bb-49f0-b1fb-8820f8708c9c@amazon.co.uk>
References: <20240709132041.3625501-1-roypat@amazon.co.uk>

On Mon, 2024-07-22 at 13:28 +0100, "Vlastimil Babka (SUSE)" wrote:
>> === Implementation ===
>>
>> This patch series introduces a new flag to the `KVM_CREATE_GUEST_MEMFD`
>> to remove its pages from the direct map when they are allocated. When
>> trying to run a guest from such a VM, we now face the problem that
>> without either userspace or kernelspace mappings of guest_memfd, KVM
>> cannot access guest memory to, for example, do MMIO emulation of access
>> memory used to guest/host communication. We have multiple options for
>> solving this when running non-CoCo VMs: (1) implement a TDX-light
>> solution, where the guest shares memory that KVM needs to access, and
>> relies on paravirtual solutions where this is not possible (e.g. MMIO),
>> (2) have KVM use userspace mappings of guest_memfd (e.g. a
>> memfd_secret-style solution), or (3) dynamically reinsert pages into the
>> direct map whenever KVM wants to access them.
>>
>> This RFC goes for option (3). Option (1) is a lot of overhead for very
>> little gain, since we are not actually constrained by a physical
>> inability to access guest memory (e.g. we are not in a TDX context where
>> accesses to guest memory cause a #MC). Option (2) has previously been
>> rejected [1].
>
> Do the pages have to have the same address when they are temporarily mapped?
> Wouldn't it be easier to do something similar to kmap_local_page() used for
> HIMEM? I.e. you get a temporary kernel mapping to do what's needed, but it
> doesn't have to alter the shared directmap.
>
> Maybe that was already discussed somewhere as unsuitable but didn't spot it
> here.

For what I had prototyped here, there's no requirement to have the pages
mapped at the same address (I remember briefly looking at memremap to
achieve the temporary mappings, but since that doesn't work for normal
memory, I gave up on that path). However, I think guest_memfd is moving
in a direction where ranges marked as "in-place shared" (e.g. those that
are temporarily reinserted into the direct map in this RFC) should be
able to be GUP'd [1]. I think for that the direct map entries would need
to be present, right?
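
To make the two approaches a bit more concrete, here is a rough userspace
analogy (explicitly not kernel code, and not what the series implements).
A long-lived PROT_NONE mapping of a memfd stands in for a direct map entry
that has been removed. "Reinserting" is modelled by briefly making that
shared mapping accessible at its fixed address, while the
kmap_local_page()-style alternative is modelled by a short-lived second
mapping at a different address. All names (demo_mem, write_via_reinsert,
read_via_temp_alias, demo_roundtrip) are made up for this sketch, and the
GUP concern is not modelled at all.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for "guest memory": one page backed by a memfd. */
struct demo_mem {
	int fd;
	size_t len;
	char *shared;	/* long-lived mapping, plays the role of the direct map */
};

static int demo_mem_init(struct demo_mem *m)
{
	m->len = (size_t)sysconf(_SC_PAGESIZE);
	m->fd = memfd_create("demo-gmem", 0);
	if (m->fd < 0)
		return -1;
	if (ftruncate(m->fd, (off_t)m->len) < 0) {
		close(m->fd);
		return -1;
	}
	/* PROT_NONE: the page exists, but is not accessible via this mapping. */
	m->shared = mmap(NULL, m->len, PROT_NONE, MAP_SHARED, m->fd, 0);
	if (m->shared == MAP_FAILED) {
		close(m->fd);
		return -1;
	}
	return 0;
}

/* Analogy for option (3): make the fixed-address mapping usable, then undo. */
static int write_via_reinsert(struct demo_mem *m, const char *s)
{
	if (mprotect(m->shared, m->len, PROT_READ | PROT_WRITE) < 0)
		return -1;
	strncpy(m->shared, s, m->len - 1);	/* last byte of the page stays 0 */
	return mprotect(m->shared, m->len, PROT_NONE);
}

/* Analogy for kmap_local_page(): temporary alias at a different address. */
static int read_via_temp_alias(struct demo_mem *m, char *out, size_t out_len)
{
	char *tmp = mmap(NULL, m->len, PROT_READ, MAP_SHARED, m->fd, 0);

	if (tmp == MAP_FAILED)
		return -1;
	snprintf(out, out_len, "%s", tmp);
	return munmap(tmp, m->len);
}

/* Write through the "reinserted" mapping, read back through the alias. */
static int demo_roundtrip(const char *msg, char *out, size_t out_len)
{
	struct demo_mem m;
	int ret = -1;

	if (demo_mem_init(&m) < 0)
		return -1;
	if (write_via_reinsert(&m, msg) == 0 &&
	    read_via_temp_alias(&m, out, out_len) == 0)
		ret = 0;
	munmap(m.shared, m.len);
	close(m.fd);
	return ret;
}
```

Calling demo_roundtrip("hello", buf, sizeof(buf)) from a main() should
leave "hello" in buf. The only point of the sketch is that the temporary
alias never touches the state of the shared mapping, whereas the reinsert
path changes it for everyone who can see that mapping.
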

>> In this patch series, we make sufficient parts of KVM gmem-aware to be
>> able to boot a Linux initrd from private memory on x86. These include
>> KVM's MMIO emulation (including guest page table walking) and kvm-clock.
>> For VM types which do not allow accessing gmem, we return -EFAULT and
>> attempt to prepare a KVM_EXIT_MEMORY_FAULT.
>>
>> Additionally, this patch series adds support for "restricted" userspace
>> mappings of guest_memfd, which work similar to memfd_secret (e.g.
>> disallow get_user_pages), which allows handling I/O and loading the
>> guest kernel in a simple way. Support for this is completely independent
>> of the rest of the functionality introduced in this patch series.
>> However, it is required to build a minimal hypervisor PoC that actually
>> allows booting a VM from a disk.

[1]: https://lore.kernel.org/kvm/489d1494-626c-40d9-89ec-4afc4cd0624b@redhat.com/T/#mc944a6fdcd20a35f654c2be99f9c91a117c1bed4