Date: Sat, 9 Mar 2024 13:14:24 +0200
From: Mike Rapoport
To: Sean Christopherson
Cc: James Gowans, "akpm@linux-foundation.org", Patrick Roy,
	"chao.p.peng@linux.intel.com", Derek Manwaring, "pbonzini@redhat.com",
	David Woodhouse, Nikita Kalyazin, "lstoakes@gmail.com",
	"Liam.Howlett@oracle.com", "linux-mm@kvack.org", "qemu-devel@nongnu.org",
	"kirill.shutemov@linux.intel.com", "vbabka@suse.cz", "mst@redhat.com",
	"somlo@cmu.edu", Alexander Graf, "kvm@vger.kernel.org",
	"linux-coco@lists.linux.dev"
Subject: Re: Unmapping KVM Guest Memory from Host Kernel

On Fri, Mar 08, 2024 at 03:22:50PM -0800, Sean Christopherson wrote:
> On Fri, Mar 08, 2024, James Gowans wrote:
> > However, memfd_secret doesn’t work out the box for KVM guest memory; the
> > main reason seems to be that the GUP path is intentionally disabled for
> > memfd_secret, so if we use a memfd_secret
> > backed VMA for a memslot then
> > KVM is not able to fault the memory in. If it’s been pre-faulted in by
> > userspace then it seems to work.
>
> Huh, that _shouldn't_ work. The folio_is_secretmem() in gup_pte_range() is
> supposed to prevent the "fast gup" path from getting secretmem pages.

I suspect this works because KVM only calls GUP on faults; if the memory was
pre-faulted via memfd_secret, there won't be any faults and hence no GUP calls
from KVM.

> > With this in mind, what’s the best way to solve getting guest RAM out of
> > the direct map? Is memfd_secret integration with KVM the way to go, or
> > should we build a solution on top of guest_memfd, for example via some
> > flag that causes it to leave memory in the host userspace’s page tables,
> > but removes it from the direct map?
>
> memfd_secret obviously gets you a PoC much faster, but in the long term I'm quite
> sure you'll be fighting memfd_secret all the way. E.g. it's not dumpable, it
> deliberately allocates at 4KiB granularity (though I suspect the bug you found
> means that it can be inadvertently mapped with 2MiB hugepages), it has no line
> of sight to taking userspace out of the equation, etc.
>
> With guest_memfd on the other hand, everyone contributing to and maintaining it
> has goals that are *very* closely aligned with what you want to do.

I agree with Sean, guest_memfd seems like a better interface to use. It's
integrated with KVM by design, and removing guest memory from the direct map
looks like a natural enhancement to guest_memfd.

Unless I'm missing something, for a quick-and-dirty PoC it'll be a one-liner
that adds set_memory_np() to kvm_gmem_get_folio(), and then figuring out what
to do with virtio :)

-- 
Sincerely yours,
Mike.