Date: Tue, 31 Oct 2023 11:24:07 -0700
From: David Matlack
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexander Viro, Christian Brauner, "Matthew Wilcox (Oracle)", Andrew Morton, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li, Xu Yilun, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata, Mickaël Salaün, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, "Kirill A. Shutemov"
Subject: Re: [PATCH v13 16/35] KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for guest-specific backing memory
In-Reply-To: <20231027182217.3615211-17-seanjc@google.com>

On 2023-10-27 11:21 AM, Sean Christopherson wrote:
> Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based
> memory that is tied to a specific KVM virtual machine and whose primary
> purpose is to serve guest memory.
>
> A guest-first memory subsystem allows for optimizations and enhancements
> that are kludgy or outright infeasible to implement/support in a generic
> memory subsystem. With guest_memfd, guest protections and mapping sizes
> are fully decoupled from host userspace mappings. E.g. KVM currently
> doesn't support mapping memory as writable in the guest without it also
> being writable in host userspace, as KVM's ABI uses VMA protections to
> define the allowed guest protection. Userspace can fudge this by
> establishing two mappings, a writable mapping for the guest and a readable
> one for itself, but that's suboptimal on multiple fronts.
>
> Similarly, KVM currently requires the guest mapping size to be a strict
> subset of the host userspace mapping size, e.g. KVM doesn't support
> creating a 1GiB guest mapping unless host userspace also has a 1GiB
> mapping. Decoupling the mapping sizes would allow userspace to precisely
> map only what is needed without impacting guest performance, e.g. to
> harden against unintentional accesses to guest memory.
>
> Decoupling guest and userspace mappings may also allow for a cleaner
> alternative to high-granularity mappings for HugeTLB, which has reached a
> bit of an impasse and is unlikely to ever be merged.
>
> A guest-first memory subsystem also provides clearer line of sight to
> things like a dedicated memory pool (for slice-of-hardware VMs) and
> elimination of "struct page" (for offload setups where userspace _never_
> needs to mmap() guest memory).

All of these use cases involve using guest_memfd for shared pages, but this
entire series sets up KVM to use guest_memfd only for private pages.

For example, the per-page attributes are a property of the KVM VM, not of
the underlying guest_memfd. So if shared pages are also to be backed by
guest_memfd, that implies we will need separate guest_memfds for private
and shared pages. But a given memslot can contain a mix of private and
shared pages. So would a memslot need to support two guest_memfds? The UAPI
allows only one per memslot and uses the HVA for shared mappings.

My initial reaction after reading through this series is that the per-page
private/shared state should be a property of the guest_memfd, not the VM.
It might even be cleaner in the long run to make all memory attributes a
property of the guest_memfd. That way we can scope the support to
guest_memfds only and not have to worry about making per-page attributes
work with "legacy" HVA-based memslots.

Can you sketch out how you see this proposal being extended to use
guest_memfd for shared mappings?
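To make the concern above concrete, here is a toy model in C of how a single memslot's pages are served under the two designs. All names (resolve_per_vm, resolve_per_gmem, count_backings, struct backing_counts) are hypothetical; this is not KVM code and it ignores faults, guest_memfd offsets and locking. It assumes only what the thread states: one guest_memfd per memslot, private pages served from it, shared pages served through the HVA in the series as posted, and, for the alternative, a guest_memfd that tracks private/shared itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: every name here is hypothetical, none of it is KVM code.
 * Question asked: which pages of one memslot are served by the slot's
 * guest_memfd, and which by the userspace HVA?
 */
enum backing { BACKING_HVA, BACKING_GMEM };

/* Series as posted: private/shared is a per-VM attribute looked up by gfn. */
static enum backing resolve_per_vm(bool gfn_is_private)
{
	/* Private pages come from the slot's single guest_memfd;
	 * shared pages fall back to the HVA (userspace mapping). */
	return gfn_is_private ? BACKING_GMEM : BACKING_HVA;
}

/* Alternative raised in the review: the guest_memfd tracks private/shared. */
static enum backing resolve_per_gmem(bool page_is_private, bool *host_may_map)
{
	/* Every page is served by the guest_memfd; the per-page state only
	 * decides whether host userspace may map that page. */
	*host_may_map = !page_is_private;
	return BACKING_GMEM;
}

struct backing_counts {
	size_t gmem_per_vm;   /* pages from guest_memfd, per-VM attributes */
	size_t gmem_per_file; /* pages from guest_memfd, per-file attributes */
	size_t host_mappable; /* pages userspace may map, per-file model */
};

/* Tally both models over a slot whose pages are described by is_private[]. */
static struct backing_counts count_backings(const bool *is_private,
					    size_t npages)
{
	struct backing_counts c = { 0, 0, 0 };

	assert(is_private != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		bool may_map;

		if (resolve_per_vm(is_private[i]) == BACKING_GMEM)
			c.gmem_per_vm++;
		/* resolve_per_gmem() always runs here, so may_map is set. */
		if (resolve_per_gmem(is_private[i], &may_map) == BACKING_GMEM)
			c.gmem_per_file++;
		if (may_map)
			c.host_mappable++;
	}
	return c;
}
```

For a slot whose four pages alternate private, shared, private, shared, count_backings() reports 2 of 4 pages served by guest_memfd under the per-VM model (the shared half goes through the HVA) and 4 of 4 under the per-file model, 2 of which host userspace may map. That is the gap the question above is about: with per-VM attributes and one guest_memfd per slot, shared pages in a mixed slot have no way to be served by guest_memfd.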