Date: Thu, 23 Jan 2025 13:57:40 +0000
Subject: Re: [RFC PATCH v1 2/9] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page()
To: Fuad Tabba
CC: David Hildenbrand
References: <20250122152738.1173160-1-tabba@google.com> <20250122152738.1173160-3-tabba@google.com> <82d8d3a3-6f06-4904-9d94-6f92bba89dbc@redhat.com>
From: Patrick Roy

On Thu, 2025-01-23 at 12:28 +0000, Fuad Tabba wrote:
> Hi Patrick,
>
> On Thu, 23 Jan 2025 at 11:57, Patrick Roy wrote:
>>
>>
>>
>> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
>>> On 23.01.25 10:48, Fuad Tabba wrote:
>>>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand wrote:
>>>>>
>>>>> On 22.01.25 16:27, Fuad Tabba wrote:
>>>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
>>>>>> memory for slots that don't have a userspace address, but only if
>>>>>> the memory is mappable, which also indicates that it is
>>>>>> accessible by the host.
>>>>>
>>>>> Interesting. So far my assumption was that, for shared memory, user
>>>>> space would simply mmap() guest_memfd and pass it as userspace address
>>>>> to the same memslot that has this guest_memfd for private memory.
>>>>>
>>>>> Wouldn't that be easier in the first shot? (IOW, not require this patch
>>>>> with the cost of faulting the shared page into the page table on access)
>>>>
>>>
>>> In light of:
>>>
>>> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
>>>
>>> there can, in theory, be memslots that start at address 0 and have a
>>> "valid" mapping. This case is done from the kernel (and on special s390x
>>> hardware), though, so it does not apply here at all so far.
>>>
>>> In practice, getting address 0 as a valid address is unlikely, because
>>> the default:
>>>
>>> $ sysctl vm.mmap_min_addr
>>> vm.mmap_min_addr = 65536
>>>
>>> usually prohibits it for good reason.
>>>
>>>> This has to do more with the ABI I had for pkvm and shared memory
>>>> implementations, in which you don't need to specify the userspace
>>>> address for memory in a guestmem memslot. The issue is there is no
>>>> obvious address to map it to.
>>>> This would be the case in kvm:arm64 for
>>>> tracking paravirtualized time, which the userspace doesn't necessarily
>>>> need to interact with, but kvm does.
>>>
>>> So I understand correctly: userspace wouldn't have to mmap it because it
>>> is not interested in accessing it, but there is nothing speaking against
>>> mmapping it, at least in the first shot.
>>>
>>> I assume it would not be a private memslot (so far, my understanding is
>>> that internal memslots never have a guest_memfd attached).
>>> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
>>> on user-created memslots.
>>>
>>>>
>>>> That said, we could always have a userspace address dedicated to
>>>> mapping shared locations, and use that address when the necessity
>>>> arises. Or we could always require that memslots have a userspace
>>>> address, even if not used. I don't really have a strong preference.
>>>
>>> So, the simpler version where user space would simply mmap guest_memfd
>>> to provide the address via userspace_addr would at least work for the
>>> use case of paravirtualized time?
>>
>> fwiw, I'm currently prototyping something like this for x86 (although
>> not by putting the gmem address into userspace_addr, but by adding a new
>> field to memslots, so that memory attributes continue working), based on
>> what we talked about at the last guest_memfd sync meeting (the whole
>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
>> story). So I guess if we're going down this route for x86, maybe it
>> makes sense to do the same on ARM, for consistency?
>>
>>> It would get rid of the immediate need for this patch and patch #4 to
>>> get it flying.
>>>
>>>
>>> One interesting question is: when would you want shared memory in
>>> guest_memfd and *not* provide it as part of the same memslot.
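The rule in the quoted patch description can be modelled as a small decision function: use the memslot's userspace address when there is one, and otherwise fall back to guest_memfd only if the range is mappable (shared). This is a hypothetical userspace model, not KVM code; `struct model_memslot`, its fields, `enum access_path` and `choose_access_path()` are invented for illustration. Treating `userspace_addr == 0` as "no address" assumes, as David notes, that `vm.mmap_min_addr` keeps user-created memslots from legitimately starting at address 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a memslot, NOT KVM's struct kvm_memory_slot.
 * userspace_addr == 0 stands for "no userspace mapping"; this relies on
 * vm.mmap_min_addr (65536 by default) keeping user mappings off page 0.
 */
struct model_memslot {
	uint64_t userspace_addr;
	bool has_gmem;		/* slot has a guest_memfd attached */
	bool gmem_mappable;	/* range is shared, i.e. host-accessible */
};

enum access_path {
	ACCESS_VIA_UADDR,	/* copy through the userspace mapping */
	ACCESS_VIA_GMEM,	/* access the guest_memfd folio directly */
	ACCESS_DENIED,		/* private or unbacked: the caller fails the access */
};

/* Pick how a kvm_read_guest_page()-style helper would reach the memory. */
static enum access_path choose_access_path(const struct model_memslot *slot)
{
	assert(slot != NULL);

	if (slot->userspace_addr != 0)
		return ACCESS_VIA_UADDR;
	if (slot->has_gmem && slot->gmem_mappable)
		return ACCESS_VIA_GMEM;
	return ACCESS_DENIED;
}
```

Under David's suggestion, where userspace mmap()s guest_memfd and passes the result as `userspace_addr`, shared accesses always take the first branch, which is why that approach would remove the immediate need for this patch.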
>>
>> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
>> far without giving KVM a way to internally access shared parts of gmem -
>> it's why I was probing Fuad for this simplified series, because
>> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
>> working non-CoCo VM on ARM (although I admittedly never looked at clocks
>> inside the guest - maybe that's one thing that breaks if KVM can't
>> access gmem. How do guest and host agree on the guest memory range
>> used to exchange paravirtual timekeeping information? Could that exchange
>> be intercepted in userspace, and set to shared via memory attributes (e.g.
>> placed outside gmem)? That's the route I'm going down for paravirtual
>> time on x86).
>
> For an idea of what it looks like on arm64, here's how kvmtool handles it:
> https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c
>
> Cheers,
> /fuad

Thanks! In that example, kvmtool actually allocates a separate memslot for
the pvclock stuff, so I guess it's always possible to simply put it into a
non-gmem memslot, which indeed sidesteps this issue as you mention in your
reply to David :D

>>> One nice thing about the mmap might be that accesses go via user-space
>>> page tables: E.g., __kvm_read_guest_page can just access the memory
>>> without requiring the folio lock and an additional temporary folio
>>> reference on every access -- it's handled implicitly via the mapcount.
>>>
>>> (of course, to map the page we still need that once on the fault path)
>>
>> Doing a direct map access in kvm_{read,write}_guest() and friends will
>> also get tricky if guest_memfd folios ever don't have direct map
>> entries. On-demand restoration is painful, both complexity and
>> performance wise [1], while going through a userspace mapping of
>> guest_memfd would "just work".
>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>>>
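The kvmtool approach works because the arm64 paravirtualized stolen-time area is ordinary guest memory at an IPA the VMM chooses (set per vCPU with the KVM_ARM_VCPU_PVTIME_IPA attribute), so it can live in a memslot of its own rather than in guest_memfd. A minimal sketch of such a layout, assuming one 64-byte, 64-byte-aligned record per vCPU packed into a dedicated region: the record fields mirror Linux's arm64 `struct pvclock_vcpu_stolen_time`, while `pvtime_ipa_for_vcpu()` and the base/size values used with it are hypothetical and not taken from kvmtool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of Linux's arm64 struct pvclock_vcpu_stolen_time. */
struct stolen_time_record {
	uint32_t revision;
	uint32_t attributes;
	uint64_t stolen_time;	/* stolen time in ns, little-endian in guest memory */
	uint8_t padding[48];	/* pads the record to 64 bytes */
};

_Static_assert(sizeof(struct stolen_time_record) == 64,
	       "stolen-time record must be 64 bytes");

#define PVTIME_RECORD_SIZE ((uint64_t)sizeof(struct stolen_time_record))

/*
 * Hypothetical helper: place vCPU vcpu_id's record in a dedicated region
 * [base, base + size) that the VMM backs with its own (non-guest_memfd)
 * memslot. Returns 0 and stores the IPA, or -1 if the layout is invalid.
 */
static int pvtime_ipa_for_vcpu(uint64_t base, uint64_t size,
			       unsigned int vcpu_id, uint64_t *ipa)
{
	uint64_t offset = (uint64_t)vcpu_id * PVTIME_RECORD_SIZE;

	assert(ipa != NULL);
	if (base % PVTIME_RECORD_SIZE != 0)
		return -1;	/* records are assumed to be 64-byte aligned */
	if (offset + PVTIME_RECORD_SIZE > size)
		return -1;	/* region too small for this vCPU */
	*ipa = base + offset;
	return 0;
}
```

For example, a 64 KiB region holds records for vCPUs 0 to 1023 (65536 / 64). Because the VMM picks these addresses, nothing forces the region to sit inside guest_memfd.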
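David's point about per-access cost can be illustrated with a toy model that counts operations instead of taking real locks: the direct guest_memfd path locks the folio and takes a temporary reference on every access, whereas the userspace-mapping path pays that cost once when the page is faulted in and afterwards relies on the mapping (tracked by the mapcount). Everything here (`struct toy_folio`, its counters, and the `toy_*` functions) is hypothetical and deliberately ignores the kernel's real locking and refcounting rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a guest_memfd folio; counters replace real locks. */
struct toy_folio {
	unsigned long lock_acquisitions;	/* times the folio lock was taken */
	unsigned long temp_refs_taken;		/* temporary references taken */
	int mapcount;				/* userspace mappings of this page */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Direct guest_memfd access: lock + temporary reference on every call. */
static void toy_gmem_read(struct toy_folio *f, size_t off, void *dst, size_t len)
{
	assert(off + len <= TOY_PAGE_SIZE);
	f->lock_acquisitions++;
	f->temp_refs_taken++;
	memcpy(dst, f->data + off, len);
}

/* Fault path: takes the lock once and installs the mapping. */
static void toy_fault_in(struct toy_folio *f)
{
	if (f->mapcount > 0)
		return;
	f->lock_acquisitions++;
	f->mapcount++;
}

/* Access through the userspace mapping: no per-access lock or reference. */
static void toy_uaddr_read(struct toy_folio *f, size_t off, void *dst, size_t len)
{
	assert(off + len <= TOY_PAGE_SIZE);
	toy_fault_in(f);
	memcpy(dst, f->data + off, len);
}
```

After 100 reads through each path, the direct path has taken the lock and a temporary reference 100 times, while the mapped path took the lock once at fault time and took no temporary references.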