References: <20250122152738.1173160-1-tabba@google.com>
 <20250122152738.1173160-3-tabba@google.com>
 <82d8d3a3-6f06-4904-9d94-6f92bba89dbc@redhat.com>
 <164e9d74-2f1f-4557-afda-06712e8415b0@redhat.com>
In-Reply-To: <164e9d74-2f1f-4557-afda-06712e8415b0@redhat.com>
From: Fuad Tabba
Date: Thu, 23 Jan 2025 14:25:14 +0000
Subject: Re: [RFC PATCH v1 2/9] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page()
To: David Hildenbrand
Cc: Patrick Roy, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
 ackerleytng@google.com, mail@maciej.szmigiero.name, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com, shuah@kernel.org,
 hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
 jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com

On Thu, 23 Jan 2025 at 14:21, David Hildenbrand wrote:
>
> On 23.01.25 14:57, Patrick Roy wrote:
> >
> >
> > On Thu, 2025-01-23 at 12:28 +0000, Fuad Tabba wrote:
> >> Hi Patrick,
> >>
> >> On Thu, 23 Jan 2025 at 11:57, Patrick Roy wrote:
> >>>
> >>>
> >>>
> >>> On Thu, 2025-01-23 at 11:39 +0000, David Hildenbrand wrote:
> >>>> On 23.01.25 10:48, Fuad Tabba wrote:
> >>>>> On Wed, 22 Jan 2025 at 22:10, David Hildenbrand wrote:
> >>>>>>
> >>>>>> On 22.01.25 16:27, Fuad Tabba wrote:
> >>>>>>> Make kvm_(read|/write)_guest_page() capable of accessing guest
> >>>>>>> memory for slots that don't have a userspace address, but only if
> >>>>>>> the memory is mappable, which also indicates that it is
> >>>>>>> accessible by the host.
> >>>>>>
> >>>>>> Interesting. So far my assumption was that, for shared memory, user
> >>>>>> space would simply mmap() guest_memfd and pass it as userspace address
> >>>>>> to the same memslot that has this guest_memfd for private memory.
> >>>>>>
> >>>>>> Wouldn't that be easier in the first shot? (IOW, not require this patch
> >>>>>> with the cost of faulting the shared page into the page table on access)
> >>>>>
> >>>>
> >>>> In light of:
> >>>>
> >>>> https://lkml.kernel.org/r/20250117190938.93793-4-imbrenda@linux.ibm.com
> >>>>
> >>>> there can, in theory, be memslots that start at address 0 and have a
> >>>> "valid" mapping. This case is done from the kernel (and on special s390x
> >>>> hardware), though, so it does not apply here at all so far.
> >>>>
> >>>> In practice, getting address 0 as a valid address is unlikely, because
> >>>> the default:
> >>>>
> >>>> $ sysctl vm.mmap_min_addr
> >>>> vm.mmap_min_addr = 65536
> >>>>
> >>>> usually prohibits it for good reason.
> >>>>
> >>>>> This has to do more with the ABI I had for pkvm and shared memory
> >>>>> implementations, in which you don't need to specify the userspace
> >>>>> address for memory in a guestmem memslot. The issue is there is no
> >>>>> obvious address to map it to. This would be the case in kvm:arm64 for
> >>>>> tracking paravirtualized time, which the userspace doesn't necessarily
> >>>>> need to interact with, but kvm does.
> >>>>
> >>>> So I understand correctly: userspace wouldn't have to mmap it because it
> >>>> is not interested in accessing it, but there is nothing speaking against
> >>>> mmapping it, at least in the first shot.
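To make the access rule in the quoted patch description concrete, here is a minimal, self-contained C sketch (not the patch itself). It models a memslot that has a userspace_addr and an optional guest_memfd backing, and treats userspace_addr == 0 as "no userspace address", which the vm.mmap_min_addr point above makes reasonable in practice but not guaranteed. Everything here (struct sketch_slot, sketch_read_guest_page(), the per-page mappable flags, plain memcpy() in place of user-copy helpers) is a hypothetical stand-in, not KVM's actual code or API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NR_PAGES  4u

/* Hypothetical, simplified memslot; NOT struct kvm_memory_slot. */
struct sketch_slot {
    uintptr_t userspace_addr;             /* 0 means "no userspace mapping" */
    bool has_gmem;                        /* slot is backed by guest_memfd */
    bool gmem_mappable[SKETCH_NR_PAGES];  /* page is shared: host may map it */
    uint8_t *gmem_pages;                  /* stand-in for guest_memfd pages */
};

/*
 * Copy len bytes at offset within page gfn of the slot into data.
 * Order: use userspace_addr when the slot has one (existing behaviour);
 * otherwise fall back to guest_memfd, but only for mappable (shared) pages.
 */
static int sketch_read_guest_page(const struct sketch_slot *slot, size_t gfn,
                                  void *data, size_t offset, size_t len)
{
    assert(slot != NULL && data != NULL);
    if (gfn >= SKETCH_NR_PAGES || offset + len > SKETCH_PAGE_SIZE)
        return -EINVAL;

    if (slot->userspace_addr) {
        /* Real KVM copies from the user mapping; memcpy() stands in here. */
        const uint8_t *src = (const uint8_t *)slot->userspace_addr;
        memcpy(data, src + gfn * SKETCH_PAGE_SIZE + offset, len);
        return 0;
    }

    if (slot->has_gmem && slot->gmem_mappable[gfn]) {
        memcpy(data, slot->gmem_pages + gfn * SKETCH_PAGE_SIZE + offset, len);
        return 0;
    }

    /* Private or unbacked memory: the host must not touch it. */
    return -EFAULT;
}
```

The design point the sketch mirrors is that the guest_memfd path is only a fallback: slots that already carry a userspace address keep their existing behaviour, and private pages remain inaccessible to the host either way.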
> >>>>
> >>>> I assume it would not be a private memslot (so far, my understanding is
> >>>> that internal memslots never have a guest_memfd attached).
> >>>> kvm_gmem_create() is only called via KVM_CREATE_GUEST_MEMFD, to be set
> >>>> on user-created memslots.
> >>>>
> >>>>>
> >>>>> That said, we could always have a userspace address dedicated to
> >>>>> mapping shared locations, and use that address when the necessity
> >>>>> arises. Or we could always require that memslots have a userspace
> >>>>> address, even if not used. I don't really have a strong preference.
> >>>>
> >>>> So, the simpler version where user space would simply mmap guest_memfd
> >>>> to provide the address via userspace_addr would at least work for the
> >>>> use case of paravirtualized time?
> >>>
> >>> fwiw, I'm currently prototyping something like this for x86 (although
> >>> not by putting the gmem address into userspace_addr, but by adding a new
> >>> field to memslots, so that memory attributes continue working), based on
> >>> what we talked about at the last guest_memfd sync meeting (the whole
> >>> "how to get MMIO emulation working for non-CoCo VMs in guest_memfd"
> >>> story). So I guess if we're going down this route for x86, maybe it
> >>> makes sense to do the same on ARM, for consistency?
> >>>
> >>>> It would get rid of the immediate need for this patch and patch #4 to
> >>>> get it flying.
> >>>>
> >>>>
> >>>> One interesting question is: when would you want shared memory in
> >>>> guest_memfd and *not* provide it as part of the same memslot.
> >>>
> >>> In my testing of non-CoCo gmem VMs on ARM, I've been able to get quite
> >>> far without giving KVM a way to internally access shared parts of gmem -
> >>> it's why I was probing Fuad for this simplified series, because
> >>> KVM_SW_PROTECTED_VM + mmap (for loading guest kernel) is enough to get a
> >>> working non-CoCo VM on ARM (although I admittedly never looked at clocks
> >>> inside the guest - maybe that's one thing that breaks if KVM can't
> >>> access gmem. How do guest and host agree on the guest memory range
> >>> used to exchange paravirtual timekeeping information? Could that exchange
> >>> be intercepted in userspace, and set to shared via memory attributes (e.g.
> >>> placed outside gmem)? That's the route I'm going down for paravirtual
> >>> time on x86).
> >>
> >> For an idea of what it looks like on arm64, here's how kvmtool handles it:
> >> https://github.com/kvmtool/kvmtool/blob/master/arm/aarch64/pvtime.c
> >>
> >> Cheers,
> >> /fuad
> >
> > Thanks! In that example, kvmtool actually allocates a separate memslot for
> > the pvclock stuff, so I guess it's always possible to simply put it into
> > a non-gmem memslot, which indeed sidesteps this issue as you mention in
> > your reply to David :D
>
> Does that work on CC where all memory defaults to private first, and the
> VM explicitly has to opt into marking it shared first, or how exactly
> would the flow of operations be in the cases of the non-gmem ("good
> old") memslot?

We use a normal memslot, without the KVM_MEM_GUEST_MEMFD flag, and
consider that kind of slot to be shared by default.

Cheers,
/fuad

> --
> Cheers,
>
> David / dhildenb
>