From: Gavin Shan
Date: Fri, 16 May 2025 21:12:25 +1000
Subject: Re: [PATCH v9 07/17] KVM: guest_memfd: Allow host to map guest_memfd() pages
To: Fuad Tabba
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name,
 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com,
 pankaj.gupta@amd.com, ira.weiny@intel.com
References: <20250513163438.3942405-1-tabba@google.com>
 <20250513163438.3942405-8-tabba@google.com>

Hi Fuad,

On 5/16/25 5:56 PM, Fuad Tabba wrote:
> On Fri, 16 May 2025 at 08:09, Gavin Shan wrote:
>> On 5/14/25 2:34 AM, Fuad Tabba wrote:
>>> This patch enables support for shared memory in guest_memfd, including
>>> mapping that memory at the host userspace. This support is gated by the
>>> configuration option KVM_GMEM_SHARED_MEM, and toggled by the guest_memfd
>>> flag GUEST_MEMFD_FLAG_SUPPORT_SHARED, which can be set when creating a
>>> guest_memfd instance.
>>>
>>> Co-developed-by: Ackerley Tng
>>> Signed-off-by: Ackerley Tng
>>> Signed-off-by: Fuad Tabba
>>> ---
>>>   arch/x86/include/asm/kvm_host.h | 10 ++++
>>>   include/linux/kvm_host.h        | 13 +++++
>>>   include/uapi/linux/kvm.h        |  1 +
>>>   virt/kvm/Kconfig                |  5 ++
>>>   virt/kvm/guest_memfd.c          | 88 +++++++++++++++++++++++++++++++++
>>>   5 files changed, 117 insertions(+)
>>>
>>
>> [...]
>>
>>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>>> index 6db515833f61..8e6d1866b55e 100644
>>> --- a/virt/kvm/guest_memfd.c
>>> +++ b/virt/kvm/guest_memfd.c
>>> @@ -312,7 +312,88 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>>>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>>>  }
>>>
>>> +#ifdef CONFIG_KVM_GMEM_SHARED_MEM
>>> +
>>> +static bool kvm_gmem_supports_shared(struct inode *inode)
>>> +{
>>> +	uint64_t flags = (uint64_t)inode->i_private;
>>> +
>>> +	return flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED;
>>> +}
>>> +
>>> +static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
>>> +{
>>> +	struct inode *inode = file_inode(vmf->vma->vm_file);
>>> +	struct folio *folio;
>>> +	vm_fault_t ret = VM_FAULT_LOCKED;
>>> +
>>> +	filemap_invalidate_lock_shared(inode->i_mapping);
>>> +
>>> +	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
>>> +	if (IS_ERR(folio)) {
>>> +		int err = PTR_ERR(folio);
>>> +
>>> +		if (err == -EAGAIN)
>>> +			ret = VM_FAULT_RETRY;
>>> +		else
>>> +			ret = vmf_error(err);
>>> +
>>> +		goto out_filemap;
>>> +	}
>>> +
>>> +	if (folio_test_hwpoison(folio)) {
>>> +		ret = VM_FAULT_HWPOISON;
>>> +		goto out_folio;
>>> +	}
>>> +
>>> +	if (WARN_ON_ONCE(folio_test_large(folio))) {
>>> +		ret = VM_FAULT_SIGBUS;
>>> +		goto out_folio;
>>> +	}
>>> +
>>
>> I don't think there is a large folio involved since the max/min folio order
>> (stored in struct address_space::flags) should have been set to 0, meaning
>> only order-0 is possible when the folio (page) is allocated and added to the
>> page-cache. More details can be referred to AS_FOLIO_ORDER_MASK. It's unnecessary
>> check but not harmful. Maybe a comment is needed to mention large folio isn't
>> around yet, but double confirm.
>
> The idea is to document the lack of hugepage support in code, but if
> you think it's necessary, I could add a comment.
>

Ok, I was just being nit-picky, since we're at v9, which is close to
integration, I guess. If another respin is needed, a comment wouldn't be
harmful, but it's also perfectly fine without it :)

>
>>
>>> +	if (!folio_test_uptodate(folio)) {
>>> +		clear_highpage(folio_page(folio, 0));
>>> +		kvm_gmem_mark_prepared(folio);
>>> +	}
>>> +
>>
>> I must be missing some thing here. This chunk of code is out of sync to kvm_gmem_get_pfn(),
>> where kvm_gmem_prepare_folio() and kvm_arch_gmem_prepare() are executed, and then
>> PG_uptodate is set after that. In the latest ARM CCA series, kvm_arch_gmem_prepare()
>> isn't used, but it would delegate the folio (page) with the prerequisite that
>> the folio belongs to the private address space.
>>
>> I guess that kvm_arch_gmem_prepare() is skipped here because we have the assumption that
>> the folio belongs to the shared address space? However, this assumption isn't always
>> true. We probably need to ensure the folio range is really belonging to the shared
>> address space by poking kvm->mem_attr_array, which can be modified by VMM through
>> ioctl KVM_SET_MEMORY_ATTRIBUTES.
>
> This series only supports shared memory, and the idea is not to use
> the attributes to check. We ensure that only certain VM types can set
> the flag (e.g., VM_TYPE_DEFAULT and KVM_X86_SW_PROTECTED_VM).
>
> In the patch series that builds on it, with in-place conversion
> between private and shared, we do add a check that the memory faulted
> in is in-fact shared.
>

Ok, thanks for the clarification. I plan to review that series, but haven't
had a chance yet. Right, it's sensible to limit the capability of modifying a
page's attribute (private vs shared) to particular machine types, since the
whole feature (restricted mmap and in-place conversion) is applicable to
particular machine types. I can understand that KVM_X86_SW_PROTECTED_VM
(similar to pKVM) needs the feature, but I don't understand why
VM_TYPE_DEFAULT needs it. I guess we may want to use guest-memfd like tmpfs
or shmem, meaning the whole address space associated with a guest-memfd is
shared, without the corresponding private space pointed to by
struct kvm_userspace_memory_region2::userspace_addr. Instead, 'userspace_addr'
would be an mmap(guest-memfd) from the VMM's perspective, if I'm correct.

Thanks,
Gavin

> Thanks,
> /fuad
>
>>> +	vmf->page = folio_file_page(folio, vmf->pgoff);
>>> +
>>> +out_folio:
>>> +	if (ret != VM_FAULT_LOCKED) {
>>> +		folio_unlock(folio);
>>> +		folio_put(folio);
>>> +	}
>>> +
>>> +out_filemap:
>>> +	filemap_invalidate_unlock_shared(inode->i_mapping);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>
>> Thanks,
>> Gavin
>>
>