Date: Tue, 11 Feb 2025 15:58:11 +0000
In-Reply-To: <186047ea-a782-494b-bfcf-f5088806bbb4@redhat.com> (message from Gavin Shan on Fri, 24 Jan 2025 14:25:56 +1000)
Subject: Re: [RFC PATCH v5 02/15] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes
From: Ackerley Tng
To: Gavin Shan
Cc: tabba@google.com, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
 linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org,
 mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com,
 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
 viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
 akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
 chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
 dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
 mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com,
 wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com,
 quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
 quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
 qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org,
 hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
 fvdl@google.com, hughd@google.com, jthoughton@google.com
Content-Type: text/plain; charset="UTF-8"

Thanks for reviewing, Gavin! I'll also adopt these when I respin.

Gavin Shan writes:

> Hi Fuad,
>
> On 1/18/25 2:29 AM, Fuad Tabba wrote:
>> From: Ackerley Tng
>>
>> Using guest mem inodes allows us to store metadata for the backing
>> memory on the inode. Metadata will be added in a later patch to
>> support HugeTLB pages.
>>
>> Metadata about backing memory should not be stored on the file, since
>> the file represents a guest_memfd's binding with a struct kvm, and
>> metadata about backing memory is not unique to a specific binding and
>> struct kvm.
>>
>> Signed-off-by: Ackerley Tng
>> Signed-off-by: Fuad Tabba
>> ---
>>  include/uapi/linux/magic.h |   1 +
>>  virt/kvm/guest_memfd.c     | 119 ++++++++++++++++++++++++++++++-------
>>  2 files changed, 100 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
>> index bb575f3ab45e..169dba2a6920 100644
>> --- a/include/uapi/linux/magic.h
>> +++ b/include/uapi/linux/magic.h
>> @@ -103,5 +103,6 @@
>>  #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */
>>  #define SECRETMEM_MAGIC 0x5345434d /* "SECM" */
>>  #define PID_FS_MAGIC 0x50494446 /* "PIDF" */
>> +#define GUEST_MEMORY_MAGIC 0x474d454d /* "GMEM" */
>>
>>  #endif /* __LINUX_MAGIC_H__ */
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index 47a9f68f7b24..198554b1f0b5 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -1,12 +1,17 @@
>>  // SPDX-License-Identifier: GPL-2.0
>> +#include
>> +#include
>
> This can be dropped since "linux/mount.h" has been included to "linux/fs.h".
>
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>
>>  #include "kvm_mm.h"
>>
>> +static struct vfsmount *kvm_gmem_mnt;
>> +
>>  struct kvm_gmem {
>>  	struct kvm *kvm;
>>  	struct xarray bindings;
>> @@ -307,6 +312,38 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
>>  	return gfn - slot->base_gfn + slot->gmem.pgoff;
>>  }
>>
>> +static const struct super_operations kvm_gmem_super_operations = {
>> +	.statfs = simple_statfs,
>> +};
>> +
>> +static int kvm_gmem_init_fs_context(struct fs_context *fc)
>> +{
>> +	struct pseudo_fs_context *ctx;
>> +
>> +	if (!init_pseudo(fc, GUEST_MEMORY_MAGIC))
>> +		return -ENOMEM;
>> +
>> +	ctx = fc->fs_private;
>> +	ctx->ops = &kvm_gmem_super_operations;
>> +
>> +	return 0;
>> +}
>> +
>> +static struct file_system_type kvm_gmem_fs = {
>> +	.name = "kvm_guest_memory",
>> +	.init_fs_context = kvm_gmem_init_fs_context,
>> +	.kill_sb = kill_anon_super,
>> +};
>> +
>> +static void kvm_gmem_init_mount(void)
>> +{
>> +	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs);
>> +	BUG_ON(IS_ERR(kvm_gmem_mnt));
>> +
>> +	/* For giggles. Userspace can never map this anyways. */
>> +	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
>> +}
>> +
>>  static struct file_operations kvm_gmem_fops = {
>>  	.open = generic_file_open,
>>  	.release = kvm_gmem_release,
>> @@ -316,6 +353,8 @@ static struct file_operations kvm_gmem_fops = {
>>  void kvm_gmem_init(struct module *module)
>>  {
>>  	kvm_gmem_fops.owner = module;
>> +
>> +	kvm_gmem_init_mount();
>>  }
>>
>>  static int kvm_gmem_migrate_folio(struct address_space *mapping,
>> @@ -397,11 +436,67 @@ static const struct inode_operations kvm_gmem_iops = {
>>  	.setattr = kvm_gmem_setattr,
>>  };
>>
>> +static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
>> +						      loff_t size, u64 flags)
>> +{
>> +	const struct qstr qname = QSTR_INIT(name, strlen(name));
>> +	struct inode *inode;
>> +	int err;
>> +
>> +	inode = alloc_anon_inode(kvm_gmem_mnt->mnt_sb);
>> +	if (IS_ERR(inode))
>> +		return inode;
>> +
>> +	err = security_inode_init_security_anon(inode, &qname, NULL);
>> +	if (err) {
>> +		iput(inode);
>> +		return ERR_PTR(err);
>> +	}
>> +
>> +	inode->i_private = (void *)(unsigned long)flags;
>> +	inode->i_op = &kvm_gmem_iops;
>> +	inode->i_mapping->a_ops = &kvm_gmem_aops;
>> +	inode->i_mode |= S_IFREG;
>> +	inode->i_size = size;
>> +	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
>> +	mapping_set_inaccessible(inode->i_mapping);
>> +	/* Unmovable mappings are supposed to be marked unevictable as well. */
>> +	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>> +
>> +	return inode;
>> +}
>> +
>> +static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,
>> +						  u64 flags)
>> +{
>> +	static const char *name = "[kvm-gmem]";
>> +	struct inode *inode;
>> +	struct file *file;
>> +
>> +	if (kvm_gmem_fops.owner && !try_module_get(kvm_gmem_fops.owner))
>> +		return ERR_PTR(-ENOENT);
>> +
>
> The validation on 'kvm_gmem_fops.owner' can be removed since try_module_get()
> and module_put() are friendly to a NULL parameter, even when CONFIG_MODULE_UNLOAD == N
>
> A module_put(kvm_gmem_fops.owner) is needed in the various erroneous cases in
> this function. Otherwise, the reference count of the owner (module) will become
> imbalanced on any errors.
>

Thanks for catching this! Will add module_put() for error paths.

>
>> +	inode = kvm_gmem_inode_make_secure_inode(name, size, flags);
>> +	if (IS_ERR(inode))
>> +		return ERR_CAST(inode);
>> +
>
> ERR_CAST may be dropped since there is nothing to be casted or converted?
>

This cast is necessary as it casts from a struct inode * to a struct file *.

>> +	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR,
>> +				 &kvm_gmem_fops);
>> +	if (IS_ERR(file)) {
>> +		iput(inode);
>> +		return file;
>> +	}
>> +
>> +	file->f_mapping = inode->i_mapping;
>> +	file->f_flags |= O_LARGEFILE;
>> +	file->private_data = priv;
>> +
>
> 'file->f_mapping = inode->i_mapping' may be dropped since it's already correctly
> set by alloc_file_pseudo().
>
> alloc_file_pseudo
>   alloc_path_pseudo
>   alloc_file
>     alloc_empty_file
>     file_init_path // Set by this function
>

Thanks!
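
On the module_put() point above, here is a minimal userspace sketch of the unwinding shape being discussed, not the actual respin. It only models the reference counting: fake_module, fake_module_get(), fake_module_put(), fail_at and create_getfile_model() are hypothetical stand-ins invented for illustration, and the numeric error values are placeholders for the kernel's errno constants.

```c
/* Userspace model only: every name here is a stand-in, not a kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_module {
	int refcnt;
};

/* Stand-in for try_module_get(); like the real one, it tolerates NULL. */
static bool fake_module_get(struct fake_module *m)
{
	if (m)
		m->refcnt++;
	return true;
}

/* Stand-in for module_put(); also a no-op for NULL. */
static void fake_module_put(struct fake_module *m)
{
	if (m) {
		assert(m->refcnt > 0);
		m->refcnt--;
	}
}

/* Selects which step of the modelled function should fail. */
enum fail_at { FAIL_NONE, FAIL_INODE, FAIL_FILE };

/*
 * Mirrors the control flow of kvm_gmem_inode_create_getfile(): take the
 * module reference first, then drop it again on every later failure.
 * Returns 0 on success (reference kept) or a negative value on failure.
 */
static int create_getfile_model(struct fake_module *owner, enum fail_at fail)
{
	int err;

	if (!fake_module_get(owner))
		return -2;		/* placeholder for -ENOENT */

	if (fail == FAIL_INODE) {	/* models the inode step failing */
		err = -12;		/* placeholder for -ENOMEM */
		goto err_put_module;
	}

	if (fail == FAIL_FILE) {	/* models alloc_file_pseudo() failing */
		err = -12;
		/* the real function would also iput(inode) here */
		goto err_put_module;
	}

	return 0;

err_put_module:
	fake_module_put(owner);
	return err;
}
```

With a single unwind label, each failure after the reference is taken leaves the count where it started, and passing a NULL owner needs no separate check.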
>
>> +	return file;
>> +}
>> +
>>  static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>>  {
>> -	const char *anon_name = "[kvm-gmem]";
>>  	struct kvm_gmem *gmem;
>> -	struct inode *inode;
>>  	struct file *file;
>>  	int fd, err;
>>
>> @@ -415,32 +510,16 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
>>  		goto err_fd;
>>  	}
>>
>> -	file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
>> -					 O_RDWR, NULL);
>> +	file = kvm_gmem_inode_create_getfile(gmem, size, flags);
>>  	if (IS_ERR(file)) {
>>  		err = PTR_ERR(file);
>>  		goto err_gmem;
>>  	}
>>
>> -	file->f_flags |= O_LARGEFILE;
>> -
>> -	inode = file->f_inode;
>> -	WARN_ON(file->f_mapping != inode->i_mapping);
>> -
>> -	inode->i_private = (void *)(unsigned long)flags;
>> -	inode->i_op = &kvm_gmem_iops;
>> -	inode->i_mapping->a_ops = &kvm_gmem_aops;
>> -	inode->i_mode |= S_IFREG;
>> -	inode->i_size = size;
>> -	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
>> -	mapping_set_inaccessible(inode->i_mapping);
>> -	/* Unmovable mappings are supposed to be marked unevictable as well. */
>> -	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
>> -
>>  	kvm_get_kvm(kvm);
>>  	gmem->kvm = kvm;
>>  	xa_init(&gmem->bindings);
>> -	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
>> +	list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list);
>>
>>  	fd_install(fd, file);
>>  	return fd;
>
> Thanks,
> Gavin
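
On the ERR_CAST question above, a simplified userspace model of the error-pointer idiom may help show why the cast is needed. This is a sketch, not the kernel's implementation: every model_* helper, MODEL_MAX_ERRNO, inode_model, file_model and getfile_model() are hypothetical names made up for illustration, and the errno value used with them is a placeholder.

```c
/* Userspace model of error pointers; names are stand-ins, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

struct inode_model { int dummy; };
struct file_model { int dummy; };

/* Encode a negative errno value in a pointer, in the style of ERR_PTR(). */
static inline void *model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer, in the style of PTR_ERR(). */
static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* An error pointer lives in the top MODEL_MAX_ERRNO addresses. */
static inline bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* In the style of ERR_CAST(): drop the pointee type, keep the encoded errno. */
static inline void *model_err_cast(const void *ptr)
{
	return (void *)(uintptr_t)ptr;
}

/*
 * The function returns a file pointer, but the failure arrives as an inode
 * pointer. Returning `inode` directly would be an incompatible pointer type,
 * so the error is passed through the type-erasing cast instead.
 */
static struct file_model *getfile_model(struct inode_model *inode)
{
	static struct file_model file;

	if (model_is_err(inode))
		return model_err_cast(inode);

	return &file;
}
```

The errno value survives the cast unchanged, so the caller can still test the result with the is-error check and decode it as usual.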