From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50057C0218D for ; Wed, 29 Jan 2025 10:12:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D104A280005; Wed, 29 Jan 2025 05:12:44 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CBFD16B00C6; Wed, 29 Jan 2025 05:12:44 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B6005280005; Wed, 29 Jan 2025 05:12:44 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 99C046B00C5 for ; Wed, 29 Jan 2025 05:12:44 -0500 (EST) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 1DDD01A0AEB for ; Wed, 29 Jan 2025 10:12:44 +0000 (UTC) X-FDA: 83060075448.19.E7C15E1 Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com [209.85.160.177]) by imf07.hostedemail.com (Postfix) with ESMTP id 36CAC40002 for ; Wed, 29 Jan 2025 10:12:42 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Ch7SY32q; spf=pass (imf07.hostedemail.com: domain of tabba@google.com designates 209.85.160.177 as permitted sender) smtp.mailfrom=tabba@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1738145562; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=kzQG2opZhhOKKY7Si+vReF11GIIp44bAwoNf+bF++fc=; b=dZATYhBfvTyv9BU9F1qurIGZkwOm1O1hV0Vx/U+I3u2xxHW9k/fbVbTyd1M9S/My9FgHSi NsE96JWkHPOX1IYcxIDXjdeRFQx5lo1rLcBDKau9eeyaIyLCM7j527IJfueWj3S5KMoEK1 tFjZvpFIm+U6FscUZZnogyZyAN9R+B4= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1738145562; a=rsa-sha256; cv=none; b=1AV2sZcpMtenJsISTxRBDk8jQpO/i3CoB7+HBZcEXGxu9J3nn4/zW7gS66XOijRSWyWv8i N+Fz5iZuE1UzWxkwQotObSZYOrkR0Tvv115tARdcPYvj/dS5LR34+pE3FSjqGAkb6RZ6Eh 7dDAEghQGk1SoSNDtPV4MBjEAM/+ATw= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Ch7SY32q; spf=pass (imf07.hostedemail.com: domain of tabba@google.com designates 209.85.160.177 as permitted sender) smtp.mailfrom=tabba@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-qt1-f177.google.com with SMTP id d75a77b69052e-4679b5c66d0so493681cf.1 for ; Wed, 29 Jan 2025 02:12:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1738145561; x=1738750361; darn=kvack.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=kzQG2opZhhOKKY7Si+vReF11GIIp44bAwoNf+bF++fc=; b=Ch7SY32qo3uD/Gg05ti9SPyDh2IZd6kv1F7+vJiQ9DZtq9RNrHNUbs+PCk2OjOsUio mzhv/omjyg4v6Uu1/f+Sdb5GcS4rxstJ+y5lqJ2WZBW7MZ9DLHbGIQezPPGliRgezb10 b9cFYZ1vODDzQJLUgOsX9DRNMU8Z0NCQ6MPc2e46yy0+K4v9Zt1XxM3tsMtSqAeRM0oq CZJMPHB8mebG83mw7NBt/XYVi9qLv+deCaVQzwwlvRoHjF7QZrF+oEuXvxbRQNkR6NZj cdF9I8535B80BJuOtDKDGZMSX9cnvbXW0f27zMZTxv6PfeWOLOEVVSWw8cr0+K5uXjX8 oULQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738145561; x=1738750361; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=kzQG2opZhhOKKY7Si+vReF11GIIp44bAwoNf+bF++fc=; b=YZqXxFI/n74OYTL5utxNzNVdwgrCSYxzk8vpXPr4hc1038PO3tAc1AboIKhBEMIXZu CXzEPFkD6UzM+GQbwyQnLb4bOWSOE4gScsUHeOk/uKcCDN91CtXt3B7dk2zA/vhKL+o0 I/dOYBOIdKxVYegJen5miPh2EZeg8ERgbmlb1mHQ7EikxUUEhCdkimNEJN8geBBO34YY AJZ9kGalA3aD7Gs/CZ5YVkoB9KfcoMy7rjZy9u/CEgEf+yoV5VEHFJfetc3V7GiEr+38 aV0bUd6T+nr49QW+0GHtI+KR3wsju0B/iqIpxb4RkNrduI/jvDQoU1f5A1k2hAoF/1nx OG+g== X-Forwarded-Encrypted: i=1; AJvYcCXMR69Jpq1HY0m545KTQIhuXQ0eBddEImvU3dk4Zt0kqFuiLuEdpKXgwrWJ7VT1wpQ7FLsj24Ce+Q==@kvack.org X-Gm-Message-State: AOJu0YzmvgtqubNdumwnf9z/dFMCC+51LZUJiITXxtpjubm/FbpIhVea vdqNPubTJH/K/I5SQNgWgsf8oy4A9YQ5AgoQUIIxLb1F4Xyl2YbP0kEzZHSBi8Qan27p1xOi8f9 tjBqRsuKiIVax0jXrXY4JgEAfrtr53dq/C3cR X-Gm-Gg: ASbGnct9TJf6DWU0Ik7I5cMDljSTRrlkUvRYMDd0EkcFn+zo4x9GrPID54czFUPu5kO HSmXuvEU1JDBdkwbNLpEUroJ9UBkMs9WuVMjzIA9sQgMyhQJOtSUyFPTlsWcVB24J/7lkF2Skl5 QdKhDUZoyqwkTNf+ztIsl/c87063ViQQR8gi7z X-Google-Smtp-Source: AGHT+IEkJ52Lo2mOYxnqSwPjqqO5/Xoqdep/U5y1rZZYTLMHPafFjxzzbs2yJhPTmMNiun4xWSRUswxE+qBtG5Us8zA= X-Received: by 2002:ac8:7dc2:0:b0:466:7a06:2d03 with SMTP id d75a77b69052e-46fd274a261mr1990311cf.1.1738145560803; Wed, 29 Jan 2025 02:12:40 -0800 (PST) MIME-Version: 1.0 References: <20250117163001.2326672-1-tabba@google.com> <20250117163001.2326672-3-tabba@google.com> <186047ea-a782-494b-bfcf-f5088806bbb4@redhat.com> In-Reply-To: <186047ea-a782-494b-bfcf-f5088806bbb4@redhat.com> From: Fuad Tabba Date: Wed, 29 Jan 2025 10:12:04 +0000 X-Gm-Features: AWEUYZlRk_QxpmELFibDAQ0cRPeKANrt0PkWChtkN5Z_TriadREt_68-o2Y91UQ Message-ID: Subject: Re: [RFC PATCH v5 02/15] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes To: Gavin Shan Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com, yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz, vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name, david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com, isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com, steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com, roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com, jthoughton@google.com Content-Type: text/plain; charset="UTF-8" X-Stat-Signature: q91sb7apxkwwpc4eu6y1jhd4b7brwgu5 X-Rspamd-Queue-Id: 36CAC40002 X-Rspam-User: X-Rspamd-Server: rspam06 X-HE-Tag: 1738145562-656104 X-HE-Meta: U2FsdGVkX1/qIrSx0QR0fMDrHXJNOzdy1vL+nIVVv/5yZDVcH79a+nzt+DULqrQW0egpyuGB4iASa08vKXNC83kHdkqzScUYZmH0v0iljUt/3gBDiuzDkmmQfULooSypwSUlu3h5QXgDmImY5fgVT22ioae/bzOau3sHIhEvWLYiQKZR8QncU/l567EHHLBKg/OtMTED2MCA9v0WetwUGehGkeSrJ6WUphfvW1jF17qtYqH56ZCmRB8gYWQ05RHBUJWOW7HpK2cHGCNRdlbC8JiTsgpuq0i5BpZ/qIkLwg5Ab2eQ06CvjXF9aw95NGxp9U3DPzUkehkzxUSt4HZ5CSSDPe2gbskG0yhRISl3phsjZHG5Y6u05gdQaGL2fhUxb17FwnKF71DyOjIN11X3rYIGMhLI8+9Icohpb0AIiOjdxgdM6CPVraXtVt+zoK2MJoU3+rZqDmd5dIHKbHW1/nmjTfCOIFSw/w4CjnbrWHXcXuz7Jfo7egt20firZBo50o+DxFTajU2eC2EsYX59A2R4jHHwaitZYDNb7TEAaBAmsdpLaswMSmddNFOId0MLfhsjzuN1iOu+eSdf3ICwAeZEBTymDGvCbj7GtwiAsOi03d1Tvea/VxqTjfRT1RxvYJOvhf6lHTaNwI9xGJ07gtnGlNpU9ipPMRCpXVex78vbcyJOWBAnc+hpOxKSrTGUDhbNDcK2kHUeSvF7gaOT1gy4Y9cdaJwtUhEh3S5NKE+ejMz+G6rd2wWSHOWmuNPE0l/W5GipO03T2B0coUmNFFY25dOp3xojvwgGiIyQHfoHsJnpGxTUqtqleA9S0SBFk5ykqrIwpRBo/OqJUle67Dze+eyzJawDco4qGUrE2KkS/mOIbejOX7QfydW6DsaBJvqu2IrWQFNbLFwfFps1ibOKTloomnaUw390i0PDpw6iLgjkOkITRyjhE9ZXK32M6gRkYtLLaxioEx6G5IB eqelM1y1 GOg4x7iAyLVA7Zv5i2xPLr01VEctdYBeiws2AZxgBk+GRKKYNmzFgNcXvMbcNt/TPTyMUFQjpsDWLMDjxGEXG6McbKvFdUXLFpq+itS8yAB7kyeuFRhlxqtspmcWxjBDpKLRMempy3XqSd9XoS7FZQV5Kgr9UD1pn6OLe0cTVDRPoLk2DX3VrKN8EQb8cYUSzIWdWeRxmYx0jmuISe5FLwHfmTuDsDUC5Y2EfBfkNV0v4J3+ezPMISi2FvAcbDduCjXFCVade+fHqmPCwErYdVGNCxbiM3BQhx/+qsO/IvgadllKG/7lABMzSqYSsEdfYdIv0345AI7UUFhlsAGSimXCO2QHjj2diDnudD6LdAAhHLF0= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Gavin, On Fri, 24 Jan 2025 at 04:26, Gavin Shan wrote: > > Hi Fuad, > > On 1/18/25 2:29 AM, Fuad Tabba wrote: > > From: Ackerley Tng > > > > Using guest mem inodes allows us to store metadata for the backing > > memory on the inode. Metadata will be added in a later patch to > > support HugeTLB pages. > > > > Metadata about backing memory should not be stored on the file, since > > the file represents a guest_memfd's binding with a struct kvm, and > > metadata about backing memory is not unique to a specific binding and > > struct kvm. > > > > Signed-off-by: Ackerley Tng > > Signed-off-by: Fuad Tabba > > --- > > include/uapi/linux/magic.h | 1 + > > virt/kvm/guest_memfd.c | 119 ++++++++++++++++++++++++++++++------- > > 2 files changed, 100 insertions(+), 20 deletions(-) > > > > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h > > index bb575f3ab45e..169dba2a6920 100644 > > --- a/include/uapi/linux/magic.h > > +++ b/include/uapi/linux/magic.h > > @@ -103,5 +103,6 @@ > > #define DEVMEM_MAGIC 0x454d444d /* "DMEM" */ > > #define SECRETMEM_MAGIC 0x5345434d /* "SECM" */ > > #define PID_FS_MAGIC 0x50494446 /* "PIDF" */ > > +#define GUEST_MEMORY_MAGIC 0x474d454d /* "GMEM" */ > > > > #endif /* __LINUX_MAGIC_H__ */ > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > index 47a9f68f7b24..198554b1f0b5 100644 > > --- a/virt/kvm/guest_memfd.c > > +++ b/virt/kvm/guest_memfd.c > > @@ -1,12 +1,17 @@ > > // SPDX-License-Identifier: GPL-2.0 > > +#include > > +#include > > This can be dropped since "linux/mount.h" has been included to "linux/fs.h". > > > #include > > #include > > #include > > +#include > > #include > > #include > > > > #include "kvm_mm.h" > > > > +static struct vfsmount *kvm_gmem_mnt; > > + > > struct kvm_gmem { > > struct kvm *kvm; > > struct xarray bindings; > > @@ -307,6 +312,38 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn) > > return gfn - slot->base_gfn + slot->gmem.pgoff; > > } > > > > +static const struct super_operations kvm_gmem_super_operations = { > > + .statfs = simple_statfs, > > +}; > > + > > +static int kvm_gmem_init_fs_context(struct fs_context *fc) > > +{ > > + struct pseudo_fs_context *ctx; > > + > > + if (!init_pseudo(fc, GUEST_MEMORY_MAGIC)) > > + return -ENOMEM; > > + > > + ctx = fc->fs_private; > > + ctx->ops = &kvm_gmem_super_operations; > > + > > + return 0; > > +} > > + > > +static struct file_system_type kvm_gmem_fs = { > > + .name = "kvm_guest_memory", > > + .init_fs_context = kvm_gmem_init_fs_context, > > + .kill_sb = kill_anon_super, > > +}; > > + > > +static void kvm_gmem_init_mount(void) > > +{ > > + kvm_gmem_mnt = kern_mount(&kvm_gmem_fs); > > + BUG_ON(IS_ERR(kvm_gmem_mnt)); > > + > > + /* For giggles. Userspace can never map this anyways. */ > > + kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC; > > +} > > + > > static struct file_operations kvm_gmem_fops = { > > .open = generic_file_open, > > .release = kvm_gmem_release, > > @@ -316,6 +353,8 @@ static struct file_operations kvm_gmem_fops = { > > void kvm_gmem_init(struct module *module) > > { > > kvm_gmem_fops.owner = module; > > + > > + kvm_gmem_init_mount(); > > } > > > > static int kvm_gmem_migrate_folio(struct address_space *mapping, > > @@ -397,11 +436,67 @@ static const struct inode_operations kvm_gmem_iops = { > > .setattr = kvm_gmem_setattr, > > }; > > > > +static struct inode *kvm_gmem_inode_make_secure_inode(const char *name, > > + loff_t size, u64 flags) > > +{ > > + const struct qstr qname = QSTR_INIT(name, strlen(name)); > > + struct inode *inode; > > + int err; > > + > > + inode = alloc_anon_inode(kvm_gmem_mnt->mnt_sb); > > + if (IS_ERR(inode)) > > + return inode; > > + > > + err = security_inode_init_security_anon(inode, &qname, NULL); > > + if (err) { > > + iput(inode); > > + return ERR_PTR(err); > > + } > > + > > + inode->i_private = (void *)(unsigned long)flags; > > + inode->i_op = &kvm_gmem_iops; > > + inode->i_mapping->a_ops = &kvm_gmem_aops; > > + inode->i_mode |= S_IFREG; > > + inode->i_size = size; > > + mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); > > + mapping_set_inaccessible(inode->i_mapping); > > + /* Unmovable mappings are supposed to be marked unevictable as well. */ > > + WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping)); > > + > > + return inode; > > +} > > + > > +static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size, > > + u64 flags) > > +{ > > + static const char *name = "[kvm-gmem]"; > > + struct inode *inode; > > + struct file *file; > > + > > + if (kvm_gmem_fops.owner && !try_module_get(kvm_gmem_fops.owner)) > > + return ERR_PTR(-ENOENT); > > + > > The validation on 'kvm_gmem_fops.owner' can be removed since try_module_get() > and module_put() are friendly to a NULL parameter, even when CONFIG_MODULE_UNLOAD == N > > A module_put(kvm_gmem_fops.owner) is needed in the various erroneous cases in > this function. Otherwise, the reference count of the owner (module) will become > imbalanced on any errors. > > > > + inode = kvm_gmem_inode_make_secure_inode(name, size, flags); > > + if (IS_ERR(inode)) > > + return ERR_CAST(inode); > > + > > ERR_CAST may be dropped since there is nothing to be casted or converted? > > > + file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, > > + &kvm_gmem_fops); > > + if (IS_ERR(file)) { > > + iput(inode); > > + return file; > > + } > > + > > + file->f_mapping = inode->i_mapping; > > + file->f_flags |= O_LARGEFILE; > > + file->private_data = priv; > > + > > 'file->f_mapping = inode->i_mapping' may be dropped since it's already correctly > set by alloc_file_pseudo(). > > alloc_file_pseudo > alloc_path_pseudo > alloc_file > alloc_empty_file > file_init_path // Set by this function Thanks for the fixes. Will include them when we respin. Cheers, /fuad > > + return file; > > +} > > + > > static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) > > { > > - const char *anon_name = "[kvm-gmem]"; > > struct kvm_gmem *gmem; > > - struct inode *inode; > > struct file *file; > > int fd, err; > > > > @@ -415,32 +510,16 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags) > > goto err_fd; > > } > > > > - file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem, > > - O_RDWR, NULL); > > + file = kvm_gmem_inode_create_getfile(gmem, size, flags); > > if (IS_ERR(file)) { > > err = PTR_ERR(file); > > goto err_gmem; > > } > > > > - file->f_flags |= O_LARGEFILE; > > - > > - inode = file->f_inode; > > - WARN_ON(file->f_mapping != inode->i_mapping); > > - > > - inode->i_private = (void *)(unsigned long)flags; > > - inode->i_op = &kvm_gmem_iops; > > - inode->i_mapping->a_ops = &kvm_gmem_aops; > > - inode->i_mode |= S_IFREG; > > - inode->i_size = size; > > - mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER); > > - mapping_set_inaccessible(inode->i_mapping); > > - /* Unmovable mappings are supposed to be marked unevictable as well. */ > > - WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping)); > > - > > kvm_get_kvm(kvm); > > gmem->kvm = kvm; > > xa_init(&gmem->bindings); > > - list_add(&gmem->entry, &inode->i_mapping->i_private_list); > > + list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list); > > > > fd_install(fd, file); > > return fd; > > Thanks, > Gavin >