From: James Gowans <jgowans@amazon.com>
To: <linux-kernel@vger.kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>,
<kexec@lists.infradead.org>, "Joerg Roedel" <joro@8bytes.org>,
Will Deacon <will@kernel.org>, <iommu@lists.linux.dev>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"Christian Brauner" <brauner@kernel.org>,
<linux-fsdevel@vger.kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>, <kvm@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
Alexander Graf <graf@amazon.com>,
David Woodhouse <dwmw@amazon.co.uk>,
"Jan H . Schoenherr" <jschoenh@amazon.de>,
Usama Arif <usama.arif@bytedance.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
<madvenka@linux.microsoft.com>, <steven.sistare@oracle.com>,
<yuleixzhang@tencent.com>
Subject: [RFC 05/18] pkernfs: add file mmap callback
Date: Mon, 5 Feb 2024 12:01:50 +0000 [thread overview]
Message-ID: <20240205120203.60312-6-jgowans@amazon.com> (raw)
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>
Make the file data useable to userspace by adding mmap. That's all that
QEMU needs for guest RAM, so that's all be bother implementing for now.
When mmaping the file the VMA is marked as PFNMAP to indicate that there
are no struct pages for the memory in this VMA. Remap_pfn_range() is
used to actually populate the page tables. All PTEs are pre-faulted into
the pgtables at mmap time so that the pgtables are useable when this
virtual address range is given to VFIO's MAP_DMA.
---
fs/pkernfs/file.c | 42 +++++++++++++++++++++++++++++++++++++++++-
fs/pkernfs/pkernfs.c | 2 +-
fs/pkernfs/pkernfs.h | 2 ++
3 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/fs/pkernfs/file.c b/fs/pkernfs/file.c
index 27a637423178..844b6cc63840 100644
--- a/fs/pkernfs/file.c
+++ b/fs/pkernfs/file.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
#include "pkernfs.h"
+#include <linux/mm.h>
static int truncate(struct inode *inode, loff_t newsize)
{
@@ -42,6 +43,45 @@ static int inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct
return 0;
}
+/*
+ * To be able to use PFNMAP VMAs for VFIO DMA mapping we need the page tables
+ * populated with mappings. Pre-fault everything.
+ */
+static int mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ int rc;
+ unsigned long *mappings_block;
+ struct pkernfs_inode *pkernfs_inode;
+
+ pkernfs_inode = pkernfs_get_persisted_inode(filp->f_inode->i_sb, filp->f_inode->i_ino);
+
+ mappings_block = (unsigned long *)pkernfs_addr_for_block(filp->f_inode->i_sb,
+ pkernfs_inode->mappings_block);
+
+ /* Remap-pfn-range will mark the range VM_IO */
+ for (unsigned long vma_addr_offset = vma->vm_start;
+ vma_addr_offset < vma->vm_end;
+ vma_addr_offset += PMD_SIZE) {
+ int block, mapped_block;
+
+ block = (vma_addr_offset - vma->vm_start) / PMD_SIZE;
+ mapped_block = *(mappings_block + block);
+ /*
+ * It's wrong to use rempa_pfn_range; this will install PTE-level entries.
+ * The whole point of 2 MiB allocs is to improve TLB perf!
+ * We should use something like mm/huge_memory.c#insert_pfn_pmd
+ * but that is currently static.
+ * TODO: figure out the best way to install PMDs.
+ */
+ rc = remap_pfn_range(vma,
+ vma_addr_offset,
+ (pkernfs_base >> PAGE_SHIFT) + (mapped_block * 512),
+ PMD_SIZE,
+ vma->vm_page_prot);
+ }
+ return 0;
+}
+
const struct inode_operations pkernfs_file_inode_operations = {
.setattr = inode_setattr,
.getattr = simple_getattr,
@@ -49,5 +89,5 @@ const struct inode_operations pkernfs_file_inode_operations = {
const struct file_operations pkernfs_file_fops = {
.owner = THIS_MODULE,
- .iterate_shared = NULL,
+ .mmap = mmap,
};
diff --git a/fs/pkernfs/pkernfs.c b/fs/pkernfs/pkernfs.c
index 199c2c648bca..f010c2d76c76 100644
--- a/fs/pkernfs/pkernfs.c
+++ b/fs/pkernfs/pkernfs.c
@@ -7,7 +7,7 @@
#include <linux/fs_context.h>
#include <linux/io.h>
-static phys_addr_t pkernfs_base, pkernfs_size;
+phys_addr_t pkernfs_base, pkernfs_size;
void *pkernfs_mem;
static const struct super_operations pkernfs_super_ops = { };
diff --git a/fs/pkernfs/pkernfs.h b/fs/pkernfs/pkernfs.h
index 8b4fee8c5b2e..1a7aa783a9be 100644
--- a/fs/pkernfs/pkernfs.h
+++ b/fs/pkernfs/pkernfs.h
@@ -6,6 +6,8 @@
#define PKERNFS_FILENAME_LEN 255
extern void *pkernfs_mem;
+/* Units of bytes */
+extern phys_addr_t pkernfs_base, pkernfs_size;
struct pkernfs_sb {
unsigned long magic_number;
--
2.40.1
next prev parent reply other threads:[~2024-02-05 12:03 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-05 12:01 [RFC 00/18] Pkernfs: Support persistence for live update James Gowans
2024-02-05 12:01 ` [RFC 01/18] pkernfs: Introduce filesystem skeleton James Gowans
2024-02-05 12:01 ` [RFC 02/18] pkernfs: Add persistent inodes hooked into directies James Gowans
2024-02-05 12:01 ` [RFC 03/18] pkernfs: Define an allocator for persistent pages James Gowans
2024-02-05 12:01 ` [RFC 04/18] pkernfs: support file truncation James Gowans
2024-02-05 12:01 ` James Gowans [this message]
2024-02-05 23:34 ` [RFC 05/18] pkernfs: add file mmap callback Dave Chinner
2024-02-05 12:01 ` [RFC 06/18] init: Add liveupdate cmdline param James Gowans
2024-02-05 12:01 ` [RFC 07/18] pkernfs: Add file type for IOMMU root pgtables James Gowans
2024-02-05 12:01 ` [RFC 08/18] iommu: Add allocator for pgtables from persistent region James Gowans
2024-02-05 12:01 ` [RFC 09/18] intel-iommu: Use pkernfs for root/context pgtable pages James Gowans
2024-02-05 12:01 ` [RFC 10/18] iommu/intel: zap context table entries on kexec James Gowans
2024-02-05 12:01 ` [RFC 11/18] dma-iommu: Always enable deferred attaches for liveupdate James Gowans
2024-02-05 17:45 ` Jason Gunthorpe
2024-02-05 12:01 ` [RFC 12/18] pkernfs: Add IOMMU domain pgtables file James Gowans
2024-02-05 12:01 ` [RFC 13/18] vfio: add ioctl to define persistent pgtables on container James Gowans
2024-02-05 17:08 ` Jason Gunthorpe
2024-02-05 12:01 ` [RFC 14/18] intel-iommu: Allocate domain pgtable pages from pkernfs James Gowans
2024-02-05 17:12 ` Jason Gunthorpe
2024-02-05 12:02 ` [RFC 15/18] pkernfs: register device memory for IOMMU domain pgtables James Gowans
2024-02-05 12:02 ` [RFC 16/18] vfio: support not mapping IOMMU pgtables on live-update James Gowans
2024-02-05 12:02 ` [RFC 17/18] pci: Don't clear bus master is persistence enabled James Gowans
2024-02-05 12:02 ` [RFC 18/18] vfio-pci: Assume device working after liveupdate James Gowans
2024-02-05 17:10 ` [RFC 00/18] Pkernfs: Support persistence for live update Alex Williamson
2024-02-07 14:56 ` Gowans, James
2024-02-07 15:28 ` Jason Gunthorpe
2024-02-05 17:42 ` Jason Gunthorpe
2024-02-07 14:45 ` Gowans, James
2024-02-07 15:22 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240205120203.60312-6-jgowans@amazon.com \
--to=jgowans@amazon.com \
--cc=akpm@linux-foundation.org \
--cc=anthony.yznaga@oracle.com \
--cc=brauner@kernel.org \
--cc=dwmw@amazon.co.uk \
--cc=ebiederm@xmission.com \
--cc=graf@amazon.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=jschoenh@amazon.de \
--cc=kexec@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=madvenka@linux.microsoft.com \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=skinsburskii@linux.microsoft.com \
--cc=steven.sistare@oracle.com \
--cc=usama.arif@bytedance.com \
--cc=viro@zeniv.linux.org.uk \
--cc=will@kernel.org \
--cc=yuleixzhang@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox