From: Peter Xu <peterx@redhat.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>, Nico Pache <npache@redhat.com>,
Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
David Hildenbrand <david@redhat.com>,
Alex Williamson <alex@shazbot.org>, Zhi Wang <zhiw@nvidia.com>,
David Laight <david.laight.linux@gmail.com>,
Yi Liu <yi.l.liu@intel.com>, Ankit Agrawal <ankita@nvidia.com>,
peterx@redhat.com, Kevin Tian <kevin.tian@intel.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Thu, 4 Dec 2025 10:10:03 -0500
Message-ID: <20251204151003.171039-5-peterx@redhat.com>
In-Reply-To: <20251204151003.171039-1-peterx@redhat.com>
This patch enables best-effort huge pfnmaps for mmap() of vfio-pci BARs
even without MAP_FIXED, so that huge pfnmaps are used as much as possible.
It should also spare userspace from having to change (switching to
MAP_FIXED with pre-aligned VAs) before it can benefit from huge pfnmaps on
VFIO BARs.
The trick is to make sure the MMIO PFNs are aligned with the VAs allocated
by mmap() when !MAP_FIXED, so that whatever mmap(!MAP_FIXED) returns for a
vfio-pci MMIO region is automatically suitable for huge pfnmaps wherever
possible.
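A minimal userspace sketch of the VA side of this alignment, assuming 4 KiB pages and a 2 MiB PMD size: if the allocator reserves len + size bytes, moving the returned address forward by ((pgoff << PAGE_SHIFT) - va) & (size - 1) gives a VA that is congruent to the file offset modulo size while still leaving len bytes inside the reservation. align_va_to_pgoff() is a hypothetical name; the padding idea is similar in spirit to the kernel's THP-aware get_unmapped_area path, but this is not that code.

```c
#include <assert.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: the caller has reserved len + size bytes starting
 * at padded_va.  Return the first VA in [padded_va, padded_va + size)
 * whose low bits match those of the file offset, i.e.
 * (va - (pgoff << PAGE_SHIFT)) % size == 0.
 */
static unsigned long align_va_to_pgoff(unsigned long padded_va,
                                       unsigned long pgoff,
                                       unsigned long size)
{
    unsigned long off = pgoff << PAGE_SHIFT;

    /* size must be a power of two (e.g. PMD or PUD size). */
    assert(size && !(size & (size - 1)));

    /* Unsigned wraparound is harmless: only the bits below size survive. */
    return padded_va + ((off - padded_va) & (size - 1));
}
```

For example, a padded reservation at 0x7f0000123000 combined with a file offset whose low 21 bits are zero yields 0x7f0000200000, which is 2 MiB aligned.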
To achieve that, vfio-pci devices need a custom vfio_device_ops
get_mapping_order() hook.
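The decision such a hook makes can be modelled in userspace. This is a hypothetical sketch (model_mapping_order() is an illustrative name) that mirrors the logic of vfio_pci_core_get_mapping_order() in this patch under these assumptions: 4 KiB pages, PMD_ORDER 9 and PUD_ORDER 18 (x86-64 values), both PMD and PUD pfnmap support enabled, VFIO_PCI_OFFSET_SHIFT of 40 and VFIO_PCI_ROM_REGION_INDEX of 6; bar_len stands in for pci_resource_len().

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this model: 4 KiB pages, x86-64 page table orders. */
#define PAGE_SHIFT                12
#define PAGE_SIZE                 (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x)             (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define PMD_ORDER                 9
#define PUD_ORDER                 18
#define PMD_SIZE                  (PAGE_SIZE << PMD_ORDER)   /* 2 MiB */
#define PUD_SIZE                  (PAGE_SIZE << PUD_ORDER)   /* 1 GiB */
#define VFIO_PCI_OFFSET_SHIFT     40
#define VFIO_PCI_ROM_REGION_INDEX 6

/* Hypothetical model of the order selection; bar_len ~ pci_resource_len(). */
static int model_mapping_order(unsigned long pgoff, size_t len, size_t bar_len)
{
    unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
    unsigned long req_start;
    size_t phys_len, avail;

    /* Only BARs 0-5 get a hint; ROM, config space, etc. return 0. */
    if (index >= VFIO_PCI_ROM_REGION_INDEX)
        return 0;

    /* Byte offset of the request within the BAR. */
    req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
    phys_len = PAGE_ALIGN(bar_len);
    if (req_start >= phys_len)
        return 0;               /* out of range: mmap() will fail anyway */

    /* Only the part of the BAR the mapping can actually cover counts. */
    avail = phys_len - req_start;
    if (len < avail)
        avail = len;

    if (avail >= PUD_SIZE)
        return PUD_ORDER;
    if (avail >= PMD_SIZE)
        return PMD_ORDER;
    return 0;
}
```

For instance, a 4 MiB request at the start of a 16 GiB BAR yields PMD_ORDER, while a 1 GiB request starting 1 MiB before the end of that BAR yields 0, because only 1 MiB of the BAR can be covered.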
Note that a BAR's MMIO physical address should normally be guaranteed to
be aligned to the BAR size. This means the MMIO address is also always
aligned with vfio-pci's file offset address space, in which each region
starts at a multiple of (1 << VFIO_PCI_OFFSET_SHIFT). With that
guaranteed, the VA allocator can calculate the alignment from pgoff alone,
and the resulting VA will then also be aligned with the MMIO physical
addresses mapped into the VMA later.
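A small sketch of why this assumption is sufficient, assuming VFIO_PCI_OFFSET_SHIFT is 40 as in vfio-pci and using hypothetical helpers vfio_file_offset() and congruent(): when a BAR starts at a multiple of its power-of-two size, byte off of the BAR sits at physical address bar_start + off and at file offset (index << 40) + off, and both are congruent to off modulo any power-of-two size no larger than the BAR. A VA aligned to the file offset is therefore also aligned to the physical address.

```c
#include <assert.h>
#include <stdint.h>

#define VFIO_PCI_OFFSET_SHIFT 40  /* per-region offset shift used by vfio-pci */

/* Hypothetical: file offset passed to mmap() for byte `off` of region `index`. */
static uint64_t vfio_file_offset(unsigned int index, uint64_t off)
{
    return ((uint64_t)index << VFIO_PCI_OFFSET_SHIFT) + off;
}

/* Hypothetical: nonzero if a and b agree in every bit below power-of-two size. */
static int congruent(uint64_t a, uint64_t b, uint64_t size)
{
    assert(size && !(size & (size - 1)));
    return ((a ^ b) & (size - 1)) == 0;
}
```

With a 1 GiB BAR 1 at 0xc0000000, bar_start + off and vfio_file_offset(1, off) agree modulo both 2 MiB and 1 GiB; a BAR placed at 0xc0100000, which violates the alignment assumption, makes them disagree modulo 2 MiB.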
For now, keep it simple and rely on this hardware assumption, which should
always hold. Adjusting pgoff when calculating the alignment is left for
later, if a real need for it arises.
For discussion on the requirement of this feature, see:
https://lore.kernel.org/linux-pci/20250529214414.1508155-1-amastro@fb.com/
Signed-off-by: Peter Xu <peterx@redhat.com>
---
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 49 ++++++++++++++++++++++++++++++++
include/linux/vfio_pci_core.h | 2 ++
3 files changed, 52 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ac10f14417f2f..8f29037cee6eb 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -145,6 +145,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.detach_ioas = vfio_iommufd_physical_detach_ioas,
.pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas,
.pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas,
+ .get_mapping_order = vfio_pci_core_get_mapping_order,
};
static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 7dcf5439dedc9..28ab37715acc0 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1640,6 +1640,55 @@ static unsigned long vma_to_pfn(struct vm_area_struct *vma)
return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff;
}
+/*
+ * Hint function for mmap() about the size of mapping to be carried out.
+ * This helps to enable huge pfnmaps as much as possible on BAR mappings.
+ *
+ * This function only does the minimum checks on the mmap() parameters
+ * needed to make the hint valid. Most of the mmap() sanity checks are
+ * done later in mmap().
+ */
+int vfio_pci_core_get_mapping_order(struct vfio_device *device,
+ unsigned long pgoff, size_t len)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(device, struct vfio_pci_core_device, vdev);
+ struct pci_dev *pdev = vdev->pdev;
+ unsigned int index = pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+ unsigned long req_start;
+ size_t phys_len;
+
+ /* Currently, only BARs 0-5 support huge pfnmap */
+ if (index >= VFIO_PCI_ROM_REGION_INDEX)
+ return 0;
+
+ /*
+ * NOTE: we're keeping things simple as of now, assuming the
+ * physical address of BARs (i.e., pci_resource_start(pdev, index))
+ * should always be aligned with pgoff in vfio-pci's address space.
+ */
+ req_start = (pgoff << PAGE_SHIFT) & ((1UL << VFIO_PCI_OFFSET_SHIFT) - 1);
+ phys_len = PAGE_ALIGN(pci_resource_len(pdev, index));
+
+ /*
+ * If this happens, mmap() will probably fail later anyway, so the
+ * mapping hint no longer matters.
+ */
+ if (req_start >= phys_len)
+ return 0;
+
+ phys_len = MIN(phys_len - req_start, len);
+
+ if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PUD_PFNMAP) && phys_len >= PUD_SIZE)
+ return PUD_ORDER;
+
+ if (IS_ENABLED(CONFIG_ARCH_SUPPORTS_PMD_PFNMAP) && phys_len >= PMD_SIZE)
+ return PMD_ORDER;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_get_mapping_order);
+
static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
unsigned int order)
{
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index f541044e42a2a..d320dfacc5681 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -119,6 +119,8 @@ ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
size_t count, loff_t *ppos);
ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
size_t count, loff_t *ppos);
+int vfio_pci_core_get_mapping_order(struct vfio_device *device,
+ unsigned long pgoff, size_t len);
int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma);
void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
--
2.50.1
Thread overview: 9+ messages
2025-12-04 15:09 [PATCH v2 0/4] mm/vfio: " Peter Xu
2025-12-04 15:10 ` [PATCH v2 1/4] mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment Peter Xu
2025-12-04 15:10 ` [PATCH v2 2/4] mm: Add file_operations.get_mapping_order() Peter Xu
2025-12-04 15:19 ` Peter Xu
2025-12-04 15:10 ` [PATCH v2 3/4] vfio: Introduce vfio_device_ops.get_mapping_order hook Peter Xu
2025-12-04 15:10 ` Peter Xu [this message]
2025-12-05 4:33 ` [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings kernel test robot
2025-12-05 7:45 ` kernel test robot
2025-12-04 18:16 ` [PATCH v2 0/4] mm/vfio: " Cédric Le Goater