From: Peter Xu <peterx@redhat.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Jason Gunthorpe <jgg@nvidia.com>, Nico Pache <npache@redhat.com>,
Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
David Hildenbrand <david@redhat.com>,
Alex Williamson <alex@shazbot.org>, Zhi Wang <zhiw@nvidia.com>,
David Laight <david.laight.linux@gmail.com>,
Yi Liu <yi.l.liu@intel.com>, Ankit Agrawal <ankita@nvidia.com>,
peterx@redhat.com, Kevin Tian <kevin.tian@intel.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH v2 0/4] mm/vfio: huge pfnmaps with !MAP_FIXED mappings
Date: Thu, 4 Dec 2025 10:09:59 -0500
Message-ID: <20251204151003.171039-1-peterx@redhat.com>

This series is based on v6.18. It lets mmap(!MAP_FIXED) use huge pfnmaps
on a best-effort basis, and enables this for vfio-pci as the first user.

v1: https://lore.kernel.org/r/20250613134111.469884-1-peterx@redhat.com

A changelog is omitted: all patches were rewritten around a new interface
introduced in this v2, so a per-patch changelog would not apply.

This version introduces a new file operation, get_mapping_order() (based on
discussion with Jason on v1), to minimize the code drivers need to opt in.
It also avoids exporting any mm functions to drivers. See the v1 discussion
for more background.

Currently, the get_mapping_order() API is defined as:

  int (*get_mapping_order)(struct file *file, unsigned long pgoff, size_t len);

The first argument is the file pointer; the second and third are the pgoff
and len from the mmap() request. A driver can implement this hook to opt in
to providing a mapping-order hint to core mm when it allocates VA for the
specified file range. I kept the interface simple for now: core mm always
aligns the VA with pgoff, assuming that always works, and the driver only
reports an order for the given pgoff+len, which core mm then uses for the
alignment.
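
To make the contract concrete, here is a hypothetical userspace model of
what a driver's get_mapping_order() might compute for a PCI BAR, plus the
"aligned with pgoff" condition core mm is described as enforcing. It is
not the vfio-pci implementation from patch 4/4. Everything prefixed
model_ is invented for illustration, struct model_file stands in for the
driver state behind struct file, and it assumes a 64-bit build with 4KiB
pages and x86-64 PMD (2MiB) / PUD (1GiB) sizes. The "range must contain a
whole aligned chunk" check is a guess at a sensible policy.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this model: 64-bit, 4KiB pages, x86-64 PMD/PUD sizes. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PMD_ORDER  9   /* 2MiB / 4KiB */
#define MODEL_PUD_ORDER  18  /* 1GiB / 4KiB */

/* Hypothetical stand-in for the driver state reachable from struct file. */
struct model_file {
	unsigned long bar_size; /* bytes of the BAR backing this file */
};

/*
 * Hypothetical get_mapping_order(): return the largest huge order for
 * which [off, off + len) contains at least one naturally aligned chunk
 * of that size, or 0 to give no hint.
 */
static int model_get_mapping_order(const struct model_file *file,
				   unsigned long pgoff, size_t len)
{
	static const int orders[] = { MODEL_PUD_ORDER, MODEL_PMD_ORDER };
	unsigned long off = pgoff << MODEL_PAGE_SHIFT;

	assert(file != NULL);
	if (off >= file->bar_size || len > file->bar_size - off)
		return 0; /* range not inside the BAR: no hint */

	for (size_t i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
		unsigned long size = MODEL_PAGE_SIZE << orders[i];
		unsigned long first = (off + size - 1) & ~(size - 1); /* round up */

		if (first + size <= off + len)
			return orders[i];
	}
	return 0;
}

/* Core mm side (as modeled): VA and file offset agree modulo the huge size. */
static int model_va_matches_pgoff(unsigned long va, unsigned long pgoff,
				  int order)
{
	unsigned long size = MODEL_PAGE_SIZE << order;

	/* Unsigned wraparound keeps this correct even when va < offset. */
	return ((va - (pgoff << MODEL_PAGE_SHIFT)) & (size - 1)) == 0;
}
```

With a 128MB BAR mapped from pgoff 0, this model reports the PMD order; a
1GB BAR mapped whole would report the PUD order.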

Before this series, a userspace application usually had to be modified to
benefit from huge mappings, by providing a huge-size-aligned VA via
MAP_FIXED. After this series, applications can benefit from huge pfnmaps
automatically once the kernel is upgraded, without userspace changes.
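
For comparison, the pre-series userspace workaround roughly looks like the
sketch below: over-reserve VA with an inaccessible anonymous mapping, pick
an aligned start inside it, and trim the excess. reserve_aligned_va() is a
hypothetical helper written for this example, not part of any vfio API;
it only uses standard mmap()/munmap() calls.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS / MAP_NORESERVE with strict -std */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Reserve len bytes of inaccessible VA starting at a multiple of align.
 * len and align must be multiples of the page size; align must be a
 * power of two. Returns MAP_FAILED on error.
 */
static void *reserve_aligned_va(size_t len, size_t align)
{
	size_t pad_len = len + align;
	uint8_t *base, *aligned;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Over-allocate by align bytes so an aligned start must exist inside. */
	base = mmap(NULL, pad_len, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return MAP_FAILED;

	aligned = (uint8_t *)(((uintptr_t)base + align - 1) &
			      ~(uintptr_t)(align - 1));

	/* Give back the unused head (if any) of the padded reservation. */
	if (aligned > base)
		munmap(base, (size_t)(aligned - base));
	/* The tail is always non-empty because aligned - base < align. */
	munmap(aligned + len, (size_t)((base + pad_len) - (aligned + len)));
	return aligned;
}
```

The application would then map the device region over the reservation,
e.g. mmap(p, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
device_fd, region_offset), where device_fd and region_offset are
placeholders and region_offset is assumed to be huge-size aligned itself.
This is exactly the per-application boilerplate the series makes
unnecessary.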

This is still best-effort: auto-alignment requires allocating a larger VA
range through the per-arch allocator, so if a huge-mapping-aligned VA
cannot be allocated, it falls back to small mappings as before. That is
only a theoretical concern, though; in practice I don't yet know when it
would fail, especially on a 64-bit system.
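
The padding step can be modeled in a few lines. This is a simplified
sketch in the spirit of the existing THP helper in mm/huge_memory.c
(which patch 1/4 extends to take an alignment), not code from the series;
model_align_in_padded() is a hypothetical name. Given the base the
allocator returned for a padded request of len + size bytes, it picks the
first address whose distance from the file offset is a multiple of size.
Because that address lies less than size bytes past base, the full len
still fits in the padded window; if the padded request itself fails, the
caller retries unpadded and gets small mappings.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: the allocator handed back `base` for a padded
 * request of len + size bytes. Return the first address in that window
 * whose distance from the file offset `off` is a multiple of `size`
 * (a power of two).
 */
static uint64_t model_align_in_padded(uint64_t base, uint64_t off,
				      uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);

	/* Unsigned wraparound keeps this correct even when off < base. */
	return base + ((off - base) & (size - 1));
}
```

For example, with base 0x7f0000123000, off 0 and a 2MiB size, the model
returns 0x7f0000200000; with off 0x1000 it returns 0x7f0000201000, so the
VA and file offset stay congruent modulo 2MiB.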

So far, only vfio-pci is supported, but the logic should apply to any
driver that supports, or will support, huge pfnmaps. I've also Cc'd more
people in this version for a hardware perspective.

For testing:

  - checkpatch.pl
  - cross build harness
  - unit test from Alex [1], checking mmap() alignments on a QEMU
    instance with a 128MB BAR

The alignments all look sane with mmap(!MAP_FIXED), and huge mappings are
properly installed. I didn't observe any issues.

I currently lack larger BARs to test PUD sizes. If anyone can run this
with 1G+ BARs and hits issues, please report them.

Alex Mastro: thanks for testing v1. Since this series was rewritten, a
re-test is needed, so I didn't carry over your Tested-by.

Comments welcome, thanks.

[1] https://github.com/awilliam/tests/blob/vfio-pci-device-map-alignment/vfio-pci-device-map-alignment.c

Peter Xu (4):
  mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment
  mm: Add file_operations.get_mapping_order()
  vfio: Introduce vfio_device_ops.get_mapping_order hook
  vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings

 Documentation/filesystems/vfs.rst |  4 +++
 drivers/vfio/pci/vfio_pci.c       |  1 +
 drivers/vfio/pci/vfio_pci_core.c  | 49 ++++++++++++++++++++++++++
 drivers/vfio/vfio_main.c          | 14 ++++++++
 include/linux/fs.h                |  1 +
 include/linux/huge_mm.h           |  5 +--
 include/linux/vfio.h              |  5 +++
 include/linux/vfio_pci_core.h     |  2 ++
 mm/huge_memory.c                  |  7 ++--
 mm/mmap.c                         | 58 +++++++++++++++++++++++++++----
 10 files changed, 135 insertions(+), 11 deletions(-)

--
2.50.1