* [BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO
@ 2025-05-29 21:44 Alex Mastro
  2025-05-30 13:10 ` Jason Gunthorpe
From: Alex Mastro @ 2025-05-29 21:44 UTC
  To: linux-pci; +Cc: alex.williamson, jgg, peterx, kbusch, linux-mm

Hello,

We are running user space drivers in production on top of VFIO, and after
upgrading from v6.9.0 to v6.13.2 noticed intermittent, slow performance leading
to "rcu_sched self-detected stall" when issuing VFIO_IOMMU_MAP_DMA on ~64 GiB
mmap-ed BAR regions. When doing this on enough devices concurrently, we
triggered softlockup_panic. The mmap-ed BAR regions were obtained from mmap on
a VFIO device fd.
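
For readers unfamiliar with the call, a minimal sketch of issuing
VFIO_IOMMU_MAP_DMA on an already mmap-ed range follows. It assumes the legacy
type1 container interface from <linux/vfio.h>; the helper name vfio_map_dma
and its parameters are hypothetical and are not taken from the setup described
in this message.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Hypothetical helper: map [vaddr, vaddr + size) at the given IOVA through a
 * VFIO type1 container fd. Returns 0 on success or -errno on failure.
 * For a BAR mapping, vaddr would be the address returned by mmap on the
 * VFIO device fd.
 */
static int vfio_map_dma(int container_fd, void *vaddr, uint64_t iova,
                        uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)vaddr;
    map.iova = iova;
    map.size = size;

    if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
        return -errno;
    return 0;
}
```

The alignment of vaddr is the variable discussed in the rest of this message.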

We map regions larger than 1 GiB, which sometimes do not start at 1 GiB-aligned
BAR offsets, but they are always aligned to at least 2 MiB.

We determined that slow, stalling runs were correlated with 4 KiB-aligned
addresses returned by mmap, and normal runs with >= 2 MiB alignment.

Inspired by QEMU's mmap-alloc.c, we are handling this by reserving VA with an
oversized mmap, and then clobbering with MAP_FIXED at a good address inside the
reservation with the mmap on the VFIO device fd.
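
The reserve-then-MAP_FIXED approach can be sketched roughly as below. This is
an illustration under stated assumptions, not the code used in production and
not a copy of QEMU's mmap-alloc.c: the helper name mmap_aligned is
hypothetical, align is assumed to be a power of two no smaller than the page
size, and size is assumed to be a multiple of the page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Hypothetical helper: mmap `size` bytes of `fd` at `offset` so that the
 * returned address is a multiple of `align`. Returns MAP_FAILED on error.
 * For a VFIO device fd, `flags` would typically be MAP_SHARED.
 */
static void *mmap_aligned(int fd, off_t offset, size_t size, size_t align,
                          int prot, int flags)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* 1. Reserve VA only: oversized, inaccessible, not backed by memory. */
    size_t reserve = size + align;
    void *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    /* 2. Pick the first suitably aligned address inside the reservation. */
    uintptr_t lo = (uintptr_t)base;
    uintptr_t start = (lo + align - 1) & ~((uintptr_t)align - 1);

    /* 3. Replace that part of the reservation with the real mapping. */
    void *p = mmap((void *)start, size, prot, flags | MAP_FIXED, fd, offset);
    if (p == MAP_FAILED) {
        munmap(base, reserve);
        return MAP_FAILED;
    }

    /* 4. Give back the unused head and tail of the reservation. */
    if (start > lo)
        munmap(base, start - lo);
    uintptr_t end = start + size;
    uintptr_t hi = lo + reserve;
    if (hi > end)
        munmap((void *)end, hi - end);

    return p;
}
```

Because the MAP_FIXED mapping lands inside a range the process already owns,
it cannot clobber an unrelated mapping.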

At first we settled for aligning the mmap address to exactly 1 GiB or 2 MiB,
which made the stalls disappear. We then improved performance further with the
following:

We found that the best addresses to pass to VFIO_IOMMU_MAP_DMA have the
following properties, where va_align and va_offset are chosen based on the size
and BAR offsets of the desired mapping.

va_align = {1 GiB, 2 MiB, 4 KiB}
va_offset = mmap_offset % va_align
(addr_to_mmap % va_align) == va_offset
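
The three conditions above can be written out in C as a sketch. The helper
names pick_va_align and pick_va are hypothetical, and choosing va_align from
the mapping size alone is a simplifying assumption; the text only states that
it is chosen from the size and BAR offset of the mapping.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Assumption: use the largest alignment that the mapping size can span. */
static uint64_t pick_va_align(uint64_t size)
{
    if (size >= SZ_1G)
        return SZ_1G;
    if (size >= SZ_2M)
        return SZ_2M;
    return SZ_4K;
}

/*
 * Return the lowest address >= base satisfying
 *   addr % va_align == mmap_offset % va_align
 * so that the VA and the mmap offset are congruent modulo va_align.
 */
static uint64_t pick_va(uint64_t base, uint64_t mmap_offset, uint64_t va_align)
{
    assert(va_align != 0 && (va_align & (va_align - 1)) == 0);

    uint64_t va_offset = mmap_offset & (va_align - 1);
    uint64_t addr = (base & ~(va_align - 1)) + va_offset;

    if (addr < base)
        addr += va_align;
    return addr;
}
```

Here base would be the start of a VA reservation that is at least va_align
bytes larger than the mapping, so the chosen address always fits inside it.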

Using addresses with the above properties seems to optimize the count and
granularity of faults as confirmed by bpftrace-ing vfio_pci_mmap_huge_fault.

We then backported "Improve DMA mapping performance for huge pfnmaps" [1] to
our 6.13 tree, and saw further performance improvements consistent with those
described in the patch (thank you!). However, even with the backport, we still
need to align mmap addresses manually; otherwise we see stalls.

We are wondering the following:
- Is all of the above expected behavior, and is this an expected usage of VFIO?
- Is there an expected minimum alignment greater than 4 KiB (our system page
  size) for non-MAP_FIXED mmap on a VFIO device fd?
- Was there an unintended regression for our use case between v6.9 and v6.13?

Thanks,
Alex Mastro

[1] https://lore.kernel.org/all/20250205231728.2527186-1-alex.williamson@redhat.com/




Thread overview: 6+ messages
2025-05-29 21:44 [BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO Alex Mastro
2025-05-30 13:10 ` Jason Gunthorpe
2025-05-30 14:25   ` Peter Xu
2025-05-30 23:05     ` Alex Mastro
2025-06-06 18:49   ` Alex Mastro
2025-06-09  0:20     ` Jason Gunthorpe
