From: Peter Xu <peterx@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Mastro <amastro@fb.com>,
linux-pci@vger.kernel.org, alex.williamson@redhat.com,
kbusch@kernel.org, linux-mm@kvack.org
Subject: Re: [BUG?] vfio/pci: VA alignment sensitivity of VFIO_IOMMU_MAP_DMA which target MMIO
Date: Fri, 30 May 2025 10:25:01 -0400
Message-ID: <aDm_vaQUnrVbuvxO@x1.local>
In-Reply-To: <20250530131050.GA233377@nvidia.com>
On Fri, May 30, 2025 at 10:10:50AM -0300, Jason Gunthorpe wrote:
> On Thu, May 29, 2025 at 02:44:14PM -0700, Alex Mastro wrote:
>
> > We are wondering the following:
> > - Is all of the above expected behavior, and usage of VFIO?
> > - Is there an expected minimum alignment greater than 4K (our system page size)
> > for non-MAP_FIXED mmap on a VFIO device fd?
> > - Was there an unintended regression to our use-case in between 6.9 and 6.13?
Probably due to aac6db75a9fc ("vfio/pci: Use unmap_mapping_range()"). IIUC
the plan was that huge faults would bring back the lost perf, but indeed
the alignment is still hard to always get right.
>
> I think this is something we have missed. VFIO should automatically
> align the VMA's address if not MAP_FIXED, otherwise it can't use the
> efficient huge page sizes anymore. qemu uses MAP_FIXED so we've left
> out the non-qemu users from this performance optimization.
>
> To fix it, the flow from the mm side is something like what
> shmem_get_unmapped_area() does. VFIO would probably want to align all
> BAR's to their size.
Good point! I overlooked the VA hints since QEMU doesn't need them. I can
take a closer look if nobody else will.
>
> Which seems to me probably wants some refactoring and a core helper
> 'mm_get_aligned_unmapped_area()'..
>
> I think if you are mmaping a huge huge BAR it is not surprising that
> it will take a huge amount of time to write out all of the 4K
> PTEs. The stalls on old kernels should probably be addressed by having
> cond_resched() inside the remap_pfnmap().
Right, but then that'll be a stable-only fix.
If VFIO can provide a valid get_unmapped_area(), then with huge faults
maybe we don't even need it, and such a change can be copied to stable too.
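To make the alignment requirement concrete, here is a rough userspace sketch of the arithmetic such a get_unmapped_area() hook would need. It is not kernel code; pick_aligned_addr() and its parameters are hypothetical names for illustration only. It assumes the caller has already found a free range of at least len + align bytes starting at free_start, and it picks the lowest address in that range that is congruent to the file offset modulo align:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not a kernel API).
 *
 * free_start: start of a free VA range that is at least len + align long
 * off:        byte offset of the mapping within the file/BAR
 * align:      desired alignment (power of two), e.g. a huge page size
 *
 * Returns the lowest address >= free_start such that
 * (addr % align) == (off % align), which is what lets a huge fault map
 * an aligned VA block onto an equally aligned block of the backing range.
 */
uint64_t pick_aligned_addr(uint64_t free_start, uint64_t off, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Unsigned wraparound is intended: this is (off - free_start) mod align. */
    return free_start + ((off - free_start) & (align - 1));
}
```

The padding added is always less than align, which is why the free range has to be searched for len + align bytes rather than just len.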
Meanwhile, just to mention: there's one more commit that stable branches
with vfio huge_fault will want soon, where Alex fixed yet another
alignment-related issue so that huge faults work reliably:
commit c1d9dac0db168198b6f63f460665256dedad9b6e
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Fri May 2 16:40:31 2025 -0600

    vfio/pci: Align huge faults to order
I think if your trace shows correct huge faults once the alignment was
correct, that issue doesn't affect your case (likely your app faults in
the BAR region sequentially, and there are likely no concurrent,
especially unaligned, faults when pre-faulting everything). But just FYI,
and IIUC that commit will land in 6.13.z soon.
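Until the kernel picks an aligned address by itself, userspace can do the alignment the way MAP_FIXED users do. A minimal sketch, under stated assumptions: mmap_aligned() and align_up() are made-up names (not a VFIO API), len and align are multiples of the page size, and align is whatever boundary the huge mappings need (for example the BAR size, which is only an assumption about what is sufficient). It reserves len + align bytes of address space with PROT_NONE, maps the fd over the aligned part with MAP_FIXED, then trims the rest:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Round addr up to a multiple of align; align must be a power of two. */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/*
 * Hypothetical helper: map len bytes of fd at offset so that the returned
 * VA is a multiple of align. Returns MAP_FAILED on error.
 */
void *mmap_aligned(int fd, off_t offset, size_t len, size_t align)
{
    size_t total = len + align;

    /* Reserve address space only; nothing is populated here. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return MAP_FAILED;

    char *va = (char *)align_up((uintptr_t)base, align);

    /* Replace the aligned part of the reservation with the real mapping. */
    void *p = mmap(va, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, offset);
    if (p == MAP_FAILED) {
        munmap(base, total);
        return MAP_FAILED;
    }

    /* Trim the unused head and tail of the reservation. */
    if (va > base)
        munmap(base, (size_t)(va - base));
    if (va + len < base + total)
        munmap(va + len, (size_t)((base + total) - (va + len)));

    return p;
}
```

With a VFIO device fd, offset and len would come from the region info of the BAR being mapped; MAP_FIXED is safe here only because it lands inside a range this same function just reserved.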
Thanks,
--
Peter Xu