From: Alex Williamson <alex.williamson@redhat.com>
To: Qinyun Tan <qinyuntan@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space
Date: Thu, 24 Oct 2024 11:06:24 -0600 [thread overview]
Message-ID: <20241024110624.63871cfa.alex.williamson@redhat.com> (raw)
In-Reply-To: <cover.1729760996.git.qinyuntan@linux.alibaba.com>
On Thu, 24 Oct 2024 17:34:42 +0800
Qinyun Tan <qinyuntan@linux.alibaba.com> wrote:
> When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a DMA address,
> the generic handler 'vfio_pin_map_dma' attempts to pin the memory and
> then create the mapping in the iommu.
>
> However, some mappings aren't backed by a struct page, for example an
> mmap'd MMIO range for our own or another device. In this scenario, where
> the vma is flagged VM_IO | VM_PFNMAP, the pin operation fails. Moreover,
> pinning incurs a large overhead, which results in a longer startup time
> for the VM. We don't actually need to pin in this scenario.
>
> To address this issue, we introduce a new DMA MAP flag
> 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN' to skip the 'vfio_pin_pages_remote'
> operation in the DMA map process for MMIO memory. Additionally, we set
> the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
> pfn can be obtained directly from vma->vm_pgoff.
>
> This approach allows us to avoid unnecessary memory pinning operations,
> which would otherwise introduce additional overhead during DMA mapping.
>
> In my tests, using vfio to pass through an 8-card AMD GPU setup with
> large BARs (128GB*8), the time to map the 192GB*8 of BAR space was
> reduced from about 50.79s to 1.57s.
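[For reference, the map path the cover letter describes is driven from userspace through the standard VFIO_IOMMU_MAP_DMA ioctl. The sketch below shows that call for an mmap'd region such as another device's BAR; the container fd, vaddr, IOVA, and size are illustrative placeholders, not values from this series:]

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map an already-mmap'd range (e.g. another device's BAR) into a VFIO
 * container's IO address space.  This is the path that currently goes
 * through vfio_pin_map_dma and attempts to pin the backing pages. */
static int vfio_map(int container_fd, void *vaddr, uint64_t iova,
		    uint64_t size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (uintptr_t)vaddr,
		.iova  = iova,
		.size  = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```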
If the vma has a flag to indicate pfnmap, why does the user need to
provide a mapping flag to indicate not to pin? We generally cannot
trust such a user directive anyway, nor do we in this series, so it all
seems rather redundant.
What about simply improving the batching of pfnmap ranges rather than
imposing any sort of mm or uapi changes? Or perhaps, since we're now
using huge_fault to populate the vma, maybe we can iterate at PMD or
PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
optimizations to pursue that could be done transparently to the user.
Thanks,
Alex
Thread overview: 7+ messages
2024-10-24 9:34 Qinyun Tan
2024-10-24 9:34 ` [PATCH v1 1/2] mm: introduce vma flag VM_PGOFF_IS_PFN Qinyun Tan
2024-10-24 9:34 ` [PATCH v1 2/2] vfio: avoid unnecessary pin memory when dma map io address space Qinyun Tan
2024-10-24 17:06 ` Alex Williamson [this message]
2024-10-24 18:19 ` [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space Jason Gunthorpe
2024-10-29 2:50 ` 半叶
2024-10-29 3:32 ` qinyuntan