* [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space
From: Qinyun Tan @ 2024-10-24 9:34 UTC
To: Andrew Morton, Alex Williamson; +Cc: linux-mm, kvm, linux-kernel, Qinyun Tan
When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a dma
address, the general handler 'vfio_pin_map_dma' attempts to pin the
memory and then create the mapping in the iommu.

However, some mappings aren't backed by a struct page, for example an
mmap'd MMIO range for our own or another device. In this scenario (a vma
with the VM_IO | VM_PFNMAP flags), the pin operation will fail. Moreover,
the pin operation incurs a large overhead, which results in a longer
startup time for the VM. We don't actually need a pin in this scenario.

To address this issue, we introduce a new DMA MAP flag,
'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN', to skip the 'vfio_pin_pages_remote'
operation in the DMA map process for MMIO memory. Additionally, we set
the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
pfn can be obtained directly from vma->vm_pgoff.

This approach allows us to avoid unnecessary memory pinning operations,
which would otherwise introduce additional overhead during DMA mapping.

In my tests, using vfio to pass through an 8-card AMD GPU setup with a
large BAR size (128GB * 8), the time to map the 192GB * 8 of BAR space
was reduced from about 50.79s to 1.57s.
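
For reference, a minimal userspace sketch of the intended usage (error
handling and container/group setup omitted; map_bar_no_pin() is just an
illustrative helper name, and the VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN flag
and VFIO_DMA_MAP_MMIO_DONT_PIN extension only exist with this series
applied):

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static int map_bar_no_pin(int container_fd, int device_fd,
			  struct vfio_region_info *bar, __u64 iova)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
			 VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN,
		.iova  = iova,
		.size  = bar->size,
	};
	void *vaddr;

	/* Only pass the new flag if the type1 backend advertises it. */
	if (ioctl(container_fd, VFIO_CHECK_EXTENSION,
		  VFIO_DMA_MAP_MMIO_DONT_PIN) <= 0)
		map.flags &= ~VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN;

	/* mmap the BAR from vfio-pci; the resulting vma is VM_IO | VM_PFNMAP. */
	vaddr = mmap(NULL, bar->size, PROT_READ | PROT_WRITE, MAP_SHARED,
		     device_fd, bar->offset);
	if (vaddr == MAP_FAILED)
		return -errno;

	map.vaddr = (__u64)(uintptr_t)vaddr;

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
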
Qinyun Tan (2):
mm: introduce vma flag VM_PGOFF_IS_PFN
vfio: avoid unnecessary pin memory when dma map io address space
drivers/vfio/pci/vfio_pci_core.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 64 +++++++++++++++++++++++++-------
include/linux/mm.h | 6 +++
include/uapi/linux/vfio.h | 11 ++++++
4 files changed, 68 insertions(+), 15 deletions(-)
--
2.43.5
* [PATCH v1 1/2] mm: introduce vma flag VM_PGOFF_IS_PFN
From: Qinyun Tan @ 2024-10-24 9:34 UTC
To: Andrew Morton, Alex Williamson
Cc: linux-mm, kvm, linux-kernel, Qinyun Tan, Guanghui Feng, Xunlei Pang
Introduce a new vma flag, 'VM_PGOFF_IS_PFN', which indicates that
vma->vm_pgoff holds the pfn of vma->vm_start. This allows the pfn for
any address in the vma to be derived directly from vma->vm_pgoff,
without walking the page tables. No functional change.
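
For illustration, a consumer holding the mmap read lock could derive the
pfn under this convention with a helper along these lines
(vma_pgoff_to_pfn() is hypothetical and not part of this patch):

/* Hypothetical helper, for illustration only; not part of this patch. */
static bool vma_pgoff_to_pfn(struct vm_area_struct *vma, unsigned long addr,
			     unsigned long *pfn)
{
	/* Only meaningful while holding mmap_read_lock() on vma->vm_mm. */
	if (!(vma->vm_flags & VM_PGOFF_IS_PFN))
		return false;	/* caller must fall back to pinning */

	*pfn = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
	return true;
}
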
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Reviewed-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
include/linux/mm.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ecf63d2b05825..80849b1b9aa92 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -322,6 +322,12 @@ extern unsigned int kobjsize(const void *objp);
#define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
+#define VM_PGOFF_IS_PFN	BIT(62)	/* vma->vm_pgoff holds the pfn of vm_start */
+#else
+#define VM_PGOFF_IS_PFN	VM_NONE
+#endif
+
#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
#define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
#define VM_HIGH_ARCH_BIT_1 33 /* bit only usable on 64-bit architectures */
--
2.43.5
* [PATCH v1 2/2] vfio: avoid unnecessary pin memory when dma map io address space
From: Qinyun Tan @ 2024-10-24 9:34 UTC
To: Andrew Morton, Alex Williamson
Cc: linux-mm, kvm, linux-kernel, Qinyun Tan, Guanghui Feng, Xunlei Pang
When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a dma
address, the general handler 'vfio_pin_map_dma' attempts to pin the
memory and then create the mapping in the iommu.

However, some mappings aren't backed by a struct page, for example an
mmap'd MMIO range for our own or another device. In this scenario (a vma
with the VM_IO | VM_PFNMAP flags), the pin operation will fail. Moreover,
the pin operation incurs a large overhead, which results in a longer
startup time for the VM. We don't actually need a pin in this scenario.

To address this issue, we introduce a new DMA MAP flag,
'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN', to skip the 'vfio_pin_pages_remote'
operation in the DMA map process for MMIO memory. Additionally, we set
the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
pfn can be obtained directly from vma->vm_pgoff.

This approach allows us to avoid unnecessary memory pinning operations,
which would otherwise introduce additional overhead during DMA mapping.

In my tests, using vfio to pass through an 8-card AMD GPU setup with a
large BAR size (128GB * 8), the time to map the 192GB * 8 of BAR space
was reduced from about 50.79s to 1.57s.
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
---
drivers/vfio/pci/vfio_pci_core.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 64 +++++++++++++++++++++++++-------
include/uapi/linux/vfio.h | 11 ++++++
3 files changed, 62 insertions(+), 15 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1ab58da9f38a6..9e8743429e490 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1802,7 +1802,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
* the VMA flags.
*/
vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED | VM_IO | VM_PFNMAP |
- VM_DONTEXPAND | VM_DONTDUMP);
+ VM_DONTEXPAND | VM_DONTDUMP | VM_PGOFF_IS_PFN);
vma->vm_ops = &vfio_pci_mmap_ops;
return 0;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index bf391b40e576f..156e668de117d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1439,7 +1439,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
}
static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
- size_t map_size)
+ size_t map_size, unsigned int map_flags)
{
dma_addr_t iova = dma->iova;
unsigned long vaddr = dma->vaddr;
@@ -1448,27 +1448,61 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
long npage;
unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
int ret = 0;
+ struct mm_struct *mm = current->mm;
+ bool mmio_dont_pin = map_flags & VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN;
+
+ /* This code path is only user initiated */
+ if (!mm) {
+ ret = -ENODEV;
+ goto out;
+ }
vfio_batch_init(&batch);
while (size) {
- /* Pin a contiguous chunk of memory */
- npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
- size >> PAGE_SHIFT, &pfn, limit,
- &batch);
- if (npage <= 0) {
- WARN_ON(!npage);
- ret = (int)npage;
- break;
+ struct vm_area_struct *vma;
+ unsigned long start = vaddr + dma->size;
+ bool do_pin_pages = true;
+
+ if (mmio_dont_pin) {
+ mmap_read_lock(mm);
+
+			vma = find_vma_intersection(mm, start, start + 1);
+
+			/*
+			 * If this DMA address range maps IO address space through a
+			 * vma carrying VM_IO | VM_PFNMAP | VM_PGOFF_IS_PFN, it does
+			 * not need to be pinned.  Skip the pin to avoid the overhead.
+			 */
+			if (vma && (vma->vm_flags & VM_IO) && (vma->vm_flags & VM_PFNMAP) &&
+			    (vma->vm_flags & VM_PGOFF_IS_PFN)) {
+ pfn = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+ npage = min_t(long, (vma->vm_end - start), size) >> PAGE_SHIFT;
+ do_pin_pages = false;
+ }
+ mmap_read_unlock(mm);
+ }
+
+ if (do_pin_pages) {
+ /* Pin a contiguous chunk of memory */
+ npage = vfio_pin_pages_remote(dma, start, size >> PAGE_SHIFT, &pfn,
+ limit, &batch);
+ if (npage <= 0) {
+ WARN_ON(!npage);
+ ret = (int)npage;
+ break;
+ }
}
/* Map it! */
ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
dma->prot);
if (ret) {
- vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
- npage, true);
- vfio_batch_unpin(&batch, dma);
+ if (do_pin_pages) {
+ vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
+ npage, true);
+ vfio_batch_unpin(&batch, dma);
+ }
break;
}
@@ -1479,6 +1513,7 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
vfio_batch_fini(&batch);
dma->iommu_mapped = true;
+out:
if (ret)
vfio_remove_dma(iommu, dma);
@@ -1645,7 +1680,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
if (list_empty(&iommu->domain_list))
dma->size = size;
else
- ret = vfio_pin_map_dma(iommu, dma, size);
+ ret = vfio_pin_map_dma(iommu, dma, size, map->flags);
if (!ret && iommu->dirty_page_tracking) {
ret = vfio_dma_bitmap_alloc(dma, pgsize);
@@ -2639,6 +2674,7 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
case VFIO_TYPE1_IOMMU:
case VFIO_TYPE1v2_IOMMU:
case VFIO_TYPE1_NESTING_IOMMU:
+ case VFIO_DMA_MAP_MMIO_DONT_PIN:
case VFIO_UNMAP_ALL:
return 1;
case VFIO_UPDATE_VADDR:
@@ -2811,7 +2847,7 @@ static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
struct vfio_iommu_type1_dma_map map;
unsigned long minsz;
uint32_t mask = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
- VFIO_DMA_MAP_FLAG_VADDR;
+ VFIO_DMA_MAP_FLAG_VADDR | VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN;
minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf1902..ca391ec41b3c3 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -56,6 +56,16 @@
*/
#define VFIO_UPDATE_VADDR 10
+/*
+ * The VFIO_DMA_MAP_MMIO_DONT_PIN capability indicates support for the
+ * VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN flag of VFIO_IOMMU_MAP_DMA.  For MMIO
+ * addresses we neither pin the pages nor establish the CPU (MMU) mapping
+ * at map time; ioctl(VFIO_IOMMU_MAP_DMA) only establishes the IOMMU
+ * mapping.  The CPU page-table entries are populated lazily via the
+ * page-fault mechanism when the range is later accessed.
+ */
+#define VFIO_DMA_MAP_MMIO_DONT_PIN 11
+
/*
* The IOCTL interface is designed for extensibility by embedding the
* structure length (argsz) and flags into structures passed between
@@ -1560,6 +1570,7 @@ struct vfio_iommu_type1_dma_map {
#define VFIO_DMA_MAP_FLAG_READ (1 << 0) /* readable from device */
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1) /* writable from device */
#define VFIO_DMA_MAP_FLAG_VADDR (1 << 2)
+#define VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN (1 << 3) /* don't pin MMIO pages */
__u64 vaddr; /* Process virtual address */
__u64 iova; /* IO virtual address */
__u64 size; /* Size of mapping (bytes) */
--
2.43.5
* Re: [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space
From: Alex Williamson @ 2024-10-24 17:06 UTC
To: Qinyun Tan; +Cc: Andrew Morton, linux-mm, kvm, linux-kernel
On Thu, 24 Oct 2024 17:34:42 +0800
Qinyun Tan <qinyuntan@linux.alibaba.com> wrote:
> When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a dma
> address, the general handler 'vfio_pin_map_dma' attempts to pin the
> memory and then create the mapping in the iommu.
>
> However, some mappings aren't backed by a struct page, for example an
> mmap'd MMIO range for our own or another device. In this scenario (a vma
> with the VM_IO | VM_PFNMAP flags), the pin operation will fail. Moreover,
> the pin operation incurs a large overhead, which results in a longer
> startup time for the VM. We don't actually need a pin in this scenario.
>
> To address this issue, we introduce a new DMA MAP flag,
> 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN', to skip the 'vfio_pin_pages_remote'
> operation in the DMA map process for MMIO memory. Additionally, we set
> the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
> pfn can be obtained directly from vma->vm_pgoff.
>
> This approach allows us to avoid unnecessary memory pinning operations,
> which would otherwise introduce additional overhead during DMA mapping.
>
> In my tests, using vfio to pass through an 8-card AMD GPU setup with a
> large BAR size (128GB * 8), the time to map the 192GB * 8 of BAR space
> was reduced from about 50.79s to 1.57s.
If the vma has a flag to indicate pfnmap, why does the user need to
provide a mapping flag to indicate not to pin? We generally cannot
trust such a user directive anyway, nor do we in this series, so it all
seems rather redundant.
What about simply improving the batching of pfnmap ranges rather than
imposing any sort of mm or uapi changes? Or perhaps, since we're now
using huge_fault to populate the vma, maybe we can iterate at PMD or
PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
optimizations to pursue that could be done transparently to the user.
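
For instance, something along these lines (untested, purely illustrative;
follow_one_pfn() is a stand-in for whatever lookup primitive the final
code would use) could let the map path handle one contiguous chunk per
iteration rather than one page:

/*
 * Untested, purely illustrative: follow_one_pfn() is a placeholder for
 * whatever per-page lookup primitive we end up with.  Look up the first
 * pfn, then extend the run while the pfns stay physically contiguous, so
 * the iommu map side sees one large chunk instead of one page at a time.
 */
static long count_contiguous_pfns(struct vm_area_struct *vma,
				  unsigned long vaddr, long max_pages,
				  unsigned long *first_pfn)
{
	unsigned long pfn;
	long n;

	if (follow_one_pfn(vma, vaddr, first_pfn))
		return -EFAULT;

	for (n = 1; n < max_pages; n++) {
		if (follow_one_pfn(vma, vaddr + n * PAGE_SIZE, &pfn) ||
		    pfn != *first_pfn + n)
			break;
	}

	return n;
}
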
Thanks,
Alex
* Re: [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space
From: Jason Gunthorpe @ 2024-10-24 18:19 UTC
To: Alex Williamson; +Cc: Qinyun Tan, Andrew Morton, linux-mm, kvm, linux-kernel
On Thu, Oct 24, 2024 at 11:06:24AM -0600, Alex Williamson wrote:
> On Thu, 24 Oct 2024 17:34:42 +0800
> Qinyun Tan <qinyuntan@linux.alibaba.com> wrote:
>
> > When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a dma
> > address, the general handler 'vfio_pin_map_dma' attempts to pin the
> > memory and then create the mapping in the iommu.
> >
> > However, some mappings aren't backed by a struct page, for example an
> > mmap'd MMIO range for our own or another device. In this scenario (a vma
> > with the VM_IO | VM_PFNMAP flags), the pin operation will fail. Moreover,
> > the pin operation incurs a large overhead, which results in a longer
> > startup time for the VM. We don't actually need a pin in this scenario.
> >
> > To address this issue, we introduce a new DMA MAP flag,
> > 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN', to skip the 'vfio_pin_pages_remote'
> > operation in the DMA map process for MMIO memory. Additionally, we set
> > the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
> > pfn can be obtained directly from vma->vm_pgoff.
> >
> > This approach allows us to avoid unnecessary memory pinning operations,
> > which would otherwise introduce additional overhead during DMA mapping.
> >
> > In my tests, using vfio to pass through an 8-card AMD GPU setup with a
> > large BAR size (128GB * 8), the time to map the 192GB * 8 of BAR space
> > was reduced from about 50.79s to 1.57s.
>
> If the vma has a flag to indicate pfnmap, why does the user need to
> provide a mapping flag to indicate not to pin? We generally cannot
> trust such a user directive anyway, nor do we in this series, so it all
> seems rather redundant.
The best answer is to map from DMABUF not from VMA and then you get
perfect aggregation cheaply.
> What about simply improving the batching of pfnmap ranges rather than
> imposing any sort of mm or uapi changes? Or perhaps, since we're now
> using huge_fault to populate the vma, maybe we can iterate at PMD or
> PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
> optimizations to pursue that could be done transparently to the
> user.
I don't want to add more stuff to support the security-broken follow_pfn
path. It needs to be replaced.

Leon's work to improve the DMA API is so close that we may be near the
end!

There are two versions of the dmabuf patches on the list; it would be
good to get those into good shape. We could build a full solution,
including the vfio/iommufd map side, while waiting.
Jason
* Re: [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space
From: 半叶 @ 2024-10-29 2:50 UTC
To: Alex Williamson; +Cc: Andrew Morton, linux-mm, kvm, linux-kernel
> On Oct 25, 2024, at 01:06, Alex Williamson <alex.williamson@redhat.com> wrote:
>
> On Thu, 24 Oct 2024 17:34:42 +0800
> Qinyun Tan <qinyuntan@linux.alibaba.com> wrote:
>
>> When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a dma
>> address, the general handler 'vfio_pin_map_dma' attempts to pin the
>> memory and then create the mapping in the iommu.
>>
>> However, some mappings aren't backed by a struct page, for example an
>> mmap'd MMIO range for our own or another device. In this scenario (a vma
>> with the VM_IO | VM_PFNMAP flags), the pin operation will fail. Moreover,
>> the pin operation incurs a large overhead, which results in a longer
>> startup time for the VM. We don't actually need a pin in this scenario.
>>
>> To address this issue, we introduce a new DMA MAP flag,
>> 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN', to skip the 'vfio_pin_pages_remote'
>> operation in the DMA map process for MMIO memory. Additionally, we set
>> the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
>> pfn can be obtained directly from vma->vm_pgoff.
>>
>> This approach allows us to avoid unnecessary memory pinning operations,
>> which would otherwise introduce additional overhead during DMA mapping.
>>
>> In my tests, using vfio to pass through an 8-card AMD GPU setup with a
>> large BAR size (128GB * 8), the time to map the 192GB * 8 of BAR space
>> was reduced from about 50.79s to 1.57s.
>
> If the vma has a flag to indicate pfnmap, why does the user need to
> provide a mapping flag to indicate not to pin? We generally cannot
> trust such a user directive anyway, nor do we in this series, so it all
> seems rather redundant.
>
> What about simply improving the batching of pfnmap ranges rather than
> imposing any sort of mm or uapi changes? Or perhaps, since we're now
> using huge_fault to populate the vma, maybe we can iterate at PMD or
> PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
> optimizations to pursue that could be done transparently to the user.
> Thanks,
>
> Alex
* Re: [PATCH v1 0/2] vfio: avoid unnecessary pin memory when dma map io address space
From: qinyuntan @ 2024-10-29 3:32 UTC
To: Alex Williamson; +Cc: Andrew Morton, linux-mm, kvm, linux-kernel
You are right; it seems I did not pick up the relevant updates in time.
Commit f9e54c3a2f5b7 ("vfio/pci: implement huge_fault support")
introduced huge_fault support, and we may be able to achieve the same
effect by adjusting the order parameter handled by
vfio_pci_mmap_huge_fault.
Thanks,
Qinyun Tan
On 2024/10/25 01:06, Alex Williamson wrote:
> On Thu, 24 Oct 2024 17:34:42 +0800
> Qinyun Tan <qinyuntan@linux.alibaba.com> wrote:
>
>> When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a dma
>> address, the general handler 'vfio_pin_map_dma' attempts to pin the
>> memory and then create the mapping in the iommu.
>>
>> However, some mappings aren't backed by a struct page, for example an
>> mmap'd MMIO range for our own or another device. In this scenario (a vma
>> with the VM_IO | VM_PFNMAP flags), the pin operation will fail. Moreover,
>> the pin operation incurs a large overhead, which results in a longer
>> startup time for the VM. We don't actually need a pin in this scenario.
>>
>> To address this issue, we introduce a new DMA MAP flag,
>> 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN', to skip the 'vfio_pin_pages_remote'
>> operation in the DMA map process for MMIO memory. Additionally, we set
>> the 'VM_PGOFF_IS_PFN' flag on vmas created by vfio_pci_mmap, so that the
>> pfn can be obtained directly from vma->vm_pgoff.
>>
>> This approach allows us to avoid unnecessary memory pinning operations,
>> which would otherwise introduce additional overhead during DMA mapping.
>>
>> In my tests, using vfio to pass through an 8-card AMD GPU setup with a
>> large BAR size (128GB * 8), the time to map the 192GB * 8 of BAR space
>> was reduced from about 50.79s to 1.57s.
>
> If the vma has a flag to indicate pfnmap, why does the user need to
> provide a mapping flag to indicate not to pin? We generally cannot
> trust such a user directive anyway, nor do we in this series, so it all
> seems rather redundant.
>
> What about simply improving the batching of pfnmap ranges rather than
> imposing any sort of mm or uapi changes? Or perhaps, since we're now
> using huge_fault to populate the vma, maybe we can iterate at PMD or
> PUD granularity rather than PAGE_SIZE? Seems like we have plenty of
> optimizations to pursue that could be done transparently to the user.
> Thanks,
>
> Alex