From: James Gowans <jgowans@amazon.com>
To: <linux-kernel@vger.kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>,
<kexec@lists.infradead.org>, "Joerg Roedel" <joro@8bytes.org>,
Will Deacon <will@kernel.org>, <iommu@lists.linux.dev>,
Alexander Viro <viro@zeniv.linux.org.uk>,
"Christian Brauner" <brauner@kernel.org>,
<linux-fsdevel@vger.kernel.org>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>, <kvm@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
Alexander Graf <graf@amazon.com>,
David Woodhouse <dwmw@amazon.co.uk>,
"Jan H . Schoenherr" <jschoenh@amazon.de>,
Usama Arif <usama.arif@bytedance.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
<madvenka@linux.microsoft.com>, <steven.sistare@oracle.com>,
<yuleixzhang@tencent.com>
Subject: [RFC 16/18] vfio: support not mapping IOMMU pgtables on live-update
Date: Mon, 5 Feb 2024 12:02:01 +0000
Message-ID: <20240205120203.60312-17-jgowans@amazon.com>
In-Reply-To: <20240205120203.60312-1-jgowans@amazon.com>

When restoring VMs after a live update kexec, the IOVAs for the guest VM
are already present in the persisted IOMMU page tables. Re-mapping them
is unnecessary: it clobbers the existing pgtable entries, and it may
introduce races if the pgtable modifications happen concurrently with
device DMA.

Provide a new VFIO_IOMMU_MAP_DMA flag, VFIO_DMA_MAP_FLAG_LIVE_UPDATE,
which userspace can supply to inform VFIO that the IOVAs are already
mapped. In this case VFIO skips the call into the IOMMU driver that
would install the mapping. VFIO still needs the MAP_DMA ioctl to pin
the pages and to set up its internal data structures for the mapping.
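
For illustration, a minimal userspace sketch of how a VMM might re-issue
VFIO_IOMMU_MAP_DMA after kexec with the new flag set. It assumes a type1
container fd that has been set up again after the live update, and that
the flag keeps the value (1 << 3) proposed in this patch, since it is not
in upstream headers. The helper names fill_dma_map() and
restore_dma_mapping() are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Assumption: VFIO_DMA_MAP_FLAG_LIVE_UPDATE is only proposed by this RFC and
 * is not in upstream <linux/vfio.h>, so fall back to the value from the patch.
 */
#ifndef VFIO_DMA_MAP_FLAG_LIVE_UPDATE
#define VFIO_DMA_MAP_FLAG_LIVE_UPDATE (1 << 3)
#endif

/* Hypothetical helper: describe one guest memory region for MAP_DMA. */
static void fill_dma_map(struct vfio_iommu_type1_dma_map *map,
			 uint64_t vaddr, uint64_t iova, uint64_t size,
			 int after_live_update)
{
	memset(map, 0, sizeof(*map));
	map->argsz = sizeof(*map);
	map->flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

	/*
	 * After kexec the IOVAs are already present in the persisted IOMMU
	 * page tables: ask VFIO to pin and track the range, but not to
	 * rewrite the page table entries.
	 */
	if (after_live_update)
		map->flags |= VFIO_DMA_MAP_FLAG_LIVE_UPDATE;

	map->vaddr = vaddr;	/* process virtual address of guest RAM */
	map->iova = iova;	/* IOVA the device is already using */
	map->size = size;	/* length of the region in bytes */
}

/* Hypothetical helper: issue VFIO_IOMMU_MAP_DMA on a type1 container fd. */
static int restore_dma_mapping(int container_fd, uint64_t vaddr,
			       uint64_t iova, uint64_t size,
			       int after_live_update)
{
	struct vfio_iommu_type1_dma_map map;

	fill_dma_map(&map, vaddr, iova, size, after_live_update);
	if (ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map) < 0)
		return -errno;
	return 0;
}
```

A kernel without this patch rejects the unknown flag with -EINVAL,
because vfio_iommu_type1_map_dma() checks map.flags against its mask of
supported flags before doing anything else.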

It would probably be better to move the persistence one layer up and
persist the VFIO container itself in pkernfs. That way the whole
container could be picked up and reused after kexec without needing
any MAP_DMA ioctls.
---
 drivers/vfio/vfio_iommu_type1.c | 24 +++++++++++++-----------
 include/uapi/linux/vfio.h       |  1 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b36edfc5c9ef..dc2682fbda2e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1456,7 +1456,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
 }
 
 static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
-			    size_t map_size)
+			    size_t map_size, unsigned int flags)
 {
 	dma_addr_t iova = dma->iova;
 	unsigned long vaddr = dma->vaddr;
dma_addr_t iova = dma->iova;
unsigned long vaddr = dma->vaddr;
@@ -1479,14 +1479,16 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			break;
 		}
 
-		/* Map it! */
-		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
-				     dma->prot);
-		if (ret) {
-			vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
-						npage, true);
-			vfio_batch_unpin(&batch, dma);
-			break;
+		if (!(flags & VFIO_DMA_MAP_FLAG_LIVE_UPDATE)) {
+			/* Map it! */
+			ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
+					     dma->prot);
+			if (ret) {
+				vfio_unpin_pages_remote(dma, iova + dma->size, pfn,
+							npage, true);
+				vfio_batch_unpin(&batch, dma);
+				break;
+			}
 		}
 
 		size -= npage << PAGE_SHIFT;
@@ -1662,7 +1664,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	if (list_empty(&iommu->domain_list))
 		dma->size = size;
 	else
-		ret = vfio_pin_map_dma(iommu, dma, size);
+		ret = vfio_pin_map_dma(iommu, dma, size, map->flags);
 
 	if (!ret && iommu->dirty_page_tracking) {
 		ret = vfio_dma_bitmap_alloc(dma, pgsize);
@@ -2836,7 +2838,7 @@ static int vfio_iommu_type1_map_dma(struct vfio_iommu *iommu,
 	struct vfio_iommu_type1_dma_map map;
 	unsigned long minsz;
 	uint32_t mask = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE |
-			VFIO_DMA_MAP_FLAG_VADDR;
+			VFIO_DMA_MAP_FLAG_VADDR | VFIO_DMA_MAP_FLAG_LIVE_UPDATE;
 
 	minsz = offsetofend(struct vfio_iommu_type1_dma_map, size);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fa9676bb4b26..d04d28e52110 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1536,6 +1536,7 @@ struct vfio_iommu_type1_dma_map {
 #define VFIO_DMA_MAP_FLAG_READ (1 << 0)		/* readable from device */
 #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)	/* writable from device */
 #define VFIO_DMA_MAP_FLAG_VADDR (1 << 2)
+#define VFIO_DMA_MAP_FLAG_LIVE_UPDATE (1 << 3)	/* IOVAs already mapped in IOMMU before LU */
 	__u64	vaddr;				/* Process virtual address */
 	__u64	iova;				/* IO virtual address */
 	__u64	size;				/* Size of mapping (bytes) */
--
2.40.1