linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: "Leon Romanovsky" <leon@kernel.org>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Leon Romanovsky" <leonro@nvidia.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Christian König" <christian.koenig@amd.com>,
	dri-devel@lists.freedesktop.org, iommu@lists.linux.dev,
	"Jens Axboe" <axboe@kernel.dk>, "Joerg Roedel" <joro@8bytes.org>,
	kvm@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org, linux-mm@kvack.org,
	linux-pci@vger.kernel.org,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Marek Szyprowski" <m.szyprowski@samsung.com>,
	"Robin Murphy" <robin.murphy@arm.com>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Vivek Kasireddy" <vivek.kasireddy@intel.com>,
	"Will Deacon" <will@kernel.org>
Subject: Re: [PATCH v5 1/9] PCI/P2PDMA: Separate the mmap() support from the core logic
Date: Mon, 20 Oct 2025 09:58:54 -0300	[thread overview]
Message-ID: <20251020125854.GL316284@nvidia.com> (raw)
In-Reply-To: <aPYqliGwJTcZznSX@infradead.org>

On Mon, Oct 20, 2025 at 05:27:02AM -0700, Christoph Hellwig wrote:
> On Fri, Oct 17, 2025 at 08:53:20AM -0300, Jason Gunthorpe wrote:
> > On Thu, Oct 16, 2025 at 11:30:06PM -0700, Christoph Hellwig wrote:
> > > On Mon, Oct 13, 2025 at 06:26:03PM +0300, Leon Romanovsky wrote:
> > > > The DMA API now has a new flow, and has gained phys_addr_t support, so
> > > > it no longer needs struct pages to perform P2P mapping.
> > > 
> > > That's news to me.  All the pci_p2pdma_map_state machinery is still
> > > based on pgmaps and thus pages.
> > 
> > We had this discussion already three months ago:
> > 
> > https://lore.kernel.org/all/20250729131502.GJ36037@nvidia.com/
> > 
> > These couple patches make the core pci_p2pdma_map_state machinery work
> > on struct p2pdma_provider, and pgmap is just one way to get a
> > p2pdma_provider *
> > 
> > The struct page paths through pgmap go page->pgmap->mem to get
> > p2pdma_provider.
> > 
> > The non-struct page paths just have a p2pdma_provider * without a
> > pgmap. In this series VFIO uses
> > 
> > +	*provider = pcim_p2pdma_provider(pdev, bar);
> > 
> > To get the provider for a specific BAR.
> 
> And what protects that life time?  I've not seen anyone actually
> building the proper lifetime management.  And if someone did the patches
> need to clearly point to that.

It is this series!

The above API gives a lifetime that is driver bound. The calling
driver must ensure it stops using provider and stops doing DMA with it
before remove() completes.

This VFIO series does that through the move_notify callchain I showed
in the previous email. This callchain is always triggered before
remove() of the VFIO PCI driver is completed.

> > I think I've answered this three times now - for DMABUF the DMABUF
> > invalidation scheme is used to control the lifetime and no DMA mapping
> > outlives the provider, and the provider doesn't outlive the driver.
> 
> How?

I explained it in detail in the message you are repling to. If
something is not clear can you please be more specific??

Is it the mmap in VFIO perhaps that is causing these questions?

VFIO uses a PFNMAP VMA, so you can't pin_user_page() it. It uses
unmap_mapping_range() during its remove() path to get rid of the VMA
PTEs.

The DMA activity doesn't use the mmap *at all*. It isn't like NVMe
which relies on the ZONE_DEVICE pages and VMAs to link drivers
togther.

Instead the DMABUF FD is used to pass the MMIO pages between VFIO and
another driver. DMABUF has a built in invalidation mechanism that VFIO
triggers before remove(). The invalidation removes access from the
other driver.

This is different than NVMe which has no invalidation. NVMe does
unmap_mapping_range() on the VMA and waits for all the short lived
pgmap references to clear. We don't need anything like that because
DMABUF invalidation is synchronous.

The full picture for VFIO is something like:

[startup]
  MMIO is acquired from the pci_resource
  p2p_providers are setup

[runtime]
  MMIO is mapped into PFNMAP VMAs
  MMIO is linked to a DMABUF FD
  DMABUF FD gets DMA mapped using the p2p_provider

[unplug]
  unmap_mapping_range() is called so all VMAs are emptied out and the
  fault handler prevents new PTEs 
    ** No access to the MMIO through VMAs is possible**

  vfio_pci_dma_buf_cleanup() is called which prevents new DMABUF
  mappings from starting, and does dma_buf_move_notify() on all the
  open DMABUF FDs to invalidate other drivers. Other drivers stop
  doing DMA and we need to free the IOVA from the IOMMU/etc.
    ** No DMA access from other drivers is possible now**

  Any still open DMABUF FD will fail inside VFIO immediately due to
  the priv->revoked checks.
    **No code touches the p2p_provider anymore**

  The p2p_provider is destroyed by devm.

> > Obviously you cannot use the new p2provider mechanism without some
> > kind of protection against use after hot unplug, but it doesn't have
> > to be struct page based.
> 
> And how does this interact with everyone else expecting pgmap based
> lifetime management.

They continue to use pgmap and nothing changes for them.

The pgmap path always waited until nothing was using the pgmap and
thus provider before allowing device driver remove() to complete.

The refactoring doesn't change the lifecycle model, it just provides
entry points to access the driver bound lifetime model directly
instead of being forced to use pgmap.

Leon, can you add some remarks to the comments about what the rules
are to call pcim_p2pdma_provider() ?

Jason


  reply	other threads:[~2025-10-20 12:59 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-13 15:26 [PATCH v5 0/9] vfio/pci: Allow MMIO regions to be exported through dma-buf Leon Romanovsky
2025-10-13 15:26 ` [PATCH v5 1/9] PCI/P2PDMA: Separate the mmap() support from the core logic Leon Romanovsky
2025-10-17  6:30   ` Christoph Hellwig
2025-10-17 11:53     ` Jason Gunthorpe
2025-10-20 12:27       ` Christoph Hellwig
2025-10-20 12:58         ` Jason Gunthorpe [this message]
2025-10-20 15:04           ` Leon Romanovsky
2025-10-22  7:10           ` Christoph Hellwig
2025-10-22 11:43             ` Jason Gunthorpe
2025-10-13 15:26 ` [PATCH v5 2/9] PCI/P2PDMA: Simplify bus address mapping API Leon Romanovsky
2025-10-13 15:26 ` [PATCH v5 3/9] PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation Leon Romanovsky
2025-10-13 15:26 ` [PATCH v5 4/9] PCI/P2PDMA: Export pci_p2pdma_map_type() function Leon Romanovsky
2025-10-17  6:31   ` Christoph Hellwig
2025-10-17 12:14     ` Jason Gunthorpe
2025-10-20 12:29       ` Christoph Hellwig
2025-10-20 13:14         ` Jason Gunthorpe
2025-10-13 15:26 ` [PATCH v5 5/9] types: move phys_vec definition to common header Leon Romanovsky
2025-10-13 15:26 ` [PATCH v5 6/9] vfio: Export vfio device get and put registration helpers Leon Romanovsky
2025-10-13 15:26 ` [PATCH v5 7/9] vfio/pci: Share the core device pointer while invoking feature functions Leon Romanovsky
2025-10-13 15:26 ` [PATCH v5 8/9] vfio/pci: Enable peer-to-peer DMA transactions by default Leon Romanovsky
2025-10-16  4:09   ` Nicolin Chen
2025-10-16  6:10     ` Leon Romanovsky
2025-10-17  6:32   ` Christoph Hellwig
2025-10-17 11:55     ` Jason Gunthorpe
2025-10-20 12:28       ` Christoph Hellwig
2025-10-20 13:08         ` Jason Gunthorpe
2025-10-22  7:08           ` Christoph Hellwig
2025-10-22 11:38             ` Jason Gunthorpe
2025-10-22 11:54   ` Jason Gunthorpe
2025-10-13 15:26 ` [PATCH v5 9/9] vfio/pci: Add dma-buf export support for MMIO regions Leon Romanovsky
2025-10-16 23:53   ` Jason Gunthorpe
2025-10-17  5:40     ` Leon Romanovsky
2025-10-17 15:58       ` Jason Gunthorpe
2025-10-17 16:01         ` Jason Gunthorpe
2025-10-17  0:01   ` Jason Gunthorpe
2025-10-17  6:33   ` Christoph Hellwig
2025-10-17 12:16     ` Jason Gunthorpe
2025-10-17 13:02   ` Jason Gunthorpe
2025-10-17 16:13     ` Leon Romanovsky
2025-10-20 16:15       ` Jason Gunthorpe
2025-10-20 16:44         ` Leon Romanovsky
2025-10-20 16:51           ` Jason Gunthorpe
2025-10-17 23:40   ` Jason Gunthorpe
2025-10-22 12:50   ` Jason Gunthorpe
2025-10-26  7:55     ` Shuai Xue
2025-10-27 12:09       ` Jason Gunthorpe
2025-10-28 13:46         ` Shuai Xue
2025-10-27 23:13   ` David Matlack
2025-10-28 12:02     ` Leon Romanovsky
2025-10-30 22:28       ` Alex Mastro
2025-10-29 16:50   ` Alex Mastro
2025-10-29 18:21     ` Leon Romanovsky
2025-10-30  0:25   ` Samiullah Khawaja
2025-10-30  6:48     ` Leon Romanovsky
2025-10-30 12:57       ` Jason Gunthorpe
2025-10-30 20:38   ` Alex Williamson
2025-10-31  6:48     ` Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251020125854.GL316284@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=bhelgaas@google.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux.dev \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=leon@kernel.org \
    --cc=leonro@nvidia.com \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=m.szyprowski@samsung.com \
    --cc=robin.murphy@arm.com \
    --cc=sumit.semwal@linaro.org \
    --cc=vivek.kasireddy@intel.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox