* [PATCH v4 1/3] kernfs: remove page_mkwrite() from vm_operations_struct
2024-07-08 16:57 [PATCH v4 0/3] Enable P2PDMA in Userspace RDMA Martin Oliveira
@ 2024-07-08 16:57 ` Martin Oliveira
2024-07-09 5:24 ` Greg Kroah-Hartman
2024-07-09 14:34 ` Matthew Wilcox
2024-07-08 16:57 ` [PATCH v4 2/3] mm/gup: allow FOLL_LONGTERM & FOLL_PCI_P2PDMA Martin Oliveira
2024-07-08 16:57 ` [PATCH v4 3/3] RDMA/umem: add support for P2P RDMA Martin Oliveira
2 siblings, 2 replies; 6+ messages in thread
From: Martin Oliveira @ 2024-07-08 16:57 UTC
To: linux-kernel, linux-mm, linux-rdma
Cc: Andrew Morton, Artemy Kovalyov, Greg Kroah-Hartman,
Jason Gunthorpe, Leon Romanovsky, Logan Gunthorpe,
Martin Oliveira, Michael Guralnik, Mike Marciniszyn,
Shiraz Saleem, Tejun Heo, John Hubbard, Dan Williams,
David Sloan
The .page_mkwrite operator of kernfs just calls file_update_time().
The fault code already does the same when .page_mkwrite is not set.

Furthermore, having the page_mkwrite() operator causes
writable_file_mapping_allowed() to fail due to
vma_needs_dirty_tracking() on the gup flow. Passing that check is a
prerequisite for enabling P2PDMA over RDMA.

There are no users of .page_mkwrite and no known valid use cases, so
just remove .page_mkwrite from kernfs_vm_ops and WARN if an mmap()
implementation sets .page_mkwrite.

Co-developed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
---
fs/kernfs/file.c | 40 +++++++++++-----------------------------
1 file changed, 11 insertions(+), 29 deletions(-)
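
The gup argument in the commit message can be modelled in userspace C.
This is a simplified sketch, not the kernel code: every model_* name and
flag value is a hypothetical stand-in, and it assumes the check only
restricts long-term pins of shared writable mappings whose vm_ops set
.page_mkwrite or .pfn_mkwrite, or whose backing store can write back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for this model; not the kernel's definitions. */
#define MODEL_FOLL_PIN      0x1u
#define MODEL_FOLL_LONGTERM 0x2u

/* Stand-in for the vm_operations_struct hooks the check looks at. */
struct model_vm_ops {
	void (*page_mkwrite)(void);
	void (*pfn_mkwrite)(void);
};

/* Stand-in for the VMA state the check looks at. */
struct model_vma {
	bool shared_writable;   /* mapping is both shared and writable */
	bool fs_can_writeback;  /* backing store can write pages back */
	const struct model_vm_ops *vm_ops;
};

static void model_mkwrite_stub(void) { }

/* Models vma_needs_dirty_tracking(): true if writes must be noticed. */
static bool model_vma_needs_dirty_tracking(const struct model_vma *vma)
{
	if (!vma->shared_writable)
		return false;
	if (vma->vm_ops &&
	    (vma->vm_ops->page_mkwrite || vma->vm_ops->pfn_mkwrite))
		return true;
	return vma->fs_can_writeback;
}

/* Models writable_file_mapping_allowed() for a given set of gup flags. */
static bool model_writable_file_mapping_allowed(const struct model_vma *vma,
						unsigned int gup_flags)
{
	const unsigned int longterm_pin = MODEL_FOLL_PIN | MODEL_FOLL_LONGTERM;

	/* Only long-term pins are restricted in this model. */
	if ((gup_flags & longterm_pin) != longterm_pin)
		return true;
	return !model_vma_needs_dirty_tracking(vma);
}

/* Can a kernfs-style shared writable mapping be long-term pinned? */
static bool model_kernfs_longterm_pin_allowed(bool wrapper_sets_page_mkwrite)
{
	const struct model_vm_ops ops = {
		.page_mkwrite = wrapper_sets_page_mkwrite ? model_mkwrite_stub : NULL,
	};
	const struct model_vma vma = {
		.shared_writable = true,
		.fs_can_writeback = false,
		.vm_ops = &ops,
	};

	return model_writable_file_mapping_allowed(&vma,
			MODEL_FOLL_PIN | MODEL_FOLL_LONGTERM);
}
```

In this model the long-term pin is refused while the wrapper sets
.page_mkwrite and allowed once the hook is gone, which is the effect the
patch is after.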
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 8502ef68459b..fb2b77bf0c04 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -386,28 +386,6 @@ static vm_fault_t kernfs_vma_fault(struct vm_fault *vmf)
 	return ret;
 }
 
-static vm_fault_t kernfs_vma_page_mkwrite(struct vm_fault *vmf)
-{
-	struct file *file = vmf->vma->vm_file;
-	struct kernfs_open_file *of = kernfs_of(file);
-	vm_fault_t ret;
-
-	if (!of->vm_ops)
-		return VM_FAULT_SIGBUS;
-
-	if (!kernfs_get_active(of->kn))
-		return VM_FAULT_SIGBUS;
-
-	ret = 0;
-	if (of->vm_ops->page_mkwrite)
-		ret = of->vm_ops->page_mkwrite(vmf);
-	else
-		file_update_time(file);
-
-	kernfs_put_active(of->kn);
-	return ret;
-}
-
 static int kernfs_vma_access(struct vm_area_struct *vma, unsigned long addr,
 			     void *buf, int len, int write)
 {
@@ -432,7 +410,6 @@ static int kernfs_vma_access(struct vm_area_struct *vma, unsigned long addr,
 static const struct vm_operations_struct kernfs_vm_ops = {
 	.open		= kernfs_vma_open,
 	.fault		= kernfs_vma_fault,
-	.page_mkwrite	= kernfs_vma_page_mkwrite,
 	.access		= kernfs_vma_access,
 };
 
@@ -475,12 +452,17 @@ static int kernfs_fop_mmap(struct file *file, struct vm_area_struct *vma)
 	if (of->mmapped && of->vm_ops != vma->vm_ops)
 		goto out_put;
 
-	/*
-	 * It is not possible to successfully wrap close.
-	 * So error if someone is trying to use close.
-	 */
-	if (vma->vm_ops && vma->vm_ops->close)
-		goto out_put;
+	if (vma->vm_ops) {
+		/*
+		 * It is not possible to successfully wrap close.
+		 * So error if someone is trying to use close.
+		 */
+		if (vma->vm_ops->close)
+			goto out_put;
+
+		if (WARN_ON_ONCE(vma->vm_ops->page_mkwrite))
+			goto out_put;
+	}
 
 	rc = 0;
 	if (!of->mmapped) {
--
2.34.1
* Re: [PATCH v4 1/3] kernfs: remove page_mkwrite() from vm_operations_struct
From: Greg Kroah-Hartman @ 2024-07-09 5:24 UTC
To: Martin Oliveira
Cc: linux-kernel, linux-mm, linux-rdma, Andrew Morton,
Artemy Kovalyov, Jason Gunthorpe, Leon Romanovsky,
Logan Gunthorpe, Michael Guralnik, Mike Marciniszyn,
Shiraz Saleem, Tejun Heo, John Hubbard, Dan Williams,
David Sloan
On Mon, Jul 08, 2024 at 10:57:12AM -0600, Martin Oliveira wrote:
> The .page_mkwrite operator of kernfs just calls file_update_time().
> This is the same behaviour that the fault code does if .page_mkwrite is
> not set.
>
> Furthermore, having the page_mkwrite() operator causes
> writable_file_mapping_allowed() to fail due to
> vma_needs_dirty_tracking() on the gup flow, which is a pre-requisite for
> enabling P2PDMA over RDMA.
>
> There are no users of .page_mkwrite and no known valid use cases, so
> just remove the .page_mkwrite from kernfs_ops and WARN if an mmap()
> implementation sets .page_mkwrite.
>
> Co-developed-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
> ---
> fs/kernfs/file.c | 40 +++++++++++-----------------------------
> 1 file changed, 11 insertions(+), 29 deletions(-)
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Re: [PATCH v4 1/3] kernfs: remove page_mkwrite() from vm_operations_struct
From: Matthew Wilcox @ 2024-07-09 14:34 UTC
To: Martin Oliveira
Cc: linux-kernel, linux-mm, linux-rdma, Andrew Morton,
Artemy Kovalyov, Greg Kroah-Hartman, Jason Gunthorpe,
Leon Romanovsky, Logan Gunthorpe, Michael Guralnik,
Mike Marciniszyn, Shiraz Saleem, Tejun Heo, John Hubbard,
Dan Williams, David Sloan
On Mon, Jul 08, 2024 at 10:57:12AM -0600, Martin Oliveira wrote:
> -	/*
> -	 * It is not possible to successfully wrap close.
> -	 * So error if someone is trying to use close.
> -	 */
> -	if (vma->vm_ops && vma->vm_ops->close)
> -		goto out_put;
> +	if (vma->vm_ops) {
> +		/*
> +		 * It is not possible to successfully wrap close.
> +		 * So error if someone is trying to use close.
> +		 */
> +		if (vma->vm_ops->close)
> +			goto out_put;
> +
> +		if (WARN_ON_ONCE(vma->vm_ops->page_mkwrite))
> +			goto out_put;
> +	}
This is stupid. Warn for both or neither.
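
One possible reading of this comment, as a userspace sketch rather than
a proposed kernel change, is to treat both hooks the same way and warn
for either. The sketch_* names are hypothetical stand-ins, and the reply
does not say whether warning for both or for neither is preferred.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two vm_operations_struct hooks involved. */
struct sketch_vm_ops {
	void (*close)(void);
	void (*page_mkwrite)(void);
};

static unsigned int sketch_warn_count;	/* warnings emitted so far */

/* Stand-in for WARN_ON_ONCE(): reports the first hit only, returns cond. */
static bool sketch_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		sketch_warn_count++;
	}
	return cond;
}

/* Empty hook used to populate a sketch_vm_ops in examples. */
static void sketch_hook_stub(void) { }

/* Returns true if the mmap should be rejected; warns for either hook. */
static bool sketch_reject_unwrappable_ops(const struct sketch_vm_ops *ops)
{
	return ops && sketch_warn_on_once(ops->close || ops->page_mkwrite);
}
```

The "neither" variant would drop the sketch_warn_on_once() wrapper and
reject silently in both cases.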
* [PATCH v4 2/3] mm/gup: allow FOLL_LONGTERM & FOLL_PCI_P2PDMA
From: Martin Oliveira @ 2024-07-08 16:57 UTC
To: linux-kernel, linux-mm, linux-rdma
Cc: Andrew Morton, Artemy Kovalyov, Greg Kroah-Hartman,
Jason Gunthorpe, Leon Romanovsky, Logan Gunthorpe,
Martin Oliveira, Michael Guralnik, Mike Marciniszyn,
Shiraz Saleem, Tejun Heo, John Hubbard, Dan Williams,
David Sloan
The check that rejects FOLL_LONGTERM combined with FOLL_PCI_P2PDMA was
originally added due to concerns that P2PDMA needed to copy fsdax's
behaviour until pgmap refcounts were fixed (see [1]).

The P2PDMA infrastructure only calls unmap_mapping_range() when the
underlying device is unbound, and immediately after unmapping it waits
for the references on all ZONE_DEVICE pages to be released before
continuing. This does not allow a page to be reused, so no user access
fault is possible. P2PDMA does not have the same problem as fsdax.

The one minor concern with FOLL_LONGTERM pins is that they will block
device unbind until userspace releases them all.

Co-developed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
[1]: https://lkml.kernel.org/r/Yy4Ot5MoOhsgYLTQ@ziepe.ca
---
mm/gup.c | 5 -----
1 file changed, 5 deletions(-)
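
The unbind ordering described in the commit message can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions: the p2p_model_* names are hypothetical, a plain counter
stands in for the ZONE_DEVICE page references, and the real unbind path
sleeps until the references drop instead of returning a status.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a P2PDMA provider and its pages. */
struct p2p_model_dev {
	bool mapped;            /* userspace mappings still present */
	unsigned int page_refs; /* outstanding references (pins) on pages */
};

/* A long-term pin takes a reference on the device's pages. */
static void p2p_model_pin(struct p2p_model_dev *dev)
{
	dev->page_refs++;
}

/* Userspace releasing the pin drops that reference. */
static void p2p_model_unpin(struct p2p_model_dev *dev)
{
	if (dev->page_refs)
		dev->page_refs--;
}

/*
 * Step 1: tear down the userspace mappings.
 * Step 2: unbind may only complete once no page references remain.
 * Returns true when unbind could finish; the kernel would wait instead.
 */
static bool p2p_model_try_unbind(struct p2p_model_dev *dev)
{
	dev->mapped = false;
	return dev->page_refs == 0;
}
```

With one pin outstanding the model's unbind removes the mapping but
cannot complete until the pin is released, matching the "minor concern"
noted in the commit message.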
diff --git a/mm/gup.c b/mm/gup.c
index ca0f5cedce9b..6922e1c38d75 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2614,11 +2614,6 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 	if (WARN_ON_ONCE((gup_flags & (FOLL_GET | FOLL_PIN)) && !pages))
 		return false;
 
-	/* We want to allow the pgmap to be hot-unplugged at all times */
-	if (WARN_ON_ONCE((gup_flags & FOLL_LONGTERM) &&
-			 (gup_flags & FOLL_PCI_P2PDMA)))
-		return false;
-
 	*gup_flags_p = gup_flags;
 	return true;
 }
--
2.34.1
* [PATCH v4 3/3] RDMA/umem: add support for P2P RDMA
From: Martin Oliveira @ 2024-07-08 16:57 UTC
To: linux-kernel, linux-mm, linux-rdma
Cc: Andrew Morton, Artemy Kovalyov, Greg Kroah-Hartman,
Jason Gunthorpe, Leon Romanovsky, Logan Gunthorpe,
Martin Oliveira, Michael Guralnik, Mike Marciniszyn,
Shiraz Saleem, Tejun Heo, John Hubbard, Dan Williams,
David Sloan, Jason Gunthorpe
If the device supports P2PDMA, add the FOLL_PCI_P2PDMA flag.

This allows ibv_reg_mr() and friends to use P2PDMA memory that has been
mmapped into userspace for MRs in IB and RDMA transactions.

Co-developed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Martin Oliveira <martin.oliveira@eideticom.com>
Acked-by: Jason Gunthorpe <jgg@nvidia.com>
---
drivers/infiniband/core/umem.c | 3 +++
1 file changed, 3 insertions(+)
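
A userspace model of the flag composition may help show why this patch
depends on patch 2/3. It is a sketch only: the UMEM_MODEL_* values and
umem_model_* helpers are hypothetical, and it assumes (not shown in the
hunk below) that ib_umem_get() starts from FOLL_LONGTERM before adding
the other flags.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model; not the kernel's definitions. */
#define UMEM_MODEL_FOLL_WRITE      0x1u
#define UMEM_MODEL_FOLL_LONGTERM   0x2u
#define UMEM_MODEL_FOLL_PCI_P2PDMA 0x4u

/* Models how the gup flags are built up for a umem registration. */
static unsigned int umem_model_gup_flags(bool writable, bool p2p_supported)
{
	/* Assumption: umem pins are always long-term. */
	unsigned int gup_flags = UMEM_MODEL_FOLL_LONGTERM;

	if (writable)
		gup_flags |= UMEM_MODEL_FOLL_WRITE;
	if (p2p_supported)
		gup_flags |= UMEM_MODEL_FOLL_PCI_P2PDMA;
	return gup_flags;
}

/* Models, in isolation, the check that patch 2/3 deletes. */
static bool umem_model_rejected_before_patch2(unsigned int gup_flags)
{
	return (gup_flags & UMEM_MODEL_FOLL_LONGTERM) &&
	       (gup_flags & UMEM_MODEL_FOLL_PCI_P2PDMA);
}
```

In the model, every registration on a P2PDMA-capable device carries both
FOLL_LONGTERM and FOLL_PCI_P2PDMA, so it would have been rejected while
the old check was still in place.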
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 07c571c7b699..b59bb6e1475e 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -208,6 +208,9 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 	if (umem->writable)
 		gup_flags |= FOLL_WRITE;
 
+	if (ib_dma_pci_p2p_dma_supported(device))
+		gup_flags |= FOLL_PCI_P2PDMA;
+
 	while (npages) {
 		cond_resched();
 		pinned = pin_user_pages_fast(cur_base,
--
2.34.1