From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Sean Hefty <sean.hefty@intel.com>,
linux-xfs@vger.kernel.org, linux-nvdimm@lists.01.org,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
Jeff Moyer <jmoyer@redhat.com>,
hch@lst.de, Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
linux-mm@kvack.org, Doug Ledford <dledford@redhat.com>,
linux-fsdevel@vger.kernel.org,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Hal Rosenstock <hal.rosenstock@gmail.com>
Subject: [PATCH v3 09/13] IB/core: disable memory registration of filesystem-dax vmas
Date: Thu, 19 Oct 2017 19:39:45 -0700
Message-ID: <150846718583.24336.7817353741483230017.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <150846713528.24336.4459262264611579791.stgit@dwillia2-desk3.amr.corp.intel.com>

Until there is a solution to the dma-to-dax vs truncate problem, it is
not safe to allow RDMA to create long-standing memory registrations
against filesystem-dax vmas: a truncate can free blocks, which the
filesystem may then reallocate, while the RDMA device can still DMA to
them. Device-dax vmas do not have this problem and are explicitly
allowed.

This is temporary until a "memory registration with layout-lease"
mechanism can be implemented, and is limited to non-ODP (On Demand
Paging) capable RDMA devices.

Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: <linux-rdma@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/infiniband/core/umem.c | 49 +++++++++++++++++++++++++++++++---------
1 file changed, 38 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 21e60b1e2ff4..c30d286c1f24 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -147,19 +147,21 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
umem->hugetlb = 1;
page_list = (struct page **) __get_free_page(GFP_KERNEL);
- if (!page_list) {
- put_pid(umem->pid);
- kfree(umem);
- return ERR_PTR(-ENOMEM);
- }
+ if (!page_list)
+ goto err_pagelist;
/*
- * if we can't alloc the vma_list, it's not so bad;
- * just assume the memory is not hugetlb memory
+ * If DAX is enabled we need the vma to protect against
+ * registering filesystem-dax memory. Otherwise we can tolerate
+ * a failure to allocate the vma_list and just assume that all
+ * vmas are not hugetlb-vmas.
*/
vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
- if (!vma_list)
+ if (!vma_list) {
+ if (IS_ENABLED(CONFIG_FS_DAX))
+ goto err_vmalist;
umem->hugetlb = 0;
+ }
npages = ib_umem_num_pages(umem);
@@ -199,15 +201,34 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
if (ret < 0)
goto out;
- umem->npages += ret;
cur_base += ret * PAGE_SIZE;
npages -= ret;
for_each_sg(sg_list_start, sg, ret, i) {
- if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
- umem->hugetlb = 0;
+ struct vm_area_struct *vma;
+ struct inode *inode;
sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
+ umem->npages++;
+
+ if (!vma_list)
+ continue;
+ vma = vma_list[i];
+
+ if (!is_vm_hugetlb_page(vma))
+ umem->hugetlb = 0;
+
+ if (!vma_is_dax(vma))
+ continue;
+
+ /* device-dax is safe for rdma... */
+ inode = file_inode(vma->vm_file);
+ if (S_ISCHR(inode->i_mode))
+ continue;
+
+ /* ...filesystem-dax is not. */
+ ret = -EOPNOTSUPP;
+ goto out;
}
/* preparing for next loop */
@@ -242,6 +263,12 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
free_page((unsigned long) page_list);
return ret < 0 ? ERR_PTR(ret) : umem;
+err_vmalist:
+ free_page((unsigned long) page_list);
+err_pagelist:
+ put_pid(umem->pid);
+ kfree(umem);
+ return ERR_PTR(-ENOMEM);
}
EXPORT_SYMBOL(ib_umem_get);
Thread overview: 45+ messages
2017-10-20 2:38 [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support Dan Williams
2017-10-20 2:39 ` [PATCH v3 01/13] dax: quiet bdev_dax_supported() Dan Williams
2017-10-20 2:39 ` [PATCH v3 02/13] dax: require 'struct page' for filesystem dax Dan Williams
2017-10-20 7:57 ` Christoph Hellwig
2017-10-20 15:23 ` Dan Williams
2017-10-20 16:29 ` Christoph Hellwig
2017-10-20 22:29 ` Dan Williams
2017-10-21 3:20 ` Matthew Wilcox
2017-10-21 4:16 ` Dan Williams
2017-10-21 8:15 ` Christoph Hellwig
2017-10-23 5:18 ` Martin Schwidefsky
2017-10-23 8:55 ` Dan Williams
2017-10-23 10:44 ` Martin Schwidefsky
2017-10-23 11:20 ` Dan Williams
2017-10-20 2:39 ` [PATCH v3 03/13] dax: stop using VM_MIXEDMAP for dax Dan Williams
2017-10-20 2:39 ` [PATCH v3 04/13] dax: stop using VM_HUGEPAGE " Dan Williams
2017-10-20 2:39 ` [PATCH v3 05/13] dax: stop requiring a live device for dax_flush() Dan Williams
2017-10-20 2:39 ` [PATCH v3 06/13] dax: store pfns in the radix Dan Williams
2017-10-20 2:39 ` [PATCH v3 07/13] dax: warn if dma collides with truncate Dan Williams
2017-10-20 2:39 ` [PATCH v3 08/13] tools/testing/nvdimm: add 'bio_delay' mechanism Dan Williams
2017-10-20 2:39 ` Dan Williams [this message]
2017-10-20 2:39 ` [PATCH v3 10/13] mm: disable get_user_pages_fast() for dax Dan Williams
2017-10-20 2:39 ` [PATCH v3 11/13] fs: use smp_load_acquire in break_{layout,lease} Dan Williams
2017-10-20 12:39 ` Jeffrey Layton
2017-10-20 2:40 ` [PATCH v3 12/13] dax: handle truncate of dma-busy pages Dan Williams
2017-10-20 13:05 ` Jeff Layton
2017-10-20 15:42 ` Dan Williams
2017-10-20 16:32 ` Christoph Hellwig
2017-10-20 17:27 ` Dan Williams
2017-10-20 20:36 ` Brian Foster
2017-10-21 8:11 ` Christoph Hellwig
2017-10-20 2:40 ` [PATCH v3 13/13] xfs: wire up FL_ALLOCATED support Dan Williams
2017-10-20 7:47 ` [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support Christoph Hellwig
2017-10-20 9:31 ` Christoph Hellwig
2017-10-26 10:58 ` Jan Kara
2017-10-26 23:51 ` Williams, Dan J
2017-10-27 6:48 ` Dave Chinner
2017-10-27 11:42 ` Dan Williams
2017-10-29 21:52 ` Dave Chinner
2017-10-27 6:45 ` Christoph Hellwig
2017-10-29 23:46 ` Dan Williams
2017-10-30 2:00 ` Dave Chinner
2017-10-30 8:38 ` Jan Kara
2017-10-30 11:20 ` Dave Chinner
2017-10-30 17:51 ` Dan Williams