From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f199.google.com (mail-pf0-f199.google.com [209.85.192.199]) by kanga.kvack.org (Postfix) with ESMTP id E279E680CE3 for ; Tue, 31 Oct 2017 19:28:55 -0400 (EDT) Received: by mail-pf0-f199.google.com with SMTP id d28so505176pfe.1 for ; Tue, 31 Oct 2017 16:28:55 -0700 (PDT) Received: from mga07.intel.com (mga07.intel.com. [134.134.136.100]) by mx.google.com with ESMTPS id a67si2605665pgc.227.2017.10.31.16.28.54 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 31 Oct 2017 16:28:54 -0700 (PDT) Subject: [PATCH 10/15] IB/core: disable memory registration of fileystem-dax vmas From: Dan Williams Date: Tue, 31 Oct 2017 16:22:29 -0700 Message-ID: <150949214929.24061.10464887309708944817.stgit@dwillia2-desk3.amr.corp.intel.com> In-Reply-To: <150949209290.24061.6283157778959640151.stgit@dwillia2-desk3.amr.corp.intel.com> References: <150949209290.24061.6283157778959640151.stgit@dwillia2-desk3.amr.corp.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: linux-nvdimm@lists.01.org Cc: Sean Hefty , linux-xfs@vger.kernel.org, akpm@linux-foundation.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, Jeff Moyer , stable@vger.kernel.org, hch@lst.de, Jason Gunthorpe , linux-mm@kvack.org, Doug Ledford , linux-fsdevel@vger.kernel.org, Ross Zwisler , Hal Rosenstock Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow RDMA to create long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented, and is limited to non-ODP (On Demand Paging) capable RDMA devices. Cc: Sean Hefty Cc: Doug Ledford Cc: Hal Rosenstock Cc: Jeff Moyer Cc: Christoph Hellwig Cc: Ross Zwisler Cc: Jason Gunthorpe Cc: Cc: Signed-off-by: Dan Williams --- drivers/infiniband/core/umem.c | 49 +++++++++++++++++++++++++++++++--------- 1 file changed, 38 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 21e60b1e2ff4..c30d286c1f24 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -147,19 +147,21 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, umem->hugetlb = 1; page_list = (struct page **) __get_free_page(GFP_KERNEL); - if (!page_list) { - put_pid(umem->pid); - kfree(umem); - return ERR_PTR(-ENOMEM); - } + if (!page_list) + goto err_pagelist; /* - * if we can't alloc the vma_list, it's not so bad; - * just assume the memory is not hugetlb memory + * If DAX is enabled we need the vma to protect against + * registering filesystem-dax memory. Otherwise we can tolerate + * a failure to allocate the vma_list and just assume that all + * vmas are not hugetlb-vmas. */ vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL); - if (!vma_list) + if (!vma_list) { + if (IS_ENABLED(CONFIG_FS_DAX)) + goto err_vmalist; umem->hugetlb = 0; + } npages = ib_umem_num_pages(umem); @@ -199,15 +201,34 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, if (ret < 0) goto out; - umem->npages += ret; cur_base += ret * PAGE_SIZE; npages -= ret; for_each_sg(sg_list_start, sg, ret, i) { - if (vma_list && !is_vm_hugetlb_page(vma_list[i])) - umem->hugetlb = 0; + struct vm_area_struct *vma; + struct inode *inode; sg_set_page(sg, page_list[i], PAGE_SIZE, 0); + umem->npages++; + + if (!vma_list) + continue; + vma = vma_list[i]; + + if (!is_vm_hugetlb_page(vma)) + umem->hugetlb = 0; + + if (!vma_is_dax(vma)) + continue; + + /* device-dax is safe for rdma... */ + inode = file_inode(vma->vm_file); + if (inode->i_mode == S_IFCHR) + continue; + + /* ...filesystem-dax is not. */ + ret = -EOPNOTSUPP; + goto out; } /* preparing for next loop */ @@ -242,6 +263,12 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, free_page((unsigned long) page_list); return ret < 0 ? ERR_PTR(ret) : umem; +err_vmalist: + free_page((unsigned long) page_list); +err_pagelist: + put_pid(umem->pid); + kfree(umem); + return ERR_PTR(-ENOMEM); } EXPORT_SYMBOL(ib_umem_get); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org