From: Dan Williams <dan.j.williams@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
linux-xfs@vger.kernel.org, Jan Kara <jack@suse.cz>,
Arnd Bergmann <arnd@arndb.de>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
linux-rdma@vger.kernel.org, Linux API <linux-api@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>,
"J. Bruce Fields" <bfields@fieldses.org>,
Linux MM <linux-mm@kvack.org>, Jeff Moyer <jmoyer@redhat.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Jeff Layton <jlayton@poochiereds.net>,
Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT
Date: Mon, 9 Oct 2017 10:08:40 -0700
Message-ID: <CAPcyv4i6WBxfVJ0yqWbuW2kiJ-wpi+iYRPk=Kykqt3U5Rrw7MA@mail.gmail.com>
In-Reply-To: <20171009034030.GH3666@dastard>
Thanks for the review, Dave.

On Sun, Oct 8, 2017 at 8:40 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Oct 06, 2017 at 03:35:49PM -0700, Dan Williams wrote:
>> MAP_DIRECT is an mmap(2) flag with the following semantics:
>>
>> MAP_DIRECT
>> When specified with MAP_SHARED_VALIDATE, sets up a file lease with the
>> same lifetime as the mapping. Unlike a typical F_RDLCK lease this lease
>> is broken when a "lease breaker" attempts to write(2), change the block
>> map (fallocate), or change the size of the file. Otherwise the mechanism
>> of a lease break is identical to the typical lease break case where the
>> lease needs to be removed (munmap) within the number of seconds
>> specified by /proc/sys/fs/lease-break-time. If the lease holder fails to
>> remove the lease in time the kernel will invalidate the mapping and
>> force all future accesses to the mapping to trigger SIGBUS.
>>
>> In addition to lease break timeouts causing faults in the mapping to
>> result in SIGBUS, other states of the file will trigger SIGBUS at fault
>> time:
>>
>> * The file is not DAX capable
>> * The file has reflinked (copy-on-write) blocks
>> * The fault would trigger the filesystem to allocate blocks
>> * The fault would trigger the filesystem to perform extent conversion
>>
>> In other words, MAP_DIRECT expects and enforces a fully allocated file
>> where faults can be satisfied without modifying block map metadata.
>>
>> An unprivileged process may establish a MAP_DIRECT mapping on a file
>> whose UID (owner) matches the filesystem UID of the process. A process
>> with the CAP_LEASE capability may establish a MAP_DIRECT mapping on
>> arbitrary files
>>
>> ERRORS
>> EACCES Beyond the typical mmap(2) conditions that trigger EACCES
>> MAP_DIRECT also requires the permission to set a file lease.
>>
>> EOPNOTSUPP The filesystem explicitly does not support the flag
>>
>> SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
>> might require block-map updates, or the lease timed out and the
>> kernel invalidated the mapping.
>>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Arnd Bergmann <arnd@arndb.de>
>> Cc: Jeff Moyer <jmoyer@redhat.com>
>> Cc: Christoph Hellwig <hch@lst.de>
>> Cc: Dave Chinner <david@fromorbit.com>
>> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
>> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
>> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>> Cc: Jeff Layton <jlayton@poochiereds.net>
>> Cc: "J. Bruce Fields" <bfields@fieldses.org>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> fs/xfs/Kconfig | 2 -
>> fs/xfs/xfs_file.c | 102 +++++++++++++++++++++++++++++++++++++++
>> include/linux/mman.h | 3 +
>> include/uapi/asm-generic/mman.h | 1
>> 4 files changed, 106 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
>> index f62fc6629abb..f8765653a438 100644
>> --- a/fs/xfs/Kconfig
>> +++ b/fs/xfs/Kconfig
>> @@ -112,4 +112,4 @@ config XFS_ASSERT_FATAL
>>
>> config XFS_LAYOUT
>> def_bool y
>> - depends on EXPORTFS_BLOCK_OPS
>> + depends on EXPORTFS_BLOCK_OPS || FS_DAX
>> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
>> index ebdd0bd2b261..e35518600e28 100644
>> --- a/fs/xfs/xfs_file.c
>> +++ b/fs/xfs/xfs_file.c
>> @@ -40,12 +40,22 @@
>> #include "xfs_iomap.h"
>> #include "xfs_reflink.h"
>>
>> +#include <linux/mman.h>
>> #include <linux/dcache.h>
>> #include <linux/falloc.h>
>> #include <linux/pagevec.h>
>> +#include <linux/mapdirect.h>
>> #include <linux/backing-dev.h>
>>
>> static const struct vm_operations_struct xfs_file_vm_ops;
>> +static const struct vm_operations_struct xfs_file_vm_direct_ops;
>> +
>> +static inline bool
>> +is_xfs_map_direct(
>> + struct vm_area_struct *vma)
>> +{
>> + return vma->vm_ops == &xfs_file_vm_direct_ops;
>> +}
>
> Namespacing (xfs_vma_is_direct) and whitespace damage.
Will fix.
>
>>
>> /*
>> * Clear the specified ranges to zero through either the pagecache or DAX.
>> @@ -1008,6 +1018,26 @@ xfs_file_llseek(
>> return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
>> }
>>
>> +static int
>> +xfs_vma_checks(
>> + struct vm_area_struct *vma,
>> + struct inode *inode)
>
> Exactly what are we checking for - function name doesn't tell me,
> and there's no comments, either?
Ok, I'll improve this.
>
>> +{
>> + if (!is_xfs_map_direct(vma))
>> + return 0;
>> +
>> + if (!is_map_direct_valid(vma->vm_private_data))
>> + return VM_FAULT_SIGBUS;
>> +
>> + if (xfs_is_reflink_inode(XFS_I(inode)))
>> + return VM_FAULT_SIGBUS;
>> +
>> + if (!IS_DAX(inode))
>> + return VM_FAULT_SIGBUS;
>
> And how do we get is_xfs_map_direct() set to true if we don't have a
> DAX inode or the inode has shared extents?
So, this was my way of trying to satisfy the request you made here:
https://lkml.org/lkml/2017/8/11/876
i.e. allow MAP_DIRECT on non-DAX files, to enable the use case of
freezing the block map while examining which file extents are linked.
If you don't want to use MAP_DIRECT for this, we can move these checks
to mmap time.
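
Roughly, the mmap-time variant could look like the model below. This is
a userspace sketch only, not the patch: struct model_inode,
model_mmap_validate() and MODEL_MAP_DIRECT are names made up for
illustration, the flag value is a placeholder, and returning -EOPNOTSUPP
for non-DAX or reflinked files is an assumption based on the ERRORS
section of the changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag value for this model only; not the uapi definition. */
#define MODEL_MAP_DIRECT 0x1UL

/* Hypothetical stand-in for the inode state the checks look at. */
struct model_inode {
	bool is_dax;		/* models IS_DAX(inode) */
	bool is_reflink;	/* models xfs_is_reflink_inode(ip) */
};

/*
 * Model of an ->mmap_validate() that rejects unsupported files up
 * front, so the fault path never sees a MAP_DIRECT vma on a non-DAX
 * or reflinked file.
 */
static int
model_mmap_validate(
	const struct model_inode	*inode,
	unsigned long			map_flags)
{
	assert(inode != NULL);

	/* Ordinary mapping: nothing extra to check. */
	if ((map_flags & MODEL_MAP_DIRECT) == 0)
		return 0;

	/* Fail the mmap() call instead of raising SIGBUS at fault time. */
	if (!inode->is_dax || inode->is_reflink)
		return -EOPNOTSUPP;

	return 0;
}
```

With the checks done here, the fault path would only need the
is_map_direct_valid() test for a lease that has since been broken.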
>
>> +
>> + return 0;
>> +}
>> +
>> /*
>> * Locking for serialisation of IO during page faults. This results in a lock
>> * ordering of:
>> @@ -1024,6 +1054,7 @@ __xfs_filemap_fault(
>> enum page_entry_size pe_size,
>> bool write_fault)
>> {
>> + struct vm_area_struct *vma = vmf->vma;
>> struct inode *inode = file_inode(vmf->vma->vm_file);
>
> You missed this vmf->vma....
>
> .....
>>
>> +#define XFS_MAP_SUPPORTED (LEGACY_MAP_MASK | MAP_DIRECT)
>> +
>> +STATIC int
>> +xfs_file_mmap_validate(
>> + struct file *filp,
>> + struct vm_area_struct *vma,
>> + unsigned long map_flags,
>> + int fd)
>> +{
>> + struct inode *inode = file_inode(filp);
>> + struct xfs_inode *ip = XFS_I(inode);
>> + struct map_direct_state *mds;
>> +
>> + if (map_flags & ~(XFS_MAP_SUPPORTED))
>> + return -EOPNOTSUPP;
>> +
>> + if ((map_flags & MAP_DIRECT) == 0)
>> + return xfs_file_mmap(filp, vma);
>> +
>> + file_accessed(filp);
>> + vma->vm_ops = &xfs_file_vm_direct_ops;
>> + if (IS_DAX(inode))
>> + vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
>
> And if it isn't a DAX inode? what is MAP_DIRECT supposed to do then?
In the non-DAX case it just takes the FL_LAYOUT file lease... although
we could also just have an fcntl for that purpose. The use case of
just freezing the block map does not need a mapping.
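
For comparison, this is how userspace takes an ordinary read lease today
with fcntl(F_SETLEASE). It is only an analogy: FL_LAYOUT leases are not
exposed through fcntl in this series, so a layout-lease command would be
a new interface. take_read_lease() and drop_lease() are helper names
invented for the sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes F_SETLEASE only with this set */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Fall back to the Linux uapi value if the feature macro came too late. */
#ifndef F_SETLEASE
#define F_SETLEASE 1024
#endif

/*
 * Open @path read-only and take a read lease on it. Returns the fd
 * holding the lease, or -1 with errno set. A read lease requires a
 * read-only fd, and the caller must own the file or have CAP_LEASE.
 */
static int
take_read_lease(const char *path)
{
	int fd, saved;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
		saved = errno;	/* close() may clobber errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/* Release the lease explicitly, then close the descriptor. */
static void
drop_lease(int fd)
{
	fcntl(fd, F_SETLEASE, F_UNLCK);
	close(fd);
}
```

A holder of such a lease gets a signal (SIGIO by default) when a
conflicting open or truncate arrives, and has lease-break-time seconds
to drop it, which is the same break model the changelog describes for
MAP_DIRECT.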
>> + mds = map_direct_register(fd, vma);
>> + if (IS_ERR(mds))
>> + return PTR_ERR(mds);
>> +
>> + /* flush in-flight faults */
>> + xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
>> + xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
>
> Urk. That's nasty. And why is it even necessary? Please explain why
> this is necessary in the comment, because it's not at all obvious to
> me...
This is related to your other observation about i_mapdcount and adding
an iomap_can_allocate() helper. I think I can clean both of these up
by using a call to break_layout(inode, false) and bailing in
->iomap_begin() if it returns EWOULDBLOCK. This would also fix the
current problem that allocating write-faults don't start the lease
break process.
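
To illustrate the idea, here is a small userspace model of the flow.
None of this is kernel code: struct lease_inode, model_break_layout()
and model_iomap_begin() are invented stand-ins, and the real
->iomap_begin() takes different arguments. The only behavior borrowed
from the kernel is that break_layout(inode, false) starts the break and
returns -EWOULDBLOCK while a layout lease is still held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the lease state hanging off an inode. */
struct lease_inode {
	bool layout_leased;	/* a MAP_DIRECT layout lease is held */
	bool break_started;	/* the holder has been told to let go */
};

/*
 * Model of break_layout(inode, wait). With wait == false it kicks off
 * the lease break and returns -EWOULDBLOCK instead of sleeping.
 */
static int
model_break_layout(
	struct lease_inode	*inode,
	bool			wait)
{
	assert(inode != NULL);

	if (!inode->layout_leased)
		return 0;

	inode->break_started = true;
	if (!wait)
		return -EWOULDBLOCK;

	/* The blocking variant sleeps until release; modeled as instant. */
	inode->layout_leased = false;
	return 0;
}

/*
 * Model of the ->iomap_begin() bail-out: only operations that would
 * modify the block map need to break the lease.
 */
static int
model_iomap_begin(
	struct lease_inode	*inode,
	bool			needs_alloc)
{
	if (!needs_alloc)
		return 0;

	/* Non-blocking: start the break, fail this fault with the error. */
	return model_break_layout(inode, false);
}
```

In this model an allocating write fault against a leased file both
fails and marks the break as started, which is the piece that is
missing today.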