From: Andy Lutomirski <luto@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
andy.rudoff@intel.com
Cc: Andy Lutomirski <luto@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>, linux-nvdimm <linux-nvdimm@lists.01.org>,
Linux API <linux-api@vger.kernel.org>,
Dave Chinner <david@fromorbit.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Jeff Moyer <jmoyer@redhat.com>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem
Date: Sat, 17 Jun 2017 22:05:45 -0700
Message-ID: <CALCETrVY38h2ajpod2U_2pdHSp8zO4mG2p19h=OnnHmhGTairw@mail.gmail.com>
In-Reply-To: <CAPcyv4iPb69e+rE3fJUzm9U_P_dLfhantU9mvYmV-R0oQee4rA@mail.gmail.com>

On Sat, Jun 17, 2017 at 8:15 PM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Sat, Jun 17, 2017 at 4:50 PM, Andy Lutomirski <luto@kernel.org> wrote:
>> My other objection is that the syscall intentionally leaks a reference
>> to the file. This means it needs overflow protection and it probably
>> shouldn't ever be allowed to use it without privilege.
>
> We only hold the one reference while S_DAXFILE is set, so I think the
> protection is there, and per Dave's original proposal this requires
> CAP_LINUX_IMMUTABLE.
>
>> Why can't the underlying issue be easily fixed, though? Could
>> .page_mkwrite just make sure that metadata is synced when the FS uses
>> DAX?
>
> Yes, it most definitely could and that idea has been floated.
>
>> On a DAX fs, syncing metadata should be extremely fast. This
>> could be conditioned on an madvise or mmap flag if performance might
>> be an issue. As far as I know, this change alone should be
>> sufficient.
>
> The hang-up is that it requires per-fs enabling, as it needs to be
> careful to manage mmap_sem vs fs journal locks, for example. I know the
> in-development NOVA [1] filesystem is planning to support this out of
> the gate. ext4 would be open to implementing it, but I think xfs is
> cold on the idea. Christoph originally proposed it here [2], before
> Dave went on to propose immutable semantics.

Hmm. Given a choice between a very clean API that works without
privilege but is awkward to implement on XFS and an awkward-to-use
API, I'd personally choose the former.

Dave, even with the lock ordering issue, couldn't XFS implement
MAP_PMEM_AWARE by having .page_mkwrite work roughly like this:

    if (metadata is dirty) {
        up_write(&mmap_sem);
        sync the metadata;
        down_write(&mmap_sem);
        return 0;  /* retry the fault */
    } else {
        return whatever success code;
    }

This might require returning VM_FAULT_RETRY instead of 0 and it might
require auditing the core mm code to make sure that it can handle
mmap_sem being dropped like this. I don't see why it couldn't work in
principle, though.
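
To make the retry flow concrete, here is a minimal user-space model in C.
It is only a sketch under stated assumptions: every name in it
(model_mm, model_inode, model_page_mkwrite, FAULT_RETRY and so on) is
hypothetical and stands in for the real thing, none of it is kernel API,
and the "lock" is a plain flag rather than a real semaphore. It shows
only the control flow: drop the lock, sync, retake the lock, ask the
caller to retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not real kernel symbols. */
enum fault_result { FAULT_DONE, FAULT_RETRY };

struct model_inode {
    bool metadata_dirty;   /* true while block-map metadata is unsynced */
    int  syncs;            /* number of metadata syncs performed */
};

struct model_mm {
    bool mmap_sem_held;    /* models whether mmap_sem is currently held */
};

/* Models "sync the metadata": after this the metadata is clean. */
static void model_sync_metadata(struct model_inode *inode)
{
    inode->metadata_dirty = false;
    inode->syncs++;
}

/* Models the proposed .page_mkwrite behaviour from the pseudocode. */
static enum fault_result model_page_mkwrite(struct model_mm *mm,
                                            struct model_inode *inode)
{
    assert(mm->mmap_sem_held);          /* handler is entered with the lock held */

    if (inode->metadata_dirty) {
        mm->mmap_sem_held = false;      /* drop the lock before syncing */
        model_sync_metadata(inode);
        mm->mmap_sem_held = true;       /* retake it before returning */
        return FAULT_RETRY;             /* ask the caller to redo the fault */
    }
    return FAULT_DONE;                  /* metadata already durable */
}

/* Models the fault path: retry until the handler reports completion.
 * Returns how many times the handler ran. */
static int model_handle_write_fault(struct model_mm *mm,
                                    struct model_inode *inode)
{
    int attempts = 0;
    enum fault_result r;

    mm->mmap_sem_held = true;
    do {
        attempts++;
        r = model_page_mkwrite(mm, inode);
    } while (r == FAULT_RETRY);
    mm->mmap_sem_held = false;

    return attempts;
}
```

In this model a write fault on an inode with dirty metadata runs the
handler twice (one sync plus one retry), and a fault on a clean inode
runs it once. Whether the real core mm code tolerates the lock being
dropped at that point is exactly the open question above.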