From: Dave Chinner <david@fromorbit.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Oleg Nesterov <oleg@redhat.com>, linux-mm <linux-mm@kvack.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Tue, 23 Feb 2016 08:50:55 +1100
Message-ID: <20160222215055.GJ25832@dastard>
In-Reply-To: <x49fuwk7o8a.fsf@segfault.boston.devel.redhat.com>

On Mon, Feb 22, 2016 at 10:34:45AM -0500, Jeff Moyer wrote:
> Hi, Dave,
> 
> Dave Chinner <david@fromorbit.com> writes:
> 
> >> Another potential issue is that MAP_PMEM_AWARE is not enough on its
> >> own.  If the filesystem or inode does not support DAX the application
> >> needs to assume page cache semantics.  At a minimum MAP_PMEM_AWARE
> >> requests would need to fail if DAX is not available.
> >
> > They will always still need to call msync()/fsync() to guarantee
> > data integrity, because the filesystem metadata that indexes the
> > data still needs to be committed before data integrity can be
> > guaranteed. i.e. MAP_PMEM_AWARE by itself is not sufficient for data
> > integrity, and so the app will have to be written like any other app
> > that uses page cache based mmap().
> >
> > Indeed, the application cannot even assume that a fully allocated
> > file does not require msync/fsync because the filesystem may be
> > doing things like dedupe, defrag, copy on write, etc. behind the back
> > of the application and so file metadata changes may still be in
> > volatile RAM even though the application has flushed its data.
> 
> Once you hand out a persistent memory mapping, you sure as heck can't
> switch blocks around behind the back of the application.

Yes, we can. All we need to do is lock out page faults, invalidate
the mappings, and change the underlying blocks.  The app using mmap
will refault on its next access, and get the new block mapped into
its address space.

I'll point to hole punching as an example of how we do these
invalidate/modify operations right now, and we expect them to work
and not result in data corruption. We even have tests (e.g. fsx in
xfstests has all these operations enabled) to make sure it works.
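
A minimal userspace sketch of the hole-punch case, for illustration
only. It assumes Linux with glibc and a filesystem that supports
FALLOC_FL_PUNCH_HOLE; the function name punch_under_mapping() and the
one-page file size are hypothetical choices, not anything from
xfstests. It dirties a shared mapping, punches the range out from under
it, and then reads through the same mapping, which should refault and
observe zeros.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the mapping reads back as zeros
 * after the punch, 1 if the filesystem does not support hole punching,
 * and -1 on any other failure.
 */
int punch_under_mapping(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Fault the page in and dirty it through the mapping. */
    memset(p, 0xab, len);

    int rc = 0;
    /* Change the file's blocks while the mapping is still live. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, (off_t)len) != 0) {
        rc = (errno == EOPNOTSUPP) ? 1 : -1;
    } else {
        /* The next access goes through the same pointer and must
         * see the punched range as zeros. */
        for (size_t i = 0; i < len; i++) {
            if (p[i] != 0) {
                rc = -1;
                break;
            }
        }
    }

    munmap(p, len);
    close(fd);
    unlink(path);
    return rc;
}
```

The application never remaps anything here; the kernel invalidates the
old translation and the load after the punch simply faults again.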

> That aside, let me see if I understand you correctly.
> 
> An application creates a file and writes to every single block in the
> thing, sync's it, closes it.  It then opens it back up, calls mmap with
> this new MAP_DAX flag or on a file system mounted with -o dax, and
> proceeds to access the file using loads and stores.  It persists its
> data by using non-temporal stores, flushing and fencing cpu
> instructions.

The moment the app does a write to the file data, we can no longer
assume the filesystem metadata references to the file data are
durable.

> If I understand you correctly, you're saying that that application is
> not written correctly, because it needs to call fsync to persist
> metadata (that it presumably did not modify).  Is that right?

Yes, though fdatasync() would be sufficient because the app only
modified data.
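
As a rough sketch of what that looks like from the application side,
assuming an ordinary Linux file and a MAP_SHARED mapping: the helper
name store_and_sync() and the file layout are hypothetical, and any
CPU cache flushing an app might do on a DAX mapping is left out. The
point is only that the stores through the mapping are followed by
fdatasync() on the file descriptor before the data is treated as
durable.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes msg into a fresh file through a shared
 * mapping and calls fdatasync() before reporting success.
 * Returns 0 on success, -1 on failure.
 */
int store_and_sync(const char *path, const char *msg)
{
    size_t len = strlen(msg);
    assert(len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Modify file data with plain stores through the mapping. */
    memcpy(p, msg, len);

    /*
     * Only data was modified by the application, so fdatasync() is
     * used rather than fsync(). msync(p, len, MS_SYNC) is the
     * mapping-based alternative.
     */
    int rc = (fdatasync(fd) == 0) ? 0 : -1;

    munmap(p, len);
    close(fd);
    return rc;
}
```

A caller would treat the data as persistent only after
store_and_sync() returns 0.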

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


