linux-mm.kvack.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Matthew Wilcox <willy@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Chinner <david@fromorbit.com>,
	Oleg Nesterov <oleg@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm <linux-mm@kvack.org>, Arnd Bergmann <arnd@arndb.de>
Subject: Re: [RFC 0/2] New MAP_PMEM_AWARE mmap flag
Date: Sun, 21 Feb 2016 14:03:43 -0800
Message-ID: <CAPcyv4gQV9Oh9OpHTGuGfTJ_s1C_L7J-VGyto3JMdAcgqyVeAw@mail.gmail.com>
In-Reply-To: <56CA2AC9.7030905@plexistor.com>

On Sun, Feb 21, 2016 at 1:23 PM, Boaz Harrosh <boaz@plexistor.com> wrote:
> On 02/21/2016 10:57 PM, Dan Williams wrote:
>> On Sun, Feb 21, 2016 at 12:24 PM, Boaz Harrosh <boaz@plexistor.com> wrote:
>>> On 02/21/2016 09:51 PM, Dan Williams wrote:
>>> <>
>>>>> Please advise?
>>>>
>>>> When this came up a couple weeks ago [1], the conclusion I came away
>>>> with is
>>>
>>> I think I saw that talk, no this was not suggested. What was suggested
>>> was an FS / mount knob. That would break semantics, this here does not
>>> break anything.
>>
>> No, it was a MAP_DAX mmap flag, similar to this proposal.  The
>> difference being that MAP_DAX was all or nothing (DAX vs page cache)
>> to address MAP_SHARED semantics.
>>
>
> Big difference no? I'm not talking about cached access at all.
>
>>>
>>>> that if an application wants to avoid the overhead of DAX
>>>> semantics it needs to use an alternative to DAX access methods.  Maybe
>>>> a new pmem aware fs like Nova [2], or some other mechanism that
>>>> bypasses the semantics that existing applications on top of ext4 and
>>>> xfs expect.
>>>>
>>>
>>> But my suggestion does not break any "existing applications" and does
>>> not break any semantics of ext4 or xfs. (That I can see)
>>>
>>> As I said above it perfectly co exists with existing applications and
>>> is the best of both worlds. The both applications can write to the
>>> same page and will not break any of application's expectation. Old or
>>> new.
>>>
>>> Please point me to where I'm wrong in the code submitted?
>>>
>>> Besides even an FS like Nova will need a flag per vma like this,
>>> it will need to sort out the different type of application. So
>>> here is how this is communicated, on the mmap call, how else?
>>> And also works for xfs or ext4
>>>
>>> Do you not see how this is entirely different then what was
>>> proposed? or am I totally missing something? Again please show
>>> me how this breaks anything's expectations.
>>>
>>
>> What happens for MAP_SHARED mappings with mixed pmem aware/unaware
>> applications?  Does MAP_PMEM_AWARE also imply awareness of other
>> applications that may be dirtying cachelines without taking
>> responsibility for making them persistent?
>>
>
> Sure. please have a look. What happens is that the legacy app
> will add the page to the radix tree, come the fsync it will be
> flushed. Even though a "new-type" app might fault on the same page
> before or after, which did not add it to the radix tree.
> So yes, all pages faulted by legacy apps will be flushed.
>
> I have manually tested all this and it seems to work. Can you see
> a theoretical scenario where it would not?

I'm worried about the scenario where the pmem aware app assumes that
none of the cachelines in its mapping are dirty when it goes to issue
pcommit.  We'll have two applications with different perceptions of
when writes are durable.  Maybe it's not a problem in practice, since at
least current-generation x86 CPUs flush existing dirty cachelines when
performing non-temporal stores.  However, it bothers me that there are
CPUs where a pmem-unaware app could prevent a pmem-aware app from
making writes durable.  It seems that if one app has established a
MAP_PMEM_AWARE mapping, it needs guarantees that all apps participating
in that shared mapping have the same awareness.

Another potential issue is that MAP_PMEM_AWARE is not enough on its
own.  If the filesystem or inode does not support DAX, the application
needs to assume page cache semantics.  At a minimum, MAP_PMEM_AWARE
requests would need to fail if DAX is not available.
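
To make the "fail if DAX is not available" point concrete, here is a
minimal sketch of the check I have in mind.  It is plain C that models
the policy only, not kernel code and not the code from the RFC: the
flag value, the struct, the helper name, and the choice of -EOPNOTSUPP
as the error are all assumptions made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* HYPOTHETICAL flag value, chosen only for this sketch. */
#define SKETCH_MAP_PMEM_AWARE 0x80000u

/* HYPOTHETICAL stand-in for "does this filesystem/inode support DAX?" */
struct sketch_inode {
	bool dax_enabled;
};

/*
 * Decide whether an mmap request may proceed.
 * Returns 0 on success or a negative errno value.
 *
 * A pmem-aware request on a non-DAX inode is rejected instead of being
 * silently downgraded to page cache semantics, so the application can
 * never assume DAX behavior that it is not getting.
 * The -EOPNOTSUPP error code is an assumption of this sketch.
 */
static int sketch_check_mmap_flags(const struct sketch_inode *inode,
				   unsigned int flags)
{
	if ((flags & SKETCH_MAP_PMEM_AWARE) && !inode->dax_enabled)
		return -EOPNOTSUPP;

	/* Mappings without the flag are unaffected either way. */
	return 0;
}
```

With this policy, an application that sees the mapping succeed knows it
has DAX semantics, and one that sees the error knows it has to fall
back to a regular mapping plus fsync/msync.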


