linux-mm.kvack.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave@sr71.net>, Toshi Kani <toshi.kani@hpe.com>,
	David Airlie <airlied@linux.ie>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dave Chinner <david@fromorbit.com>, Linux MM <linux-mm@kvack.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Christoph Hellwig <hch@lst.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	kbuild test robot <lkp@intel.com>,
	linux-nvdimm <linux-nvdimm@ml01.01.org>,
	Richard Weinberger <richard@nod.at>, X86 ML <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jeff Dike <jdike@addtoit.com>, Jens Axboe <axboe@fb.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	Jan Kara <jack@suse.com>, Paolo Bonzini <pbonzini@redhat.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [-mm PATCH v2 00/25] get_user_pages() for dax pte and pmd mappings
Date: Thu, 10 Dec 2015 18:03:31 -0800
Message-ID: <CAPcyv4jtF2LwK3jbsjPHB7=JE1O0-TkRQGQcMSrB9bPZVdFd8A@mail.gmail.com>
In-Reply-To: <x49fuzat8k9.fsf@segfault.boston.devel.redhat.com>

On Thu, Dec 10, 2015 at 11:20 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> Dan Williams <dan.j.williams@intel.com> writes:
>
>> On Thu, Dec 10, 2015 at 10:08 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
>>> Dan Williams <dan.j.williams@intel.com> writes:
>>>
>>>> Summary:
>>>>
>>>> To date, we have implemented two I/O usage models for persistent memory,
>>>> PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
>>>> userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
>>>> to be the target of direct-i/o.  It allows userspace to coordinate
>>>> DMA/RDMA from/to persistent memory.
>>>>
>>>> The implementation leverages the ZONE_DEVICE mm-zone that went into
>>>> 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
>>>> and dynamically mapped by a device driver.  The pmem driver, after
>>>> mapping a persistent memory range into the system memmap via
>>>> devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
>>>> page-backed pmem-pfns via flags in the new pfn_t type.
>>>
>>> So, this basically means that an admin has to decide whether or not DMA
>>> will be used on a given device before making a file system on it.  That
>>> seems like an odd requirement.  There's also a configuration option of
>>> whether to put those backing struct pages into DRAM or PMEM (which, of
>>> course, will be dictated by the size of pmem).  I really think we should
>>> reconsider this approach.
>>>
>>> First, the admin shouldn't have to choose whether or not DMA will be
>>> done on the file system.
>>
>> To be clear it's not "whether or not DMA will be done on the file
>> system", it's whether or not both DMA and DAX will be done
>> simultaneously on the filesystem.
>
> Fair point, but I'd view one of those configurations as not recommended.
> To be clear, if you're just going to use the device for block based
> access, using btt is the safer option.

Speaking of btt, the mechanism for setting up a btt is identical to
specifying a reserved area for the memmap.  I.e. write an info block
to the namespace to specify a new mode of operation.

>> DAX is already a capability that an admin can inadvertently disable by
>> mis-configuring the alignment of a partition [1].
>
> Heh, using my own commit against me? ;-) Anyway, the commit message
> suggests that dax *could* be supported on misaligned partitions.

All's fair in love, war, and code defense. :-)

>> Why not also disable it when DMA support is not configured and force
>> the fs back to page-cache?  Namespace creation tooling in userspace
>> can default to enabling DAX + DMA.
>
> Well, the only reason I can come up with is manufactured:  we've forced
> the admin to decide between having that extra space for storage and
> doing DMA, and he or she opted for more space.

Is this any worse than the "forcing" we're imposing in the btt /
no-btt decision that impacts DAX?  This additional configuration
flexibility for whether / where to store a memmap array is merely
incremental, not fatal.  It's also a configuration decision we can
stop asking an admin to make when / if we ever re-write the kernel to
reduce its dependency on struct page.
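
For a sense of scale on the "whether / where to store a memmap array"
trade-off above, here is a rough back-of-the-envelope sketch.  It assumes
4 KiB base pages and a 64-byte struct page (typical for x86_64, but not
stated anywhere in this thread); memmap_bytes() and the capacities in the
loop are hypothetical names and values chosen purely for illustration.

```python
# Rough estimate of the memmap (struct page array) needed to back a pmem range.
# Assumptions, not taken from the thread: 4 KiB pages, 64-byte struct page.
PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64


def memmap_bytes(pmem_bytes, page_size=PAGE_SIZE, struct_page_size=STRUCT_PAGE_SIZE):
    """Return the bytes of struct page needed to cover pmem_bytes of capacity."""
    pages = -(-pmem_bytes // page_size)  # ceiling division: partial pages count
    return pages * struct_page_size


if __name__ == "__main__":
    for tib in (1, 4, 16):  # hypothetical namespace capacities
        size = tib << 40
        need = memmap_bytes(size)
        print(f"{tib} TiB pmem -> {need / (1 << 30):.0f} GiB memmap "
              f"({100 * need / size:.2f}% of capacity)")
```

Under those assumptions the memmap costs about 1.56% of the mapped
capacity.  That is small relative to the pmem itself, but it can be large
relative to DRAM, which is why the size of the pmem ends up dictating
where the memmap can live.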

In the meantime, I expect some would say DAX is a toy as long as it
continues to fail at DMA.


Thread overview: 48+ messages
2015-12-10  2:37 Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 01/25] pmem, dax: clean up clear_pmem() Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 02/25] dax: increase granularity of dax_clear_blocks() operations Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 03/25] dax: guarantee page aligned results from bdev_direct_access() Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 04/25] dax: fix lifetime of in-kernel dax mappings with dax_map_atomic() Dan Williams
2015-12-11 18:11   ` [-mm PATCH v3 " Dan Williams
2015-12-17 22:00     ` Ross Zwisler
2015-12-17 22:16       ` Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 05/25] mm, dax: fix livelock, allow dax pmd mappings to become writeable Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 06/25] dax: Split pmd map when fallback on COW Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 07/25] um: kill pfn_t Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 08/25] kvm: rename pfn_t to kvm_pfn_t Dan Williams
2015-12-10  2:37 ` [-mm PATCH v2 09/25] mm, dax, pmem: introduce pfn_t Dan Williams
2015-12-11 18:22   ` [-mm PATCH v3 " Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 10/25] mm: introduce find_dev_pagemap() Dan Williams
2015-12-11 18:27   ` [-mm PATCH v3 " Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 11/25] x86, mm: introduce vmem_altmap to augment vmemmap_populate() Dan Williams
2015-12-15 16:50   ` Dan Williams
2015-12-15 23:28   ` Andrew Morton
2015-12-15 23:37     ` Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 12/25] libnvdimm, pfn, pmem: allocate memmap array in persistent memory Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 13/25] avr32: convert to asm-generic/memory_model.h Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 14/25] hugetlb: fix compile error on tile Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 15/25] frv: fix compiler warning from definition of __pmd() Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 16/25] x86, mm: introduce _PAGE_DEVMAP Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 17/25] mm, dax, gpu: convert vm_insert_mixed to pfn_t Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 18/25] mm, dax: convert vmf_insert_pfn_pmd() " Dan Williams
2015-12-10  2:38 ` [-mm PATCH v2 19/25] list: introduce list_del_poison() Dan Williams
2015-12-15 23:41   ` Andrew Morton
2015-12-16  0:17     ` Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 20/25] libnvdimm, pmem: move request_queue allocation earlier in probe Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 21/25] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup Dan Williams
2015-12-15 23:46   ` Andrew Morton
2015-12-10  2:39 ` [-mm PATCH v2 22/25] mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 23/25] mm, x86: get_user_pages() for dax mappings Dan Williams
2015-12-16  0:14   ` Andrew Morton
2015-12-16  2:18     ` Dan Williams
2015-12-18  0:09       ` Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 24/25] dax: provide diagnostics for pmd mapping failures Dan Williams
2015-12-10  2:39 ` [-mm PATCH v2 25/25] dax: re-enable dax pmd mappings Dan Williams
2015-12-10 18:08 ` [-mm PATCH v2 00/25] get_user_pages() for dax pte and " Jeff Moyer
2015-12-10 18:56   ` Dan Williams
2015-12-10 19:20     ` Jeff Moyer
2015-12-11  2:03       ` Dan Williams [this message]
2015-12-14 14:52         ` Jeff Moyer
2015-12-14 16:44           ` Dan Williams
2015-12-11 18:44 ` Dan Williams
2015-12-15  1:59   ` Dan Williams
