From: Dan Williams <dan.j.williams@intel.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [HMM v16 04/15] mm/ZONE_DEVICE/unaddressable: add support for un-addressable device memory v2
Date: Mon, 16 Jan 2017 16:58:24 -0800 [thread overview]
Message-ID: <CAPcyv4gLrykv-Dn9dKM-8kDVdYwtRU4XDXt+OndYAnrzP73U6g@mail.gmail.com> (raw)
In-Reply-To: <20170116201311.GB4182@redhat.com>
On Mon, Jan 16, 2017 at 12:13 PM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Mon, Jan 16, 2017 at 11:31:39AM -0800, Dan Williams wrote:
[..]
>> >> dev_pagemap is only meant for get_user_pages() to do lookups of ptes
>> >> with _PAGE_DEVMAP and take a reference against the hosting device.
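[ For reference, the gup path in question is roughly the following; a
  simplified sketch of mm/gup.c from memory, not the exact code:

        /* follow_page_pte(), abbreviated */
        if (pte_devmap(pte)) {
                /* pin the hosting device while we operate on the page */
                pgmap = get_dev_pagemap(pte_pfn(pte), NULL);
                if (!pgmap)
                        return no_page_table(vma, flags);
        }
        page = pte_page(pte);
        if (flags & FOLL_GET) {
                get_page(page);
                /* drop the pgmap ref now that we hold the page */
                if (pgmap)
                        put_dev_pagemap(pgmap);
        }
]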
>> >
>> > And I want to build on top of that to extend _PAGE_DEVMAP to support
>> > a new use case: unaddressable device memory.
>> >
>> >>
>> >> Why can't HMM use the typical vm_operations_struct fault path and push
>> >> more of these details to a driver rather than the core?
>> >
>> > Because the vm_operations_struct has nothing to do with the device.
>> > We are talking about regular vmas here. Think malloc, mmap, shared
>> > memory, ... not about mmap(/dev/thedevice, ...).
>> >
>> > So the vm_operations_struct is never under device control, and we
>> > cannot, nor want to, rely on it.
>>
>> Can you explain more what's behind that "cannot, nor want to"
>> statement? It seems to me that any awkwardness of moving to a
>> standalone device-file interface is less than maintaining a new /
>> parallel mm fault path through dev_pagemap.
>
> The whole point of HMM is to allow transparent use of a process address
> space on a device like a GPU. So it implies any vma (vm_area_struct) that
> results from a usual mmap (i.e. any mmap, either PRIVATE or SHARED, as
> long as it is not an mmap of a device file).
>
> It means that an application can use malloc or the usual memory allocation
> primitives of its language (C++, Rust, Python, ...) and directly use the
> memory it gets from those with the device.
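[ To restate that usage model with a concrete sketch -- init_on_cpu(),
  launch_on_gpu() and use_on_cpu() are hypothetical stand-ins for
  application code and whatever the driver/runtime exposes:

        size_t size = 1UL << 30;
        char *buf = malloc(size);       /* plain anonymous memory */
        init_on_cpu(buf, size);         /* CPU writes it first */
        launch_on_gpu(buf, size);       /* device dereferences the same
                                           pointer; HMM mirrors/migrates
                                           behind the scenes */
        use_on_cpu(buf, size);          /* CPU access faults data back */
]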
So you need 100% support of all these mm paths for this hardware to be
useful at all? Would a separate device driver and a userspace helper
library get you something like 80% of the functionality, so that we
could then debate the core mm changes needed for the final 20%? Or am I
just completely off base with how people want to use this hardware?
> Devices like GPUs have a large pool of device memory that is not accessible
> by the CPU. This device memory has 10 times the bandwidth of system memory
> and better latency than going over PCIe. Hence, for the whole thing to
> make sense, you need to allow it to be used.
>
> For that you need to allow migration from system memory to device memory.
> Because you cannot rely on a special userspace allocator, you have to
> assume that the vma (vm_area_struct) is a regular one. So we are left
> with having struct pages for the device memory, to allow migration to
> work without requiring too many changes to the existing mm.
>
> Because device memory is not accessible by the CPU, you cannot allow
> anyone to pin it, and thus get_user_pages* must trigger a migration back
> just as a CPU page fault would.
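[ As I read the patches, the shape of that migrate-back is roughly the
  following -- simplified pseudo-code; is_device_entry() and
  migrate_back_to_ram() are placeholder names, not the patchset's:

        /* CPU (or gup) touches an unaddressable device page: the pte
         * holds a special swap-style entry, so the fault path can
         * route it to the owning driver */
        entry = pte_to_swp_entry(pte);
        if (is_device_entry(entry)) {
                page = device_entry_to_page(entry);
                /* driver copies the data to a system page, then the
                 * pte is rewritten to point at that page */
                return migrate_back_to_ram(vma, address, page);
        }

  get_user_pages() reaches the same path through its fault/retry loop,
  so a pin can only ever land on system memory. ]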
>
>
>> > So what we are looking for here is a struct page that can behave mostly
>> > like any other, except that we do not want to allow GUP to take a
>> > reference; that is almost exactly what ZONE_DEVICE already provides.
>> >
>> > So do you have any fundamental objections to this patchset? And if
>> > so, how do you propose I solve the problem I am trying to address?
>> > Because the hardware exists today, and without something like HMM we
>> > will not be able to support it.
>>
>> My pushback stems from it being a completely different use case for
>> devm_memremap_pages(), as evidenced by it growing from 4 arguments to
>> 9, and the ongoing maintenance overhead of understanding HMM
>> requirements when updating the pmem usage of ZONE_DEVICE.
>
> I would rather reuse something existing and modify it to support more use
> cases than try to add ZONE_DEVICE2 or ZONE_DEVICE_I_AM_DIFFERENT. I have
> made sure that my modifications to ZONE_DEVICE can be used without HMM. It
> is just a generic interface to support page faults and to allow tracking
> the last user of a device page. Both can be used independently of each
> other.
>
> To me the whole point of the kernel is trying to share infrastructure
> across as much hardware as possible, and I am doing just that. I do not
> think HMM should be blocked because something that used to be for one
> specific use case now supports two use cases. I am not breaking anything
> existing. Is it more work for you? Maybe, but at Red Hat we intend to
> support it for as long as it is needed, so you will always have someone
> to talk to if you want to update ZONE_DEVICE.
Sharing infrastructure should not come at the expense of type safety
and clear usage rules.
For example, the pmem case, before exposing ZONE_DEVICE memory to other
parts of the kernel, introduced the pfn_t type to distinguish DMA-capable
pfns from other raw pfns. All programmatic ways of discovering whether a
pmem range can support DMA go through this type and its explicit flags.
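
To make that concrete, the core of include/linux/pfn_t.h looks roughly
like this (abbreviated):

        typedef struct {
                u64 val;    /* pfn plus flags in the unused high bits */
        } pfn_t;

        #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3)) /* device mem */
        #define PFN_MAP (1ULL << (BITS_PER_LONG_LONG - 4)) /* has memmap */

        /* DMA requires a struct page; a device pfn only has one when it
         * was mapped via devm_memremap_pages() (PFN_MAP) */
        static inline bool pfn_t_has_page(pfn_t pfn)
        {
                return (pfn.val & PFN_MAP) == PFN_MAP ||
                        (pfn.val & PFN_DEV) == 0;
        }

A type like that gives the rest of the kernel one checkable place to ask
"can I DMA to this?".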
While we may not need a ZONE_DEVICE2, we obviously need a different
wrapper around arch_add_memory() than devm_memremap_pages() for HMM,
and likely a different physical-address radix tree than pgmap_radix,
because they are servicing 2 distinct purposes. For example, I don't
think HMM should be using unmodified arch_add_memory(). We shouldn't add
unaddressable memory to the linear address mappings when we know there
is nothing behind it, especially when it seems all you need from
arch_add_memory() is for pfn_to_page() to be valid.
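
On x86, arch_add_memory() today is essentially:

        init_memory_mapping(start, start + size);  /* linear mapping */
        return __add_pages(nid, zone, start_pfn, nr_pages); /* memmap */

For unaddressable memory only the second half makes sense, i.e. something
like the following hypothetical wrapper (name invented for illustration):

        int add_unaddressable_pages(int nid, struct zone *zone,
                                    u64 start, u64 size)
        {
                /* no init_memory_mapping(): there is nothing the CPU
                 * can load/store behind these pfns, we only need
                 * pfn_to_page() to work for the range */
                return __add_pages(nid, zone, start >> PAGE_SHIFT,
                                   size >> PAGE_SHIFT);
        }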