From: "Figo.zhang" <figo1802@gmail.com>
To: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>, John Hubbard <jhubbard@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	David Nellans <dnellans@nvidia.com>
Subject: Re: [HMM 00/16] HMM (Heterogeneous Memory Management) v19
Date: Thu, 6 Apr 2017 11:22:12 +0800
Message-ID: <CAF7GXvptCfV89rAi=j1cy1df12039GDpq_DHOyx+_xk0FjBDPg@mail.gmail.com>
In-Reply-To: <20170405204026.3940-1-jglisse@redhat.com>


>
> Heterogeneous Memory Management (HMM) (description and justification)
>
> Today device drivers expose dedicated memory allocation APIs through their
> device file, often relying on a combination of ioctl and mmap calls. The
> device can only access and use memory allocated through this API. This
> effectively splits the program address space into objects allocated for and
> usable by the device, and other regular memory (malloc, mmap of a file,
> shared memory, …) accessible only by the CPU (or in a very limited way by a
> device, through pinned memory).
>
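
A rough sketch of that traditional flow, for readers less familiar with it
(the ioctl number, the argument struct, and the fake_gpu_*/gpu_buffer_alloc
names are made up for illustration, not any real driver's ABI):

#include <fcntl.h>
#include <linux/ioctl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Hypothetical illustration of the classic split-allocation model: memory
 * the device can use must come from the driver's own allocator, reached
 * through its device file, not from plain malloc(). */
struct fake_gpu_alloc {
        uint64_t size;          /* in:  bytes requested          */
        uint64_t mmap_offset;   /* out: offset to pass to mmap() */
};
#define FAKE_GPU_IOCTL_ALLOC _IOWR('G', 0x01, struct fake_gpu_alloc)

void *gpu_buffer_alloc(int fd, size_t size)
{
        struct fake_gpu_alloc req = { .size = size };

        if (ioctl(fd, FAKE_GPU_IOCTL_ALLOC, &req))
                return NULL;
        /* CPU view of the device-visible buffer; every other allocation in
         * the process remains invisible to the device. */
        return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, req.mmap_offset);
}
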
> Allowing different isolated components of a program to use a device thus
> requires duplicating the input data structures using the device memory
> allocator. This is reasonable for simple data structures (arrays, grids,
> images, …) but gets extremely complex with advanced data structures (lists,
> trees, graphs, …) that rely on a web of memory pointers. This is becoming a
> serious limitation on the kinds of workloads that can be offloaded to
> devices like GPUs.
>
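
A small, purely illustrative example (not from the patchset) of why
pointer-based structures are painful to duplicate:

#include <stddef.h>

struct node {
        struct node *next;
        float        payload[64];
};

/* The CPU can walk this list directly. To hand it to a device through a
 * separate device allocator, every node must be deep-copied and every
 * 'next' pointer rewritten to point into the device copy; with a shared
 * address space the device could simply follow the same pointers. */
size_t count_nodes(const struct node *head)
{
        size_t n = 0;

        for (; head; head = head->next)
                n++;
        return n;
}
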

How is this handled by the current GPU software stack? By maintaining a
complex middleware framework/HAL?


>
> New industry standards like C++, OpenCL, and CUDA are pushing to remove
> this barrier. This requires a shared address space between the GPU device
> and the CPU, so that the GPU can access any memory of a process (while
> still obeying memory protections such as read-only).


Can the GPU access all of the process's VMAs, or only those VMAs whose
backing system memory has been migrated into the GPU page table?



> This kind of feature is also appearing in
> various other operating systems.
>
> HMM is a set of helpers to facilitate several aspects of address space
> sharing and device memory management. Unlike existing sharing mechanisms
> that rely on pinning the pages used by a device, HMM relies on mmu_notifier
> to propagate CPU page table updates to the device page table.
>
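
A hedged sketch of what this looks like from a driver's point of view, using
the plain mmu_notifier API of this era (HMM wraps this in its own mirror
helpers; struct my_dev and the my_dev_* functions are hypothetical):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

struct my_dev {                         /* hypothetical driver state */
        struct mmu_notifier notifier;
        /* ... device page table handle, locks, ... */
};

/* Hypothetical: tear down device PTEs and flush device TLBs for a range. */
void my_dev_unmap_range(struct my_dev *dev, unsigned long start,
                        unsigned long end);

static void my_dev_invalidate_range_start(struct mmu_notifier *mn,
                                          struct mm_struct *mm,
                                          unsigned long start,
                                          unsigned long end)
{
        struct my_dev *dev = container_of(mn, struct my_dev, notifier);

        /* The CPU mapping of [start, end) is about to change; make sure the
         * device stops using its now-stale copy of those translations. */
        my_dev_unmap_range(dev, start, end);
}

static const struct mmu_notifier_ops my_dev_mmu_ops = {
        .invalidate_range_start = my_dev_invalidate_range_start,
};

int my_dev_mirror_mm(struct my_dev *dev, struct mm_struct *mm)
{
        dev->notifier.ops = &my_dev_mmu_ops;
        /* From now on every CPU page table update on this mm is reported to
         * the driver, instead of pinning the pages the device uses. */
        return mmu_notifier_register(&dev->notifier, mm);
}
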
> Duplicating the CPU page table is only one aspect of using a device like a
> GPU efficiently. GPU local memory has bandwidth in the terabytes/second
> range, but it is connected to main memory through a system bus like PCIe,
> which is limited to 32 gigabytes/second (PCIe 4.0 x16). Thus it is
> necessary to allow migration of process memory from main system memory to
> device memory. The issue is that on platforms that only have PCIe, the
> device memory is not accessible by the CPU with the same properties as main
> memory (cache coherency, atomic operations, …).
>
> To allow migration from main memory to device memory, HMM provides a set of
> helpers to hotplug device memory as a new type of ZONE_DEVICE memory which
> is un-addressable by the CPU but still has struct pages representing it.
> This allows most of the core kernel logic that deals with process memory to
> stay oblivious to the peculiarities of device memory.
>
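
For a sense of what such a hotplug helper looks like to a driver, here is a
hedged sketch in the shape this interface eventually took in mainline
(devm_memremap_pages() with MEMORY_DEVICE_PRIVATE); the helper names and
struct layout in this v19 series differ, and my_dev_register_vram() is a
hypothetical driver function:

#include <linux/device.h>
#include <linux/err.h>
#include <linux/memremap.h>
#include <linux/mm.h>
#include <linux/types.h>

static vm_fault_t my_dev_migrate_to_ram(struct vm_fault *vmf)
{
        /* Called when the CPU touches a page that lives in device memory; a
         * real driver copies the data back to a system page here.
         * Placeholder return for the sketch. */
        return VM_FAULT_SIGBUS;
}

static const struct dev_pagemap_ops my_dev_pagemap_ops = {
        .migrate_to_ram = my_dev_migrate_to_ram,
};

int my_dev_register_vram(struct device *dev, struct dev_pagemap *pgmap,
                         u64 start, u64 size)
{
        pgmap->type = MEMORY_DEVICE_PRIVATE;    /* CPU cannot address it */
        pgmap->range.start = start;
        pgmap->range.end = start + size - 1;
        pgmap->nr_range = 1;
        pgmap->ops = &my_dev_pagemap_ops;

        /* Create struct pages for the device memory so the core mm can
         * track it, without making it CPU addressable. */
        return PTR_ERR_OR_ZERO(devm_memremap_pages(dev, pgmap));
}
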
> When a page backing an address of a process is migrated to device memory,
> the CPU page table entry is set to a new, specific swap entry. CPU access
> to such an address triggers a migration back to system memory, just as if
> the page had been swapped out to disk. HMM also blocks anyone from pinning
> a ZONE_DEVICE page, so that it can always be migrated back to system memory
> if the CPU accesses it. Conversely, HMM does not migrate to device memory
> any page that is pinned in system memory.
>

Is the purpose of migrating system pages to the device so that the device
can read that system memory?
If the CPU/programs want to read the device data, do they need to pin/map
the device memory into the process address space?
If multiple applications want to read the same device memory region
concurrently, how is that done?

A diagram showing how the CPU and GPU share the address space would be
helpful.

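To make the mechanism above concrete, here is a hedged paraphrase (not
verbatim kernel code) of the CPU fault path on such an entry; the helper
names follow later mainline kernels and may be spelled differently in this
series:

#include <linux/memremap.h>
#include <linux/mm.h>
#include <linux/swapops.h>

static vm_fault_t handle_device_private_fault(struct vm_fault *vmf)
{
        swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);

        if (!is_device_private_entry(entry))
                return 0;       /* not ours: ordinary swap handling */

        /* The struct page still exists even though the CPU cannot address
         * the memory behind it; its pgmap identifies the owning driver. */
        vmf->page = pfn_swap_entry_to_page(entry);

        /* Ask the driver to migrate the data back to system memory (using
         * its DMA engine); the faulting access is then retried. */
        return vmf->page->pgmap->ops->migrate_to_ram(vmf);
}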

>
> To allow efficient migration between device memory and main memory, a new
> migrate_vma() helper is added with this patchset. It allows the device's
> DMA engine to be leveraged to perform the copy operation.
>
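
A hedged sketch of the driver side of such a migration, written against the
staged form this interface eventually took in mainline
(migrate_vma_setup()/migrate_vma_pages()/migrate_vma_finalize()); the calling
convention proposed in this series differs in detail, and struct my_dev and
my_dev_alloc_and_dma_copy() are hypothetical:

#include <linux/migrate.h>
#include <linux/mm.h>
#include <linux/slab.h>

struct my_dev;                          /* hypothetical driver state */

/* Hypothetical: allocate device pages into dst[] and DMA-copy the src[]
 * pages into them using the device's copy engine. */
void my_dev_alloc_and_dma_copy(struct my_dev *dev, const unsigned long *src,
                               unsigned long *dst, unsigned long npages);

int my_dev_migrate_to_vram(struct my_dev *dev, struct vm_area_struct *vma,
                           unsigned long start, unsigned long end)
{
        unsigned long npages = (end - start) >> PAGE_SHIFT;
        struct migrate_vma args = {
                .vma   = vma,
                .start = start,
                .end   = end,
                .flags = MIGRATE_VMA_SELECT_SYSTEM, /* source: system RAM */
        };
        int ret = -ENOMEM;

        args.src = kcalloc(npages, sizeof(*args.src), GFP_KERNEL);
        args.dst = kcalloc(npages, sizeof(*args.dst), GFP_KERNEL);
        if (!args.src || !args.dst)
                goto out;

        ret = migrate_vma_setup(&args); /* collect and unmap the CPU pages */
        if (ret)
                goto out;

        /* The copy is done by the device DMA engine, not the CPU. */
        my_dev_alloc_and_dma_copy(dev, args.src, args.dst, npages);

        migrate_vma_pages(&args);       /* install the new entries */
        migrate_vma_finalize(&args);    /* drop references to the old pages */
out:
        kfree(args.src);
        kfree(args.dst);
        return ret;
}
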
> This feature will be used by upstream drivers like nouveau and mlx5, and
> probably others in the future (amdgpu is the next suspect in line). We are
> actively working on nouveau and mlx5 support. To test this patchset we also
> worked with NVidia's closed-source driver team; they have more resources
> than us to test this kind of infrastructure, as well as a bigger and better
> userspace ecosystem with various real industry workloads that can be used
> to test and profile HMM.
>
> The expected workload is a program that builds a data set on the CPU (from
> disk, from the network, from sensors, …). The program uses a GPU API
> (OpenCL, CUDA, ...) to give hints on memory placement for the input data
> and also for the output buffer. The program then calls the GPU API to
> schedule a GPU job; this happens through a device-driver-specific ioctl.
> All of this is hidden from the programmer's point of view in the case of a
> C++ compiler that transparently offloads parts of a program to the GPU. The
> program can keep doing other work on the CPU while the GPU is crunching
> numbers.
>
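
As one concrete, purely illustrative example of such a workload and its
placement hints, using CUDA unified memory from plain C (nothing here is
mandated by HMM; device 0 and the build_data_set_on_cpu() stand-in are
assumptions):

#include <cuda_runtime.h>
#include <stdio.h>

static void build_data_set_on_cpu(float *data, size_t n)
{
        /* Stand-in for loading from disk/network/sensors. */
        for (size_t i = 0; i < n; i++)
                data[i] = (float)i;
}

int main(void)
{
        size_t n = 1 << 26;
        float *data;

        /* One allocation usable by both CPU and GPU through a shared
         * address space. */
        cudaMallocManaged((void **)&data, n * sizeof(*data),
                          cudaMemAttachGlobal);

        build_data_set_on_cpu(data, n);

        /* Placement hints: the input should mostly live on the GPU while it
         * is working on it. */
        cudaMemAdvise(data, n * sizeof(*data),
                      cudaMemAdviseSetPreferredLocation, 0 /* device 0 */);
        cudaMemPrefetchAsync(data, n * sizeof(*data), 0 /* device 0 */, 0);

        /* Schedule the GPU job here (a kernel launch or OpenCL enqueue; the
         * driver-specific ioctl is hidden behind the runtime). The CPU is
         * free to do other work in the meantime. */

        cudaDeviceSynchronize();                /* wait for the GPU */
        printf("first element: %f\n", data[0]); /* CPU reads the result */
        cudaFree(data);
        return 0;
}
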
> It is expected that the CPU will not access the same data set as the GPU
> while the GPU is working on it, but this is not mandatory. In fact, we
> expect some small memory objects to be actively accessed by both GPU and
> CPU concurrently, as synchronization channels and/or for monitoring
> purposes. Such objects will stay in system memory and should not be
> bottlenecked by system bus bandwidth (rare write and read accesses from
> both CPU and GPU).
>
> As we are relying on the device driver API, HMM does not introduce any new
> syscalls, nor does it modify any existing ones. It does not change any
> POSIX semantics or behaviors. For instance, the child of a process that is
> using HMM will not be impacted in any way after a fork, nor is there any
> data hazard between child COW and parent COW of memory that was migrated to
> the device prior to the fork.
>
> HMM assumes a number of hardware features. The device must allow its page
> table to be updated at any time (i.e. device jobs must be preemptible). The
> device page table must provide memory protections such as read-only. The
> device must track write accesses (dirty bit). The device must support a
> minimum granularity that matches PAGE_SIZE (i.e. 4k).
>

Thread overview: 41+ messages
2017-04-05 20:40 Jérôme Glisse
2017-04-05 20:40 ` [HMM 01/16] mm/memory/hotplug: add memory type parameter to arch_add/remove_memory Jérôme Glisse
2017-04-06  9:45   ` Anshuman Khandual
2017-04-06 13:58     ` Jerome Glisse
2017-04-07 12:13   ` Michal Hocko
2017-04-07 14:32     ` Jerome Glisse
2017-04-07 14:45       ` Michal Hocko
2017-04-07 14:57         ` Jerome Glisse
2017-04-07 15:11           ` Michal Hocko
2017-04-07 16:10             ` Jerome Glisse
2017-04-07 16:37               ` Michal Hocko
2017-04-07 17:10                 ` Jerome Glisse
2017-04-07 17:59                   ` Michal Hocko
2017-04-07 18:27                     ` Jerome Glisse
2017-04-05 20:40 ` [HMM 02/16] mm/put_page: move ZONE_DEVICE page reference decrement v2 Jérôme Glisse
2017-04-05 20:40 ` [HMM 03/16] mm/unaddressable-memory: new type of ZONE_DEVICE for unaddressable memory Jérôme Glisse
2017-04-05 20:40 ` [HMM 04/16] mm/ZONE_DEVICE/x86: add support for un-addressable device memory Jérôme Glisse
2017-04-05 20:40 ` [HMM 05/16] mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY Jérôme Glisse
2017-04-05 20:40 ` [HMM 06/16] mm/migrate: new memory migration helper for use with device memory v4 Jérôme Glisse
2017-04-05 20:40 ` [HMM 07/16] mm/migrate: migrate_vma() unmap page from vma while collecting pages Jérôme Glisse
2017-04-05 20:40 ` [HMM 08/16] mm/hmm: heterogeneous memory management (HMM for short) Jérôme Glisse
2017-04-05 20:40 ` [HMM 09/16] mm/hmm/mirror: mirror process address space on device with HMM helpers Jérôme Glisse
2017-04-05 20:40 ` [HMM 10/16] mm/hmm/mirror: helper to snapshot CPU page table v2 Jérôme Glisse
2017-04-10  8:35   ` Michal Hocko
2017-04-10  8:43   ` Michal Hocko
2017-04-10 22:10     ` Andrew Morton
2017-04-11  1:33       ` Jerome Glisse
2017-04-11 20:33         ` Andrew Morton
2017-04-05 20:40 ` [HMM 11/16] mm/hmm/mirror: device page fault handler Jérôme Glisse
2017-04-05 20:40 ` [HMM 12/16] mm/migrate: support un-addressable ZONE_DEVICE page in migration Jérôme Glisse
2017-04-05 20:40 ` [HMM 13/16] mm/migrate: allow migrate_vma() to alloc new page on empty entry Jérôme Glisse
2017-04-05 20:40 ` [HMM 14/16] mm/hmm/devmem: device memory hotplug using ZONE_DEVICE Jérôme Glisse
2017-04-06 21:22   ` Jerome Glisse
2017-04-07  1:37   ` Balbir Singh
2017-04-07  2:02     ` Jerome Glisse
2017-04-07 16:26       ` Jerome Glisse
2017-04-10  4:31         ` Balbir Singh
2017-04-05 20:40 ` [HMM 15/16] mm/hmm/devmem: dummy HMM device for ZONE_DEVICE memory v2 Jérôme Glisse
2017-04-05 20:40 ` [HMM 16/16] hmm: heterogeneous memory management documentation Jérôme Glisse
2017-04-06  3:22 ` Figo.zhang [this message]
2017-04-06  4:59   ` [HMM 00/16] HMM (Heterogeneous Memory Management) v19 Jerome Glisse
