linux-mm.kvack.org archive mirror
From: Jerome Glisse <j.glisse@gmail.com>
To: Simon Jeons <simon.jeons@gmail.com>
Cc: Michel Lespinasse <walken@google.com>,
	Shachar Raindel <raindel@mellanox.com>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Roland Dreier <roland@purestorage.com>,
	Haggai Eran <haggaie@mellanox.com>,
	Or Gerlitz <ogerlitz@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>,
	Liran Liss <liranl@mellanox.com>
Subject: Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes
Date: Fri, 12 Apr 2013 09:32:46 -0400	[thread overview]
Message-ID: <CAH3drwaC8TgSGfhPxAe-6VVCMPquJ0dG+Fu_6RDr7B_tH91-Hg@mail.gmail.com> (raw)
In-Reply-To: <51679F46.7030901@gmail.com>


On Fri, Apr 12, 2013 at 1:44 AM, Simon Jeons <simon.jeons@gmail.com> wrote:

>  Hi Jerome,
>
> On 04/12/2013 10:57 AM, Jerome Glisse wrote:
>
> On Thu, Apr 11, 2013 at 9:54 PM, Simon Jeons <simon.jeons@gmail.com> wrote:
>
>> Hi Jerome,
>>
>> On 04/12/2013 02:38 AM, Jerome Glisse wrote:
>>
>>> On Thu, Apr 11, 2013 at 11:42:05AM +0800, Simon Jeons wrote:
>>>
>>>> Hi Jerome,
>>>> On 04/11/2013 04:45 AM, Jerome Glisse wrote:
>>>>
>>>>> On Wed, Apr 10, 2013 at 09:41:57AM +0800, Simon Jeons wrote:
>>>>>
>>>>>> Hi Jerome,
>>>>>> On 04/09/2013 10:21 PM, Jerome Glisse wrote:
>>>>>>
>>>>>>> On Tue, Apr 09, 2013 at 04:28:09PM +0800, Simon Jeons wrote:
>>>>>>>
>>>>>>>> Hi Jerome,
>>>>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>>>>>>>>
>>>>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <
>>>>>>>>> walken@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <
>>>>>>>>>> raindel@mellanox.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> We would like to present a reference implementation for safely
>>>>>>>>>>> sharing
>>>>>>>>>>> memory pages from user space with the hardware, without pinning.
>>>>>>>>>>>
>>>>>>>>>>> We will be happy to hear the community feedback on our prototype
>>>>>>>>>>> implementation, and suggestions for future improvements.
>>>>>>>>>>>
>>>>>>>>>>> We would also like to discuss adding features to the core MM
>>>>>>>>>>> subsystem to
>>>>>>>>>>> assist hardware access to user memory without pinning.
>>>>>>>>>>>
>>>>>>>>>> This sounds kinda scary TBH; however I do understand the need for
>>>>>>>>>> such
>>>>>>>>>> technology.
>>>>>>>>>>
>>>>>>>>>> I think one issue is that many MM developers are insufficiently
>>>>>>>>>> aware
>>>>>>>>>> of such developments; having a technology presentation would
>>>>>>>>>> probably
>>>>>>>>>> help there; but traditionally LSF/MM sessions are more interactive
>>>>>>>>>> between developers who are already quite familiar with the
>>>>>>>>>> technology.
>>>>>>>>>> I think it would help if you could send in advance a detailed
>>>>>>>>>> presentation of the problem and the proposed solutions (and then
>>>>>>>>>> what
>>>>>>>>>> they require of the MM layer) so people can be better prepared.
>>>>>>>>>>
>>>>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already
>>>>>>>>>> largely
>>>>>>>>>> solve this problem ? (probably a dumb question, but that just
>>>>>>>>>> tells
>>>>>>>>>> you how much you need to explain :)
>>>>>>>>>>
>>>>>>>>> For GPUs the motivation is threefold. With the advance of GPU
>>>>>>>>> compute, and also with newer graphics programs, we see a massive
>>>>>>>>> increase in GPU memory consumption. We can easily reach buffers
>>>>>>>>> bigger than 1 GB. So the first motivation is to let the GPU
>>>>>>>>> directly use the memory the user allocated through malloc; this
>>>>>>>>> avoids copying 1 GB of data from the CPU into a GPU buffer. The
>>>>>>>>> second, and most important for GPU compute, is using the GPU
>>>>>>>>> seamlessly alongside the CPU; to achieve this you want the
>>>>>>>>> programmer to have a single address space on the CPU and GPU, so
>>>>>>>>> that the same address points to the same object on the GPU as on
>>>>>>>>> the CPU. This would also be a tremendously cleaner design, from
>>>>>>>>> the driver's point of view, for memory management.
>>>>>>>>>
>>>>>>>>> And last, and most important: with such big buffers (> 1 GB),
>>>>>>>>> memory pinning becomes way too expensive and also drastically
>>>>>>>>> reduces the freedom of the mm to free pages for other processes.
>>>>>>>>> Most of the time only a small window of the object will be in
>>>>>>>>> use by the hardware (everything is relative; the window can be
>>>>>>>>> > 100 MB, so not that small :)). Hardware page-fault support
>>>>>>>>> would avoid the necessity to
>>>>>>>>>
>>>>>>>> What's the meaning of hardware pagefault?
>>>>>>>>
>>>>>>> It's a PCIe extension (well, it's a combination of extensions
>>>>>>> that allow it; see
>>>>>>> http://www.pcisig.com/specifications/iov/ats/). The idea is that
>>>>>>> the IOMMU can trigger a regular page fault inside a process
>>>>>>> address space on behalf of the hardware. The only IOMMU
>>>>>>> supporting this right now is the AMD IOMMU v2 that you find on
>>>>>>> recent AMD platforms.
>>>>>>>
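[Editor's illustration: the ATS/PRI flow Jerome describes can be sketched as a toy userspace simulation. Everything below is invented for illustration — `AddressSpace`, `handle_page_request`, and the PFN scheme are not kernel APIs; the real flow goes through the IOMMU driver and handle_mm_fault().]

```python
# Toy simulation of an IOMMU-initiated ("hardware") page fault:
# the device touches an unmapped address, the IOMMU forwards a page
# request, and the handler resolves it in the process address space
# on the device's behalf. All names here are illustrative only.

class Page:
    def __init__(self, pfn):
        self.pfn = pfn          # physical frame number (made up)
        self.refcount = 1       # held by the anon mapping / page cache

class AddressSpace:
    """Toy stand-in for a process's page table (an mm_struct)."""
    def __init__(self):
        self.ptes = {}          # virtual page number -> Page

    def fault_in(self, vpn):
        # Analogous to a regular CPU fault: allocate and map on demand.
        if vpn not in self.ptes:
            self.ptes[vpn] = Page(pfn=0x1000 + vpn)
        return self.ptes[vpn]

def handle_page_request(mm, vpn):
    """Service a device page request: fault the page in, then reply
    so the device can retry its translation. No long-lived pin."""
    page = mm.fault_in(vpn)
    return ("success", page.pfn)

mm = AddressSpace()
status, pfn = handle_page_request(mm, vpn=7)
print(status, hex(pfn))  # success 0x1007
```

The point of the sketch: the device never holds its own reference across time; the fault is serviced through the normal process page table, so the kernel stays free to reclaim.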
>>>>>> Why is a hardware page fault needed? A regular page fault is
>>>>>> triggered by the CPU MMU, correct?
>>>>>>
>>>>> Well, here I am abusing the term "regular page fault". The idea is
>>>>> that with a hardware page fault you don't need to pin memory or
>>>>> take a reference on a page for the hardware to use it. So the
>>>>> kernel can free, as usual, pages that would otherwise have been
>>>>>
>>>> For the case where the GPU needs to pin memory, why does the GPU
>>>> need to grab the memory of a normal process instead of allocating
>>>> it for itself?
>>>>
>>> Pinned memory is today's world, where the GPU allocates its own
>>> memory (gigabytes of it) that disappears from kernel control, i.e.
>>> the kernel can no longer reclaim it; it's lost memory (I have
>>> already had complaints about this from users who saw gigabytes of
>>> memory vanish and couldn't understand why the GPU was using so
>>> much).
>>>
>>> In tomorrow's world we want the GPU to be able to access memory that
>>> the application allocated through a simple malloc, and we want the
>>> kernel to be able to recycle any page at any time, because of memory
>>> pressure or because the kernel decides to do so.
>>>
>>> That's just what we want to do. To achieve it we are getting
>>> hardware that can do page faults. No changes to core kernel mm code
>>> (some improvements might be made).
>>>
>>
>>  The memory disappears because you hold a reference (gup) against
>> it, correct? In tomorrow's world you want the page fault triggered
>> through the IOMMU driver, which calls get_user_pages; it will also
>> take a reference (since gup is called), won't it? Anyway, assuming
>> tomorrow's world doesn't take a reference, don't we need to care
>> about a page used by the GPU being reclaimed?
>>
>>
> Right now the code uses gup because it's convenient, but it drops the
> reference right after the fault. So the reference is held only for a
> short period of time.
>
>
> Are you sure gup drops the reference right after the fault? I re-dug
> through the code and failed to verify it. Could you point it out to
> me?
>
>
In amd_iommu_v2.c:do_fault, get_user_pages is followed by put_page.
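[Editor's illustration: the transient-reference pattern being pointed at — take a reference only for the duration of the fault, then drop it — can be modeled with a toy refcount. This is not the actual amd_iommu_v2.c code; the functions below merely mimic the gup/put_page pairing.]

```python
# Toy model of the pattern in do_fault(): get_user_pages() takes a
# reference (the "pin"), the fault is serviced, and put_page() drops
# the reference immediately afterwards, so no long-term pin remains.

class Page:
    def __init__(self):
        self.refcount = 1       # baseline reference from the mapping

def get_user_pages(page):
    page.refcount += 1          # gup takes a transient reference
    return page

def put_page(page):
    page.refcount -= 1          # dropped right after the fault

page = Page()
pinned = get_user_pages(page)
assert pinned.refcount == 2     # pinned only while the fault is serviced
put_page(pinned)
assert page.refcount == 1       # kernel is free to reclaim again
print(page.refcount)            # 1
```

So the reference exists only inside the fault handler's window, which is why the memory does not "disappear" the way a long-term gup pin makes it disappear.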


Cheers,
Jerome


