From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx165.postini.com [74.125.245.165]) by kanga.kvack.org (Postfix) with SMTP id C86226B0006 for ; Thu, 11 Apr 2013 23:21:18 -0400 (EDT)
Received: by mail-qe0-f45.google.com with SMTP id 1so1315622qee.32 for ; Thu, 11 Apr 2013 20:21:17 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <51677BCA.2050002@gmail.com>
References: <5114DF05.7070702@mellanox.com> <5164C6EE.7020502@gmail.com> <20130410205557.GB3958@gmail.com> <51662FFF.10103@gmail.com> <20130411184806.GB6696@gmail.com> <51677BCA.2050002@gmail.com>
Date: Thu, 11 Apr 2013 23:21:17 -0400
Message-ID:
Subject: Re: [LSF/MM TOPIC] Hardware initiated paging of user process pages, hardware access to the CPU page tables of user processes
From: Jerome Glisse
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Simon Jeons
Cc: Michel Lespinasse, Shachar Raindel, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Andrea Arcangeli, Roland Dreier, Haggai Eran, Or Gerlitz, Sagi Grimberg, Liran Liss

On Thu, Apr 11, 2013 at 11:13 PM, Simon Jeons <simon.jeons@gmail.com> wrote:

> Hi Jerome,
>
> On 04/12/2013 02:48 AM, Jerome Glisse wrote:
>> On Thu, Apr 11, 2013 at 11:37:35AM +0800, Simon Jeons wrote:
>>> Hi Jerome,
>>> On 04/11/2013 04:55 AM, Jerome Glisse wrote:
>>>> On Wed, Apr 10, 2013 at 09:57:02AM +0800, Simon Jeons wrote:
>>>>> Hi Jerome,
>>>>> On 02/10/2013 12:29 AM, Jerome Glisse wrote:
>>>>>> On Sat, Feb 9, 2013 at 1:05 AM, Michel Lespinasse <walken@google.com> wrote:
>>>>>>> On Fri, Feb 8, 2013 at 3:18 AM, Shachar Raindel <raindel@mellanox.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We would like to present a reference implementation for safely
>>>>>>>> sharing memory pages from user space with the hardware, without
>>>>>>>> pinning.
>>>>>>>>
>>>>>>>> We will be happy to hear the community feedback on our prototype
>>>>>>>> implementation, and suggestions for future improvements.
>>>>>>>>
>>>>>>>> We would also like to discuss adding features to the core MM
>>>>>>>> subsystem to assist hardware access to user memory without
>>>>>>>> pinning.
>>>>>>>
>>>>>>> This sounds kinda scary TBH; however I do understand the need for
>>>>>>> such technology.
>>>>>>>
>>>>>>> I think one issue is that many MM developers are insufficiently
>>>>>>> aware of such developments; having a technology presentation would
>>>>>>> probably help there; but traditionally LSF/MM sessions are more
>>>>>>> interactive between developers who are already quite familiar with
>>>>>>> the technology. I think it would help if you could send in advance
>>>>>>> a detailed presentation of the problem and the proposed solutions
>>>>>>> (and then what they require of the MM layer) so people can be
>>>>>>> better prepared.
>>>>>>>
>>>>>>> And first I'd like to ask, aren't IOMMUs supposed to already
>>>>>>> largely solve this problem? (probably a dumb question, but that
>>>>>>> just tells you how much you need to explain :)
>>>>>>
>>>>>> For GPU the motivation is threefold. With the advance of GPU compute
>>>>>> and also with newer graphics programs we see a massive increase in
>>>>>> GPU memory consumption. We can easily reach buffers that are bigger
>>>>>> than 1 GB. So the first motivation is to directly use the memory the
>>>>>> user allocated through malloc in the GPU; this avoids copying 1 GB
>>>>>> of data with the cpu to the gpu buffer. The second, and most
>>>>>> important for GPU compute, is the use of the GPU seamlessly with the
>>>>>> CPU. In order to achieve this you want the programmer to have a
>>>>>> single address space on the CPU and GPU, so that the same address
>>>>>> points to the same object on the GPU as on the CPU. This would also
>>>>>> be a tremendously cleaner design from the driver point of view
>>>>>> toward memory management.
>>>>>
>>>>> When will the GPU consume memory?
>>>>>
>>>>> A userspace process like mplayer will have video data, and the GPU
>>>>> will play this data and use the memory of mplayer, since the video
>>>>> data is loaded in the mplayer process's address space? So GPU code
>>>>> will call gup to take a reference on the memory? Please correct me
>>>>> if my understanding is wrong. ;-)
>>>>
>>>> The first target is not things such as video decompression, although
>>>> they too could benefit from it given an updated driver kernel API. In
>>>> the case of using iommu hardware page fault we don't call
>>>> get_user_pages (gup), thus we don't take a reference on the page.
>>>> That's the whole point of the hardware pagefault, not taking a
>>>> reference on the page.
>>>
>>> Is the mplayer process running on the normal CPU or the GPU?
>>> Chipset-integrated graphics will use normal memory and discrete
>>> graphics will use its own memory, correct? So the memory used by
>>> discrete graphics won't need gup, correct?
>>
>> mplayer can decode video in software and only use the cpu. It can also
>> use one of the acceleration APIs such as VDPAU. In any case mplayer is
>> still opening the video file, allocating some memory with malloc,
>> reading from the file into this memory, eventually doing some
>> preprocessing on that memory, and then doing a memcpy from this memory
>> to memory allocated by the gpu driver.
>>
>> Now imagine a world where you don't have to memcpy so that the gpu can
>> access it. Even if it's doable today it's really not something you want
>> to do, i.e. gup on a page and not releasing the page for minutes.
>>
>> There are two kinds of integrated GPU. On x86, an integrated GPU should
>> be considered a discrete GPU because the BIOS steals a chunk of system
>> ram and transforms it into fake vram. This stolen chunk is never ever
>> under the control of the linux kernel (from the mm pov the gpu kernel
>> driver is in charge of it).
>
> I configure the integrated GPU in the BIOS during system boot; it seems
> that we can preallocate memory for the integrated GPU. Is this the
> memory you mentioned?

Most likely it's

>> In any case both discrete GPU and integrated GPU have their own page table or
>
> Discrete GPU will not use normal memory even if their own memory is
> exhausted, correct?

They will consume normal memory. Right now you can see that under heavy
load huge chunks of your system memory disappear; it's the gpu driver that
is using it. It gets mapped into the gpu address space, and from the gpu
unit's pov it's just like any other memory (i.e. vram or sram looks the
same to the gpu acceleration core, sram is just slower).

Cheers
Jerome
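
For readers following the thread, here is a minimal sketch in C of the two paths discussed above: the copy that mplayer ends up doing today, and the shared-address-space case where the GPU would use the malloc'ed pointer directly. The names `struct gpu_buffer`, `upload_by_copy` and `upload_by_sharing` are hypothetical, and a plain malloc stands in for memory allocated by the gpu driver; no real driver API is involved, and page faulting is not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a buffer the GPU would read from. */
struct gpu_buffer {
    unsigned char *data;
    size_t size;
};

/*
 * Copy path: allocate a second buffer (malloc stands in for the
 * driver's allocator here) and memcpy the user's data into it.
 * Returns 0 on success, -1 if the allocation fails.
 * Assumes size > 0.
 */
static int upload_by_copy(struct gpu_buffer *dst, const void *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    dst->data = malloc(size);
    if (dst->data == NULL)
        return -1;
    memcpy(dst->data, src, size);
    dst->size = size;
    return 0;
}

/*
 * Shared address space path: the same pointer value is used on both
 * sides, so nothing is copied and later CPU writes are visible
 * through the buffer.
 */
static void upload_by_sharing(struct gpu_buffer *dst, void *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    dst->data = src;
    dst->size = size;
}
```

With the copy path the caller owns two buffers and has to free the second one; with the sharing path there is only the original allocation, which is the saving the 1 GB example in the thread is about.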
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org