On Tue, Apr 16, 2013 at 3:03 AM, Simon Jeons <simon.jeons@gmail.com> wrote:

> Hi Jerome,
>
> On 02/08/2013 11:21 PM, Jerome Glisse wrote:
>
>> On Fri, Feb 8, 2013 at 6:18 AM, Shachar Raindel <raindel@mellanox.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We would like to present a reference implementation for safely sharing
>>> memory pages from user space with the hardware, without pinning.
>>>
>>> We will be happy to hear the community feedback on our prototype
>>> implementation, and suggestions for future improvements.
>>>
>>> We would also like to discuss adding features to the core MM subsystem to
>>> assist hardware access to user memory without pinning.
>>>
>>> Following is a longer motivation and explanation of the technology
>>> presented:
>>>
>>> Many application developers would like to be able to communicate
>>> directly with the hardware from user space.
>>>
>>> Use cases for this include high-performance networking APIs such as
>>> InfiniBand, RoCE and iWARP, and interfacing with GPUs.
>>>
>>> Currently, if the user-space application wants to share system memory
>>> with the hardware device, the kernel component must pin the memory pages
>>> in RAM, using get_user_pages.
>>>
>>> This is a hurdle, as it usually makes large portions of the application
>>> memory unmovable. This pinning also makes the user-space development model
>>> very complicated: one needs to register memory before using it for
>>> communication with the hardware.
>>>
>>> We use the mmu-notifiers [1] mechanism to inform the hardware when the
>>> mapping of a page is changed. If the hardware tries to access a page
>>> which is not yet mapped for the hardware, it requests a resolution for
>>> the page address from the kernel.
>>>
>>> This mechanism allows the hardware to access the entire address space of
>>> the user application, without pinning even a single page.
>>>
>>> We would like to use the LSF/MM forum opportunity to discuss open issues
>>> we have for further development, such as:
>>>
>>> - Allowing the hardware to perform a page table walk, similar to
>>> get_user_pages_fast, to resolve user pages that are already in RAM.
>>>
>>
> get_user_pages_fast just takes a reference on the page (bumps its reference
> count) instead of populating the PTE in the page table, correct? Then how
> can the GPU driver use the IOMMU to access the page?
>

As I said, this is for pre-filling already-present entries, i.e. PTEs that are
present with a valid page (no special bit set). This is an optimization so
that the GPU can pre-fill its TLB without having to take mmap_sem. The hope
is that in the most common case this will be enough, but in some cases you
will have to go through the lengthy non-fast gup.

Cheers,
Jerome