* [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-17 22:24 UTC
To: Mel Gorman, linux-mm, linux-kernel

Hi,

The patch set listed below provides device drivers with the ability to
map memory regions to user space via HTLB interfaces.

Why do we need it?
Device drivers often need to map memory regions to user space to allow
efficient data handling in user mode. Using hugetlb mappings can bring a
performance gain when the mapped regions are relatively large. Our tests
showed that it is possible to gain up to 7% performance when hugetlb
mapping is enabled. In our case hugetlb starts to make sense once the
buffer is 4MB or larger. Since device throughput typically increases over
time, there are more and more reasons to use huge pages when remapping
large regions.
For example, hugetlb remapping could be important for the performance of
data acquisition systems (logic analyzers, DSOs), network monitoring
systems (packet capture), HD video capture/frame buffers and probably
others.

How is it implemented?
The idea and implementation are very close to what is already done in
ipc/shm.c.
We create a file on the hugetlbfs vfsmount and populate the file with the
pages we want to mmap. Then we associate the hugetlbfs file's mapping
with the mapping of the file we want to access.

So the typical procedure for a driver to map huge pages to userspace is:
1 Allocate some huge pages
2 Create a file on the vfsmount of hugetlbfs
3 Add the pages to the page cache of the mapping associated with the
  hugetlbfs file
4 Replace the driver file's mapping with the hugetlbfs file mapping
..............
5 Remove the pages from the page cache
6 Remove the hugetlbfs file
7 Free the pages
(Please find an example in the following messages.)

A detailed description is given in the following messages.
Thanks a lot to Mel Gorman, who gave good advice and a code prototype,
and to Stephen Donnelly for assistance in composing the description.

Alexey
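[A rough sketch of the driver-side sequence above, for illustration only
and not taken from the patches themselves. hugetlb_file_setup() and
add_to_page_cache() exist in kernels of this era, but their exact
signatures vary between versions, and the teardown (steps 5-7) is left
out.]

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/hugetlb.h>

/*
 * Populate an already-created hugetlbfs file (step 2, e.g. from
 * hugetlb_file_setup()) with driver-allocated huge pages, so that the
 * hugetlb fault handler later finds them in the page cache instead of
 * taking pages from the pool.
 */
static int mydrv_populate(struct file *hfile, unsigned long nr_hpages,
			  gfp_t gfp)
{
	struct address_space *mapping = hfile->f_mapping;
	unsigned long i;
	int err;

	for (i = 0; i < nr_hpages; i++) {
		/* Step 1: allocate a huge page as an ordinary compound page */
		struct page *page = alloc_pages(gfp | __GFP_COMP,
						HUGETLB_PAGE_ORDER);
		if (!page)
			return -ENOMEM;

		/* Step 3: insert it into the hugetlbfs file's page cache */
		err = add_to_page_cache(page, mapping, i, GFP_KERNEL);
		if (err) {
			__free_pages(page, HUGETLB_PAGE_ORDER);
			return err;
		}
		SetPageUptodate(page);
	}
	return 0;
}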
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Eric Munson @ 2009-08-18 10:29 UTC
To: Alexey Korolev; +Cc: Mel Gorman, linux-mm, linux-kernel

On Mon, Aug 17, 2009 at 11:24 PM, Alexey Korolev<akorolev@infradead.org> wrote:
> Hi,
>
> The patch set listed below provides device drivers with the ability to
> map memory regions to user space via HTLB interfaces.
>
> [...]
>
> So the typical procedure for a driver to map huge pages to userspace is:
> 1 Allocate some huge pages
> 2 Create a file on the vfsmount of hugetlbfs
> 3 Add the pages to the page cache of the mapping associated with the
>   hugetlbfs file
> 4 Replace the driver file's mapping with the hugetlbfs file mapping
> ..............
> 5 Remove the pages from the page cache
> 6 Remove the hugetlbfs file
> 7 Free the pages
>
> [...]

It sounds like this patch set is working towards the same goal as my
MAP_HUGETLB set. The only difference I see is that you allocate a huge
page at a time and (if I am understanding the patch) fault the page in
immediately, whereas MAP_HUGETLB only faults pages in as needed. Does
the MAP_HUGETLB patch set provide the functionality that you need, and
if not, what can be done to provide what you need?

Eric
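[For reference, the userspace side of the MAP_HUGETLB proposal looks
roughly like the sketch below. MAP_HUGETLB is the flag added by Eric's
series, so its availability depends on those patches; the fallback
definition here is an assumption.]

#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000	/* value assumed; comes from the patches */
#endif

/* size must be a multiple of the huge page size; pages are faulted in
 * from the hugetlb pool on first touch rather than up front. */
static void *alloc_huge_buffer(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}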
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-19 5:48 UTC
To: Eric Munson; +Cc: Alexey Korolev, Mel Gorman, linux-mm, linux-kernel

Hi,
>
> It sounds like this patch set is working towards the same goal as my
> MAP_HUGETLB set. The only difference I see is that you allocate a huge
> page at a time and (if I am understanding the patch) fault the page in
> immediately, whereas MAP_HUGETLB only faults pages in as needed. Does
> the MAP_HUGETLB patch set provide the functionality that you need, and
> if not, what can be done to provide what you need?
>
> Eric
>
Thanks a lot for being willing to help. I would much appreciate any
ideas on how HTLB mapping for drivers can be done.

It is better to describe the use case in order to make it clear what
needs to be done.
The driver provides mapping of device DMA buffers to user-level
applications. User-level applications process the data.
The device uses bus-master DMA to send data to the user buffer; the
buffer size can be >1GB and performance is very important. (So huge page
mapping really makes sense.)

In addition we have to mention that:
1. It is hard for the user to tell how many huge pages need to be
reserved by the driver.
2. Devices add constraints on memory regions. For example, a region may
need to be contiguous within the physical address space. It is necessary
to be able to specify special gfp flags.
3. The hardware needs to access physical memory before the user-level
software can access it. (Hugetlbfs picks pages up from the pool at
page-fault time.)
This means memory allocation needs to be driven by the device driver.

The original idea was: create a hugetlbfs file which shares a mapping
with the device file. Allocate memory. Populate the page cache of the
hugetlbfs file with the allocated pages.
When a fault occurs, the page will be taken from the page cache and then
mapped to user space by hugetlbfs.

Another possible approach is described here:
http://marc.info/?l=linux-mm&m=125065257431410&w=2
But I am currently not sure whether it will work or not.


Thanks,
Alexey
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-19 10:05 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
> Hi,
> >
> > [...]
>
> Thanks a lot for being willing to help. I would much appreciate any
> ideas on how HTLB mapping for drivers can be done.
>
> It is better to describe the use case in order to make it clear what
> needs to be done.
> The driver provides mapping of device DMA buffers to user-level
> applications.

Ok, so the buffer is in normal memory. When mmap() is called, the buffer
is already populated by data DMA'd from the device. That scenario rules out
calling mmap(MAP_ANONYMOUS|MAP_HUGETLB) because userspace has access to the
buffer before it is populated by data from the device.

However, it does not rule out mmap(MAP_ANONYMOUS|MAP_HUGETLB) when userspace
is responsible for populating a buffer for sending to a device. i.e. whether it
is suitable or not depends on when the buffer is populated and who is doing it.

> User-level applications process the data.
> The device uses bus-master DMA to send data to the user buffer; the
> buffer size can be >1GB and performance is very important. (So huge page
> mapping really makes sense.)
>

Ok, so the DMA may be faster because you have to do less scatter/gather
and can DMA in larger chunks, and reading from userspace may be faster
because there is less translation overhead. Right?

> In addition we have to mention that:
> 1. It is hard for the user to tell how many huge pages need to be
> reserved by the driver.

I think you have this problem either way. If the buffer is allocated and
populated before mmap(), then the driver is going to have to guess how many
pages it needs. If the DMA occurs as a result of mmap(), it's easier because
you know the number of huge pages to be reserved at that point and you have
the option of falling back to small pages if necessary.

> 2. Devices add constraints on memory regions. For example, a region may
> need to be contiguous within the physical address space. It is necessary
> to be able to specify special gfp flags.

The contiguity constraints are the same for huge pages. Do you mean there
are zone restrictions? If so, the hugetlbfs_file_setup() function could be
extended to specify a GFP mask that is used for the allocation of hugepages
and associated with the hugetlbfs inode. Right now, there is a htlb_alloc_mask
mask that is applied to some additional flags so htlb_alloc_mask would be
the default mask unless otherwise specified.

> 3. The hardware needs to access physical memory before the user-level
> software can access it. (Hugetlbfs picks pages up from the pool at
> page-fault time.)
> This means memory allocation needs to be driven by the device driver.
>

How about;

  o Extend Eric's helper slightly to take a GFP mask that is
    associated with the inode and used for allocations from
    outside the hugepage pool
  o A helper that returns the page at a given offset within
    a hugetlbfs file for population before the page has been
    faulted.

I know this is a bit hand-wavy, but it would allow significant sharing
of the existing code and remove much of the hugetlbfs-awareness from
your current driver.

> The original idea was: create a hugetlbfs file which shares a mapping
> with the device file. Allocate memory. Populate the page cache of the
> hugetlbfs file with the allocated pages.
> When a fault occurs, the page will be taken from the page cache and then
> mapped to user space by hugetlbfs.
>
> Another possible approach is described here:
> http://marc.info/?l=linux-mm&m=125065257431410&w=2
> But I am currently not sure whether it will work or not.
>
> Thanks,
> Alexey
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
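[To make the two suggestions above concrete: purely illustrative
prototypes, with names invented here; neither helper exists in the
kernel.]

/* Like the existing hugetlb_file_setup() helper, but the given GFP mask
 * is remembered on the hugetlbfs inode and used instead of
 * htlb_alloc_mask whenever pages have to be allocated from outside the
 * static hugepage pool. */
struct file *hugetlb_file_setup_gfp(const char *name, size_t size,
				    gfp_t gfp_mask);

/* Return the huge page backing index @idx of a hugetlbfs file,
 * allocating it and inserting it into the page cache if it has not been
 * faulted yet, so a driver can let the device DMA into it before any
 * user mapping exists. */
struct page *hugetlb_get_file_page(struct file *hfile, pgoff_t idx,
				   gfp_t gfp_mask);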
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Eric B Munson @ 2009-08-19 10:35 UTC
To: Mel Gorman; +Cc: Alexey Korolev, Alexey Korolev, linux-mm, linux-kernel

On Wed, Aug 19, 2009 at 11:05 AM, Mel Gorman<mel@csn.ul.ie> wrote:
> On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
>
> [...]
>
> How about;
>
>   o Extend Eric's helper slightly to take a GFP mask that is
>     associated with the inode and used for allocations from
>     outside the hugepage pool
>   o A helper that returns the page at a given offset within
>     a hugetlbfs file for population before the page has been
>     faulted.
>
> I know this is a bit hand-wavy, but it would allow significant sharing
> of the existing code and remove much of the hugetlbfs-awareness from
> your current driver.
>
> [...]
>

Alexey,

I'd be willing to take a stab at a prototype of Mel's suggestion based
on my patch set if you think it would be useful to you.

Eric
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-20 7:03 UTC
To: Mel Gorman; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

Mel,

>> User-level applications process the data.
>> The device uses bus-master DMA to send data to the user buffer; the
>> buffer size can be >1GB and performance is very important. (So huge page
>> mapping really makes sense.)
>>
>
> Ok, so the DMA may be faster because you have to do less scatter/gather
> and can DMA in larger chunks, and reading from userspace may be faster
> because there is less translation overhead. Right?
>
Less translation overhead is important. Unfortunately not all devices
have scatter/gather (ours does not), as supporting it increases hardware
complexity a lot.

>> In addition we have to mention that:
>> 1. It is hard for the user to tell how many huge pages need to be
>> reserved by the driver.
>
> I think you have this problem either way. If the buffer is allocated and
> populated before mmap(), then the driver is going to have to guess how many
> pages it needs. If the DMA occurs as a result of mmap(), it's easier because
> you know the number of huge pages to be reserved at that point and you have
> the option of falling back to small pages if necessary.
>
>> 2. Devices add constraints on memory regions. For example, a region may
>> need to be contiguous within the physical address space. It is necessary
>> to be able to specify special gfp flags.
>
> The contiguity constraints are the same for huge pages. Do you mean there
> are zone restrictions? If so, the hugetlbfs_file_setup() function could be
> extended to specify a GFP mask that is used for the allocation of hugepages
> and associated with the hugetlbfs inode. Right now, there is a htlb_alloc_mask
> mask that is applied to some additional flags so htlb_alloc_mask would be
> the default mask unless otherwise specified.
>
By contiguous I mean that we need several huge pages that are physically
contiguous. To obtain this we allocate pages until we find a contiguous
region (success) or reach a boundary (fail).
So in our particular case an approach based on getting pages from
hugetlbfs won't work, because the memory region will not be contiguous.
However, this approach would give an easy way to support hugetlb mapping
and would not add any accounting complexity. But it will be suitable only
for hardware that supports a large number of sg regions.

>
> How about;
>
>   o Extend Eric's helper slightly to take a GFP mask that is
>     associated with the inode and used for allocations from
>     outside the hugepage pool
>   o A helper that returns the page at a given offset within
>     a hugetlbfs file for population before the page has been
>     faulted.

Do you mean a get_user_pages() call?

Alexey
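[A very simplified sketch of the "allocate until we get a physically
contiguous run of huge pages" approach described above; a real
implementation also has to hold on to the rejected pages until it
finishes and give up at some bound. Names are invented here.]

/* Check whether nr huge pages, allocated one after another, happen to
 * form a single physically contiguous region. */
static bool hpages_are_contiguous(struct page **pages, unsigned long nr)
{
	unsigned long pfns_per_hpage = 1UL << HUGETLB_PAGE_ORDER;
	unsigned long i;

	for (i = 1; i < nr; i++)
		if (page_to_pfn(pages[i]) !=
		    page_to_pfn(pages[i - 1]) + pfns_per_hpage)
			return false;
	return true;
}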
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-25 10:47 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Thu, Aug 20, 2009 at 07:03:28PM +1200, Alexey Korolev wrote:
> Mel,
>
> >> User-level applications process the data.
> >> The device uses bus-master DMA to send data to the user buffer; the
> >> buffer size can be >1GB and performance is very important. (So huge page
> >> mapping really makes sense.)
> >>
> >
> > Ok, so the DMA may be faster because you have to do less scatter/gather
> > and can DMA in larger chunks, and reading from userspace may be faster
> > because there is less translation overhead. Right?
> >
> Less translation overhead is important. Unfortunately not all devices
> have scatter/gather (ours does not), as supporting it increases hardware
> complexity a lot.
>

Ok.

> >> 2. Devices add constraints on memory regions. For example, a region may
> >> need to be contiguous within the physical address space. It is necessary
> >> to be able to specify special gfp flags.
> >
> > The contiguity constraints are the same for huge pages. Do you mean there
> > are zone restrictions? [...]
> >
> By contiguous I mean that we need several huge pages that are physically
> contiguous.

Why? One hugepage of default size will be one TLB entry. Each hugepage
after that will be additional TLB entries so there is no savings on
translation overhead.

Getting contiguous pages beyond the hugepage boundary is not a matter
for GFP flags.

> To obtain this we allocate pages until we find a contiguous
> region (success) or reach a boundary (fail).
> So in our particular case an approach based on getting pages from
> hugetlbfs won't work, because the memory region will not be contiguous.

With a direct allocation of hugepages, there is no guarantee they will
be contiguous either. If you need contiguity above hugepages (which in
many cases will also be the largest page the buddy allocator can grant),
you need something else.

> However, this approach would give an easy way to support hugetlb mapping
> and would not add any accounting complexity. But it will be suitable only
> for hardware that supports a large number of sg regions.
>
> >
> > How about;
> >
> >   o Extend Eric's helper slightly to take a GFP mask that is
> >     associated with the inode and used for allocations from
> >     outside the hugepage pool
> >   o A helper that returns the page at a given offset within
> >     a hugetlbfs file for population before the page has been
> >     faulted.
>
> Do you mean a get_user_pages() call?
>

If you're willing to call it directly, sure.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
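[A sketch of calling it directly from a driver, with the argument list
as it is in kernels of this era; for a hugetlbfs-backed VMA the array is
filled with the individual small subpages of each huge page, so nr_pages
is in small-page units. Function name is invented.]

#include <linux/mm.h>
#include <linux/sched.h>

static int mydrv_pin_buffer(unsigned long uaddr, unsigned long nr_pages,
			    struct page **pages)
{
	int pinned;

	down_read(&current->mm->mmap_sem);
	pinned = get_user_pages(current, current->mm, uaddr, nr_pages,
				1 /* write */, 0 /* force */, pages, NULL);
	up_read(&current->mm->mmap_sem);

	/* pages[] can now be handed to the device for DMA; remember to
	 * release each page with put_page() when the transfer is done. */
	return pinned;
}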
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Benjamin Herrenschmidt @ 2009-08-25 11:00 UTC
To: Mel Gorman
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Tue, 2009-08-25 at 11:47 +0100, Mel Gorman wrote:

> Why? One hugepage of default size will be one TLB entry. Each hugepage
> after that will be additional TLB entries so there is no savings on
> translation overhead.
>
> Getting contiguous pages beyond the hugepage boundary is not a matter
> for GFP flags.

Note: This patch reminds me of something else I had on the backburner
for a while and never got a chance to actually implement...

There are various cases of drivers that could make good use of hugetlb
mappings of device memory. For example, framebuffers.

I looked at it a while back and it occurred to me (and Nick) that
ideally, we should split hugetlb and hugetlbfs.

Basically, on one side, we have the (mostly arch-specific) populating
and walking of page tables with hugetlb translations, associated huge
VMAs, etc...

On the other side, hugetlbfs is backing that with memory.

Ideally, the former would have some kind of "standard" ops that
hugetlbfs can hook into for the existing case (moving some stuff out of
the common data structure and splitting it in two), allowing a driver
to instantiate hugetlb VMAs that are backed by something else, typically
a simple mapping of IOs.

Does anybody want to do that, or shall I keep it on my back burner until
I finally get to do it? :-)

Cheers,
Ben.
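[Invented here purely to illustrate the shape of the split being
described; no such interface exists in the kernel.]

/* One possible form of the "standard ops": the generic huge-VMA and
 * page table code would call back into a backing provider, which could
 * be hugetlbfs (RAM from the pool) or a driver exposing device memory. */
struct hugetlb_backing_ops {
	/* produce the page backing index @idx of the mapping */
	struct page *(*get_page)(struct vm_area_struct *vma, pgoff_t idx);
	/* release it again, e.g. on unmap or truncate */
	void (*put_page)(struct vm_area_struct *vma, struct page *page);
};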
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-25 11:10 UTC
To: Benjamin Herrenschmidt
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Tue, Aug 25, 2009 at 09:00:54PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-08-25 at 11:47 +0100, Mel Gorman wrote:
>
> > Why? One hugepage of default size will be one TLB entry. Each hugepage
> > after that will be additional TLB entries so there is no savings on
> > translation overhead.
> >
> > Getting contiguous pages beyond the hugepage boundary is not a matter
> > for GFP flags.
>
> Note: This patch reminds me of something else I had on the backburner
> for a while and never got a chance to actually implement...
>
> There are various cases of drivers that could make good use of hugetlb
> mappings of device memory. For example, framebuffers.
>

Where is the buffer located? If it's in kernel space, then any contiguous
allocation will be automatically backed by huge PTEs. As framebuffer
allocation is probably happening early in boot, just calling alloc_pages()
might do?

> I looked at it a while back and it occurred to me (and Nick) that
> ideally, we should split hugetlb and hugetlbfs.
>

Yeah, you're not the first to come to that conclusion :)

> Basically, on one side, we have the (mostly arch-specific) populating
> and walking of page tables with hugetlb translations, associated huge
> VMAs, etc...
>
> On the other side, hugetlbfs is backing that with memory.
>
> Ideally, the former would have some kind of "standard" ops that
> hugetlbfs can hook into for the existing case (moving some stuff out of
> the common data structure and splitting it in two),

Adam Litke at one point posted a pagetable-abstraction that would have
been the first step on a path like this. It hurt the normal fastpath
though and was ultimately put aside.

> allowing a driver
> to instantiate hugetlb VMAs that are backed by something else, typically
> a simple mapping of IOs.
>
> Does anybody want to do that, or shall I keep it on my back burner until
> I finally get to do it? :-)
>

It's the sort of thing that has been resisted in the past, largely
because the only user at the time was about transparent hugepage
promotion/demotion. It would need to be a really strong incentive to
revive the effort.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Benjamin Herrenschmidt @ 2009-08-26 9:58 UTC
To: Mel Gorman
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Tue, 2009-08-25 at 12:10 +0100, Mel Gorman wrote:
> On Tue, Aug 25, 2009 at 09:00:54PM +1000, Benjamin Herrenschmidt wrote:
> >
> > [...]
> >
> > There are various cases of drivers that could make good use of hugetlb
> > mappings of device memory. For example, framebuffers.
> >
>
> Where is the buffer located? If it's in kernel space, then any contiguous
> allocation will be automatically backed by huge PTEs. As framebuffer
> allocation is probably happening early in boot, just calling alloc_pages()
> might do?

It's not a memory buffer, it's MMIO space (device memory, off your PCI
bus for example).

> Adam Litke at one point posted a pagetable-abstraction that would have
> been the first step on a path like this. It hurt the normal fastpath
> though and was ultimately put aside.

Which is why I think we should stick to just splitting hugetlb, which
will not affect the normal path at all. Normal path for normal pages,
hugetlb VMAs for other sizes, whether they are backed by memory or by
anything else.

> It's the sort of thing that has been resisted in the past, largely
> because the only user at the time was about transparent hugepage
> promotion/demotion. It would need to be a really strong incentive to
> revive the effort.

Why? I'm not proposing to hack the normal path. Just splitting
hugetlbfs in two, which is reasonably easy to do, to allow drivers that
map large chunks of MMIO space to use larger page sizes.

This is the case for pretty much any discrete video card, a bunch of
RDMA-style devices, and possibly more.

It's a reasonably simple change that has zero effect on the non-hugetlb
path. I think I'll just have to bite the bullet and send a demo patch
when I'm no longer bogged down :-)

Cheers,
Ben.
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-26 10:05 UTC
To: Benjamin Herrenschmidt
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Wed, Aug 26, 2009 at 07:58:05PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-08-25 at 12:10 +0100, Mel Gorman wrote:
> > On Tue, Aug 25, 2009 at 09:00:54PM +1000, Benjamin Herrenschmidt wrote:
> > >
> > > [...]
> > >
> > > There are various cases of drivers that could make good use of hugetlb
> > > mappings of device memory. For example, framebuffers.
> > >
> >
> > Where is the buffer located? If it's in kernel space, then any contiguous
> > allocation will be automatically backed by huge PTEs. As framebuffer
> > allocation is probably happening early in boot, just calling alloc_pages()
> > might do?
>
> It's not a memory buffer, it's MMIO space (device memory, off your PCI
> bus for example).
>

Ah right, so you just want to set up huge PTEs within the MMIO space?

> > Adam Litke at one point posted a pagetable-abstraction that would have
> > been the first step on a path like this. It hurt the normal fastpath
> > though and was ultimately put aside.
>
> Which is why I think we should stick to just splitting hugetlb, which
> will not affect the normal path at all. Normal path for normal pages,
> hugetlb VMAs for other sizes, whether they are backed by memory or by
> anything else.
>

Yeah, in this case I see why you want a hugetlbfs VMA, a huge-PTE-backed
VMA and everything else; they are treated differently. I don't think it's
exactly what is required in this thread, though, because there is a
RAM-backed buffer. For that, hugetlbfs still makes sense just to ensure
the reservations exist so that faults do not spuriously fail. MMIO
doesn't care because the physical backing exists, and it is vaguely
similar to MAP_SHARED.

> > It's the sort of thing that has been resisted in the past, largely
> > because the only user at the time was about transparent hugepage
> > promotion/demotion. It would need to be a really strong incentive to
> > revive the effort.
>
> Why? I'm not proposing to hack the normal path. Just splitting
> hugetlbfs in two, which is reasonably easy to do, to allow drivers that
> map large chunks of MMIO space to use larger page sizes.
>

That is a bit more reasonable. It would help the case of MMIO for sure.

> This is the case for pretty much any discrete video card, a bunch of
> RDMA-style devices, and possibly more.
>
> It's a reasonably simple change that has zero effect on the non-hugetlb
> path. I think I'll just have to bite the bullet and send a demo patch
> when I'm no longer bogged down :-)
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-24 6:14 UTC
To: Mel Gorman; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

Mel,

> How about;
>
>   o Extend Eric's helper slightly to take a GFP mask that is
>     associated with the inode and used for allocations from
>     outside the hugepage pool
>   o A helper that returns the page at a given offset within
>     a hugetlbfs file for population before the page has been
>     faulted.
>
> I know this is a bit hand-wavy, but it would allow significant sharing
> of the existing code and remove much of the hugetlbfs-awareness from
> your current driver.
>

I'm trying to write the solution you have described. The question I
have is about the extension of the hugetlb_file_setup() function.
Is hugetlb_file_setup() supposed to allocate memory, or is it supposed
to make a reservation only?
If reservation only, then it is necessary to keep a gfp_mask for the
file somewhere. Would it be OK to keep the gfp_mask for a file in
file->private_data?

Thanks,
Alexey
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-25 10:53 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Mon, Aug 24, 2009 at 06:14:30PM +1200, Alexey Korolev wrote:
> Mel,
>
> > How about;
> >
> >   o Extend Eric's helper slightly to take a GFP mask that is
> >     associated with the inode and used for allocations from
> >     outside the hugepage pool
> >   o A helper that returns the page at a given offset within
> >     a hugetlbfs file for population before the page has been
> >     faulted.
> >
> > I know this is a bit hand-wavy, but it would allow significant sharing
> > of the existing code and remove much of the hugetlbfs-awareness from
> > your current driver.
> >
>
> I'm trying to write the solution you have described. The question I
> have is about the extension of the hugetlb_file_setup() function.
> Is hugetlb_file_setup() supposed to allocate memory, or is it supposed
> to make a reservation only?

It indirectly allocates. If there are sufficient hugepages in the static
pool, then it's reservation-only. If dynamic hugepage pool resizing is
enabled, it will allocate more hugepages if necessary and then reserve
them.

> If reservation only, then it is necessary to keep a gfp_mask for the
> file somewhere. Would it be OK to keep the gfp_mask for a file in
> file->private_data?
>

I'm not seeing where this GFP mask is coming from if you don't have zone
limitations. GFP masks don't help you get contiguity beyond the hugepage
boundary.

If you did need the GFP mask, you could store it in hugetlbfs_inode_info
as you'd expect all users of that inode to have the same GFP
requirements, right?

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
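[If the mask were stored per inode as suggested, the change might look
roughly like this. The alloc_mask field and the helper are invented
here; the rest mirrors fs/hugetlbfs/inode.c of this era.]

struct hugetlbfs_inode_info {
	struct shared_policy	policy;
	struct inode		vfs_inode;
	gfp_t			alloc_mask;	/* new: overrides htlb_alloc_mask
						 * for allocations made outside
						 * the static hugepage pool */
};

/* Pick the mask to use when a page has to come from the buddy allocator */
static inline gfp_t hugetlb_inode_gfp(struct inode *inode)
{
	return HUGETLBFS_I(inode)->alloc_mask ?: htlb_alloc_mask;
}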
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-27 12:02 UTC
To: Mel Gorman; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

>
>> If reservation only, then it is necessary to keep a gfp_mask for the
>> file somewhere. Would it be OK to keep the gfp_mask for a file in
>> file->private_data?
>>
>
> I'm not seeing where this GFP mask is coming from if you don't have zone
> limitations. GFP masks don't help you get contiguity beyond the hugepage
> boundary.

Contiguity is different. It is not related to the GFP mask.
The requirement to have a large contiguous buffer is dictated by the
hardware. Since this is a very specific case it will need a very
specific solution. So if providing for it hurts the usability of the
kernel interfaces, it's better to keep the interfaces clean.
But large DMA buffers with a large number of sg regions are more common.
The DMA engine often requires a 32-bit address space, plus the memory
must be non-movable.
That raises another question: would it be correct to assume that setting
the sysctl hugepages_treat_as_movable won't make huge pages movable?

>
> If you did need the GFP mask, you could store it in hugetlbfs_inode_info
> as you'd expect all users of that inode to have the same GFP
> requirements, right?

Correct. The same GFP mask per inode is quite enough, so that way works.
I have made a rather rough implementation; after more testing and tuning
I'll send out another version.

Thanks,
Alexey
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-27 12:50 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Fri, Aug 28, 2009 at 12:02:05AM +1200, Alexey Korolev wrote:
> > > If reservation only, then it is necessary to keep a gfp_mask for the
> > > file somewhere. Would it be OK to keep the gfp_mask for a file in
> > > file->private_data?
> > >
> >
> > I'm not seeing where this GFP mask is coming from if you don't have zone
> > limitations. GFP masks don't help you get contiguity beyond the hugepage
> > boundary.
>
> Contiguity is different.

Ok, then contiguity is independent of any GFP mask considerations. Why
do you need a GFP mask?

> It is not related to the GFP mask.
> The requirement to have a large contiguous buffer is dictated by the
> hardware. Since this is a very specific case it will need a very
> specific solution. So if providing for it hurts the usability of the
> kernel interfaces, it's better to keep the interfaces clean.

You are in a bit of a bind with regards to contiguous allocations that
are larger than a huge page. Neither the huge page pools nor the buddy
allocator helps you much in this regard. I think it would be worth
considering contiguous allocations larger than a huge page size as a
separate follow-on problem to huge pages being available to a driver.

> But large DMA buffers with a large number of sg regions are more common.
> The DMA engine often requires a 32-bit address space, plus the memory
> must be non-movable.
> That raises another question: would it be correct to assume that setting
> the sysctl hugepages_treat_as_movable won't make huge pages movable?

Correct, treating them as movable allows them to be allocated from
ZONE_MOVABLE. It's unlikely that swap support will be implemented for
huge pages. It's more likely that migration support would be implemented
at some point but AFAIK, there is little or no demand for that feature.

> > If you did need the GFP mask, you could store it in hugetlbfs_inode_info
> > as you'd expect all users of that inode to have the same GFP
> > requirements, right?
>
> Correct. The same GFP mask per inode is quite enough, so that way works.
> I have made a rather rough implementation; after more testing and tuning
> I'll send out another version.
>

Ok, but please keep the exposure of hugetlbfs internals to a minimum or
at least have a strong justification. As it is, I'm not understanding
why expanding Eric's helper for MAP_HUGETLB slightly and maintaining a
mapping between your driver file and the underlying hugetlbfs file does
not cover most of the problem.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
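[For reference, "maintaining a mapping between your driver file and the
underlying hugetlbfs file" can be fairly small, along the lines of what
ipc/shm.c does. This is a simplified sketch with error handling and
lifetime management omitted; the driver-side names are invented.]

struct mydrv_buf {
	struct file *hugetlb_file;	/* created with hugetlb_file_setup()
					 * and pre-populated by the driver */
};

static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct mydrv_buf *buf = filp->private_data;
	struct file *hfile = buf->hugetlb_file;

	/* Faults resolve pages through vm_file->f_mapping, so make the
	 * driver file share the hugetlbfs file's mapping (step 4 of the
	 * cover letter; could equally be done once at open() time). */
	filp->f_mapping = hfile->f_mapping;

	/* Let hugetlbfs install hugetlb_vm_ops, VM_HUGETLB, etc. */
	return hfile->f_op->mmap(hfile, vma);
}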