* [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-17 22:24 UTC
To: Mel Gorman, linux-mm, linux-kernel

Hi,

The patch set listed below provides device drivers with the ability to
map memory regions to user space via HTLB interfaces.

Why do we need it?
Device drivers often need to map memory regions to user space to allow
efficient data handling in user mode. Using hugetlb mappings can bring a
performance gain when the mapped regions are relatively large. Our tests
showed that it is possible to gain up to 7% performance when hugetlb
mapping is enabled. In our case hugetlb starts to make sense once the
buffer is 4MB or larger. Since device throughput typically increases over
time, there are more and more reasons to use huge pages when remapping
large regions.
For example, hugetlb remapping could be important for the performance of
data acquisition systems (logic analyzers, DSOs), network monitoring
systems (packet capture), HD video capture/frame buffers and probably
others.

How is it implemented?
The idea and implementation are very close to what is already done in
ipc/shm.c.
We create a file on the hugetlbfs vfsmount and populate the file with the
pages we want to mmap. Then we associate the hugetlbfs file's mapping
with the mapping of the file we want to access.

So the typical procedure for a driver to map huge pages to userspace is:
1 Allocate some huge pages
2 Create a file on the vfsmount of hugetlbfs
3 Add the pages to the page cache of the mapping associated with the
  hugetlbfs file
4 Replace the driver file's mapping with the hugetlbfs file mapping
..............
5 Remove the pages from the page cache
6 Remove the hugetlbfs file
7 Free the pages
(Please find an example in the following messages.)

A detailed description is given in the following messages.
Thanks a lot to Mel Gorman, who gave good advice and a code prototype,
and to Stephen Donnelly for assistance in composing the description.

Alexey
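[A rough sketch of the driver-side sequence above, for illustration only
and not taken from the patches themselves. hugetlb_file_setup() and
add_to_page_cache() exist in kernels of this era, but their exact
signatures vary between versions, and the teardown (steps 5-7) is left
out.]

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/hugetlb.h>

/*
 * Populate an already-created hugetlbfs file (step 2, e.g. from
 * hugetlb_file_setup()) with driver-allocated huge pages, so that the
 * hugetlb fault handler later finds them in the page cache instead of
 * taking pages from the pool.
 */
static int mydrv_populate(struct file *hfile, unsigned long nr_hpages,
			  gfp_t gfp)
{
	struct address_space *mapping = hfile->f_mapping;
	unsigned long i;
	int err;

	for (i = 0; i < nr_hpages; i++) {
		/* Step 1: allocate a huge page as an ordinary compound page */
		struct page *page = alloc_pages(gfp | __GFP_COMP,
						HUGETLB_PAGE_ORDER);
		if (!page)
			return -ENOMEM;

		/* Step 3: insert it into the hugetlbfs file's page cache */
		err = add_to_page_cache(page, mapping, i, GFP_KERNEL);
		if (err) {
			__free_pages(page, HUGETLB_PAGE_ORDER);
			return err;
		}
		SetPageUptodate(page);
	}
	return 0;
}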
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Eric Munson @ 2009-08-18 10:29 UTC
To: Alexey Korolev; +Cc: Mel Gorman, linux-mm, linux-kernel

On Mon, Aug 17, 2009 at 11:24 PM, Alexey Korolev<akorolev@infradead.org> wrote:
> Hi,
>
> The patch set listed below provides device drivers with the ability to
> map memory regions to user space via HTLB interfaces.
>
> [...]
>
> So the typical procedure for a driver to map huge pages to userspace is:
> 1 Allocate some huge pages
> 2 Create a file on the vfsmount of hugetlbfs
> 3 Add the pages to the page cache of the mapping associated with the
>   hugetlbfs file
> 4 Replace the driver file's mapping with the hugetlbfs file mapping
> ..............
> 5 Remove the pages from the page cache
> 6 Remove the hugetlbfs file
> 7 Free the pages
>
> [...]

It sounds like this patch set is working towards the same goal as my
MAP_HUGETLB set. The only difference I see is that you allocate a huge
page at a time and (if I am understanding the patch) fault the page in
immediately, whereas MAP_HUGETLB only faults pages in as needed. Does
the MAP_HUGETLB patch set provide the functionality that you need, and
if not, what can be done to provide what you need?

Eric
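[For reference, the userspace side of the MAP_HUGETLB proposal looks
roughly like the sketch below. MAP_HUGETLB is the flag added by Eric's
series, so its availability depends on those patches; the fallback
definition here is an assumption.]

#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000	/* value assumed; comes from the patches */
#endif

/* size must be a multiple of the huge page size; pages are faulted in
 * from the hugetlb pool on first touch rather than up front. */
static void *alloc_huge_buffer(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}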
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-19 5:48 UTC
To: Eric Munson; +Cc: Alexey Korolev, Mel Gorman, linux-mm, linux-kernel

Hi,
>
> It sounds like this patch set is working towards the same goal as my
> MAP_HUGETLB set. The only difference I see is that you allocate a huge
> page at a time and (if I am understanding the patch) fault the page in
> immediately, whereas MAP_HUGETLB only faults pages in as needed. Does
> the MAP_HUGETLB patch set provide the functionality that you need, and
> if not, what can be done to provide what you need?
>
> Eric
>
Thanks a lot for being willing to help. I would much appreciate any
ideas on how HTLB mapping for drivers can be done.

It is better to describe the use case in order to make it clear what
needs to be done.
The driver provides mapping of device DMA buffers to user-level
applications. User-level applications process the data.
The device uses bus-master DMA to send data to the user buffer; the
buffer size can be >1GB and performance is very important. (So huge page
mapping really makes sense.)

In addition we have to mention that:
1. It is hard for the user to tell how many huge pages need to be
reserved by the driver.
2. Devices add constraints on memory regions. For example, a region may
need to be contiguous within the physical address space. It is necessary
to be able to specify special gfp flags.
3. The hardware needs to access physical memory before the user-level
software can access it. (Hugetlbfs picks pages up from the pool at
page-fault time.)
This means memory allocation needs to be driven by the device driver.

The original idea was: create a hugetlbfs file which shares a mapping
with the device file. Allocate memory. Populate the page cache of the
hugetlbfs file with the allocated pages.
When a fault occurs, the page will be taken from the page cache and then
mapped to user space by hugetlbfs.

Another possible approach is described here:
http://marc.info/?l=linux-mm&m=125065257431410&w=2
But I am currently not sure whether it will work or not.


Thanks,
Alexey
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-19 10:05 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
> Hi,
> >
> > [...]
>
> Thanks a lot for being willing to help. I would much appreciate any
> ideas on how HTLB mapping for drivers can be done.
>
> It is better to describe the use case in order to make it clear what
> needs to be done.
> The driver provides mapping of device DMA buffers to user-level
> applications.

Ok, so the buffer is in normal memory. When mmap() is called, the buffer
is already populated by data DMA'd from the device. That scenario rules out
calling mmap(MAP_ANONYMOUS|MAP_HUGETLB) because userspace has access to the
buffer before it is populated by data from the device.

However, it does not rule out mmap(MAP_ANONYMOUS|MAP_HUGETLB) when userspace
is responsible for populating a buffer for sending to a device. i.e. whether it
is suitable or not depends on when the buffer is populated and who is doing it.

> User-level applications process the data.
> The device uses bus-master DMA to send data to the user buffer; the
> buffer size can be >1GB and performance is very important. (So huge page
> mapping really makes sense.)
>

Ok, so the DMA may be faster because you have to do less scatter/gather
and can DMA in larger chunks, and reading from userspace may be faster
because there is less translation overhead. Right?

> In addition we have to mention that:
> 1. It is hard for the user to tell how many huge pages need to be
> reserved by the driver.

I think you have this problem either way. If the buffer is allocated and
populated before mmap(), then the driver is going to have to guess how many
pages it needs. If the DMA occurs as a result of mmap(), it's easier because
you know the number of huge pages to be reserved at that point and you have
the option of falling back to small pages if necessary.

> 2. Devices add constraints on memory regions. For example, a region may
> need to be contiguous within the physical address space. It is necessary
> to be able to specify special gfp flags.

The contiguity constraints are the same for huge pages. Do you mean there
are zone restrictions? If so, the hugetlbfs_file_setup() function could be
extended to specify a GFP mask that is used for the allocation of hugepages
and associated with the hugetlbfs inode. Right now, there is a htlb_alloc_mask
mask that is applied to some additional flags so htlb_alloc_mask would be
the default mask unless otherwise specified.

> 3. The hardware needs to access physical memory before the user-level
> software can access it. (Hugetlbfs picks pages up from the pool at
> page-fault time.)
> This means memory allocation needs to be driven by the device driver.
>

How about;

  o Extend Eric's helper slightly to take a GFP mask that is
    associated with the inode and used for allocations from
    outside the hugepage pool
  o A helper that returns the page at a given offset within
    a hugetlbfs file for population before the page has been
    faulted.

I know this is a bit hand-wavy, but it would allow significant sharing
of the existing code and remove much of the hugetlbfs-awareness from
your current driver.

> The original idea was: create a hugetlbfs file which shares a mapping
> with the device file. Allocate memory. Populate the page cache of the
> hugetlbfs file with the allocated pages.
> When a fault occurs, the page will be taken from the page cache and then
> mapped to user space by hugetlbfs.
>
> Another possible approach is described here:
> http://marc.info/?l=linux-mm&m=125065257431410&w=2
> But I am currently not sure whether it will work or not.
>
> Thanks,
> Alexey
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
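[To make the two suggestions above concrete: purely illustrative
prototypes, with names invented here; neither helper exists in the
kernel.]

/* Like the existing hugetlb_file_setup() helper, but the given GFP mask
 * is remembered on the hugetlbfs inode and used instead of
 * htlb_alloc_mask whenever pages have to be allocated from outside the
 * static hugepage pool. */
struct file *hugetlb_file_setup_gfp(const char *name, size_t size,
				    gfp_t gfp_mask);

/* Return the huge page backing index @idx of a hugetlbfs file,
 * allocating it and inserting it into the page cache if it has not been
 * faulted yet, so a driver can let the device DMA into it before any
 * user mapping exists. */
struct page *hugetlb_get_file_page(struct file *hfile, pgoff_t idx,
				   gfp_t gfp_mask);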
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Eric B Munson @ 2009-08-19 10:35 UTC
To: Mel Gorman; +Cc: Alexey Korolev, Alexey Korolev, linux-mm, linux-kernel

On Wed, Aug 19, 2009 at 11:05 AM, Mel Gorman<mel@csn.ul.ie> wrote:
> On Wed, Aug 19, 2009 at 05:48:11PM +1200, Alexey Korolev wrote:
>
> [...]
>
> How about;
>
>   o Extend Eric's helper slightly to take a GFP mask that is
>     associated with the inode and used for allocations from
>     outside the hugepage pool
>   o A helper that returns the page at a given offset within
>     a hugetlbfs file for population before the page has been
>     faulted.
>
> I know this is a bit hand-wavy, but it would allow significant sharing
> of the existing code and remove much of the hugetlbfs-awareness from
> your current driver.
>
> [...]
>

Alexey,

I'd be willing to take a stab at a prototype of Mel's suggestion based
on my patch set if you think it would be useful to you.

Eric
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-20 7:03 UTC
To: Mel Gorman; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

Mel,

>> User-level applications process the data.
>> The device uses bus-master DMA to send data to the user buffer; the
>> buffer size can be >1GB and performance is very important. (So huge page
>> mapping really makes sense.)
>>
>
> Ok, so the DMA may be faster because you have to do less scatter/gather
> and can DMA in larger chunks, and reading from userspace may be faster
> because there is less translation overhead. Right?
>
Less translation overhead is important. Unfortunately not all devices
have scatter/gather (ours does not), as supporting it increases hardware
complexity a lot.

>> In addition we have to mention that:
>> 1. It is hard for the user to tell how many huge pages need to be
>> reserved by the driver.
>
> I think you have this problem either way. If the buffer is allocated and
> populated before mmap(), then the driver is going to have to guess how many
> pages it needs. If the DMA occurs as a result of mmap(), it's easier because
> you know the number of huge pages to be reserved at that point and you have
> the option of falling back to small pages if necessary.
>
>> 2. Devices add constraints on memory regions. For example, a region may
>> need to be contiguous within the physical address space. It is necessary
>> to be able to specify special gfp flags.
>
> The contiguity constraints are the same for huge pages. Do you mean there
> are zone restrictions? If so, the hugetlbfs_file_setup() function could be
> extended to specify a GFP mask that is used for the allocation of hugepages
> and associated with the hugetlbfs inode. Right now, there is a htlb_alloc_mask
> mask that is applied to some additional flags so htlb_alloc_mask would be
> the default mask unless otherwise specified.
>
By contiguous I mean that we need several huge pages that are physically
contiguous. To obtain this we allocate pages until we find a contiguous
region (success) or reach a boundary (fail).
So in our particular case an approach based on getting pages from
hugetlbfs won't work, because the memory region will not be contiguous.
However, this approach would give an easy way to support hugetlb mapping
and would not add any accounting complexity. But it will be suitable only
for hardware that supports a large number of sg regions.

>
> How about;
>
>   o Extend Eric's helper slightly to take a GFP mask that is
>     associated with the inode and used for allocations from
>     outside the hugepage pool
>   o A helper that returns the page at a given offset within
>     a hugetlbfs file for population before the page has been
>     faulted.

Do you mean a get_user_pages() call?

Alexey
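[A very simplified sketch of the "allocate until we get a physically
contiguous run of huge pages" approach described above; a real
implementation also has to hold on to the rejected pages until it
finishes and give up at some bound. Names are invented here.]

/* Check whether nr huge pages, allocated one after another, happen to
 * form a single physically contiguous region. */
static bool hpages_are_contiguous(struct page **pages, unsigned long nr)
{
	unsigned long pfns_per_hpage = 1UL << HUGETLB_PAGE_ORDER;
	unsigned long i;

	for (i = 1; i < nr; i++)
		if (page_to_pfn(pages[i]) !=
		    page_to_pfn(pages[i - 1]) + pfns_per_hpage)
			return false;
	return true;
}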
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-25 10:47 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Thu, Aug 20, 2009 at 07:03:28PM +1200, Alexey Korolev wrote:
> Mel,
>
> >> User-level applications process the data.
> >> The device uses bus-master DMA to send data to the user buffer; the
> >> buffer size can be >1GB and performance is very important. (So huge page
> >> mapping really makes sense.)
> >>
> >
> > Ok, so the DMA may be faster because you have to do less scatter/gather
> > and can DMA in larger chunks, and reading from userspace may be faster
> > because there is less translation overhead. Right?
> >
> Less translation overhead is important. Unfortunately not all devices
> have scatter/gather (ours does not), as supporting it increases hardware
> complexity a lot.
>

Ok.

> >> 2. Devices add constraints on memory regions. For example, a region may
> >> need to be contiguous within the physical address space. It is necessary
> >> to be able to specify special gfp flags.
> >
> > The contiguity constraints are the same for huge pages. Do you mean there
> > are zone restrictions? [...]
> >
> By contiguous I mean that we need several huge pages that are physically
> contiguous.

Why? One hugepage of default size will be one TLB entry. Each hugepage
after that will be additional TLB entries so there is no savings on
translation overhead.

Getting contiguous pages beyond the hugepage boundary is not a matter
for GFP flags.

> To obtain this we allocate pages until we find a contiguous
> region (success) or reach a boundary (fail).
> So in our particular case an approach based on getting pages from
> hugetlbfs won't work, because the memory region will not be contiguous.

With a direct allocation of hugepages, there is no guarantee they will
be contiguous either. If you need contiguity above hugepages (which in
many cases will also be the largest page the buddy allocator can grant),
you need something else.

> However, this approach would give an easy way to support hugetlb mapping
> and would not add any accounting complexity. But it will be suitable only
> for hardware that supports a large number of sg regions.
>
> >
> > How about;
> >
> >   o Extend Eric's helper slightly to take a GFP mask that is
> >     associated with the inode and used for allocations from
> >     outside the hugepage pool
> >   o A helper that returns the page at a given offset within
> >     a hugetlbfs file for population before the page has been
> >     faulted.
>
> Do you mean a get_user_pages() call?
>

If you're willing to call it directly, sure.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
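[A sketch of calling it directly from a driver, with the argument list
as it is in kernels of this era; for a hugetlbfs-backed VMA the array is
filled with the individual small subpages of each huge page, so nr_pages
is in small-page units. Function name is invented.]

#include <linux/mm.h>
#include <linux/sched.h>

static int mydrv_pin_buffer(unsigned long uaddr, unsigned long nr_pages,
			    struct page **pages)
{
	int pinned;

	down_read(&current->mm->mmap_sem);
	pinned = get_user_pages(current, current->mm, uaddr, nr_pages,
				1 /* write */, 0 /* force */, pages, NULL);
	up_read(&current->mm->mmap_sem);

	/* pages[] can now be handed to the device for DMA; remember to
	 * release each page with put_page() when the transfer is done. */
	return pinned;
}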
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Benjamin Herrenschmidt @ 2009-08-25 11:00 UTC
To: Mel Gorman
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Tue, 2009-08-25 at 11:47 +0100, Mel Gorman wrote:

> Why? One hugepage of default size will be one TLB entry. Each hugepage
> after that will be additional TLB entries so there is no savings on
> translation overhead.
>
> Getting contiguous pages beyond the hugepage boundary is not a matter
> for GFP flags.

Note: This patch reminds me of something else I had on the backburner
for a while and never got a chance to actually implement...

There are various cases of drivers that could make good use of hugetlb
mappings of device memory. For example, framebuffers.

I looked at it a while back and it occurred to me (and Nick) that
ideally, we should split hugetlb and hugetlbfs.

Basically, on one side, we have the (mostly arch-specific) populating
and walking of page tables with hugetlb translations, associated huge
VMAs, etc...

On the other side, hugetlbfs is backing that with memory.

Ideally, the former would have some kind of "standard" ops that
hugetlbfs can hook into for the existing case (moving some stuff out of
the common data structure and splitting it in two), allowing a driver
to instantiate hugetlb VMAs that are backed by something else, typically
a simple mapping of IOs.

Does anybody want to do that, or shall I keep it on my back burner until
I finally get to do it? :-)

Cheers,
Ben.
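[Invented here purely to illustrate the shape of the split being
described; no such interface exists in the kernel.]

/* One possible form of the "standard ops": the generic huge-VMA and
 * page table code would call back into a backing provider, which could
 * be hugetlbfs (RAM from the pool) or a driver exposing device memory. */
struct hugetlb_backing_ops {
	/* produce the page backing index @idx of the mapping */
	struct page *(*get_page)(struct vm_area_struct *vma, pgoff_t idx);
	/* release it again, e.g. on unmap or truncate */
	void (*put_page)(struct vm_area_struct *vma, struct page *page);
};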
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-25 11:10 UTC
To: Benjamin Herrenschmidt
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Tue, Aug 25, 2009 at 09:00:54PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-08-25 at 11:47 +0100, Mel Gorman wrote:
>
> > Why? One hugepage of default size will be one TLB entry. Each hugepage
> > after that will be additional TLB entries so there is no savings on
> > translation overhead.
> >
> > Getting contiguous pages beyond the hugepage boundary is not a matter
> > for GFP flags.
>
> Note: This patch reminds me of something else I had on the backburner
> for a while and never got a chance to actually implement...
>
> There are various cases of drivers that could make good use of hugetlb
> mappings of device memory. For example, framebuffers.
>

Where is the buffer located? If it's in kernel space, then any contiguous
allocation will be automatically backed by huge PTEs. As framebuffer
allocation is probably happening early in boot, just calling alloc_pages()
might do?

> I looked at it a while back and it occurred to me (and Nick) that
> ideally, we should split hugetlb and hugetlbfs.
>

Yeah, you're not the first to come to that conclusion :)

> Basically, on one side, we have the (mostly arch-specific) populating
> and walking of page tables with hugetlb translations, associated huge
> VMAs, etc...
>
> On the other side, hugetlbfs is backing that with memory.
>
> Ideally, the former would have some kind of "standard" ops that
> hugetlbfs can hook into for the existing case (moving some stuff out of
> the common data structure and splitting it in two),

Adam Litke at one point posted a pagetable-abstraction that would have
been the first step on a path like this. It hurt the normal fastpath
though and was ultimately put aside.

> allowing a driver
> to instantiate hugetlb VMAs that are backed by something else, typically
> a simple mapping of IOs.
>
> Does anybody want to do that, or shall I keep it on my back burner until
> I finally get to do it? :-)
>

It's the sort of thing that has been resisted in the past, largely
because the only user at the time was about transparent hugepage
promotion/demotion. It would need to be a really strong incentive to
revive the effort.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Benjamin Herrenschmidt @ 2009-08-26 9:58 UTC
To: Mel Gorman
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Tue, 2009-08-25 at 12:10 +0100, Mel Gorman wrote:
> On Tue, Aug 25, 2009 at 09:00:54PM +1000, Benjamin Herrenschmidt wrote:
> >
> > [...]
> >
> > There are various cases of drivers that could make good use of hugetlb
> > mappings of device memory. For example, framebuffers.
> >
>
> Where is the buffer located? If it's in kernel space, then any contiguous
> allocation will be automatically backed by huge PTEs. As framebuffer
> allocation is probably happening early in boot, just calling alloc_pages()
> might do?

It's not a memory buffer, it's MMIO space (device memory, off your PCI
bus for example).

> Adam Litke at one point posted a pagetable-abstraction that would have
> been the first step on a path like this. It hurt the normal fastpath
> though and was ultimately put aside.

Which is why I think we should stick to just splitting hugetlb, which
will not affect the normal path at all. Normal path for normal pages,
hugetlb VMAs for other sizes, whether they are backed by memory or by
anything else.

> It's the sort of thing that has been resisted in the past, largely
> because the only user at the time was about transparent hugepage
> promotion/demotion. It would need to be a really strong incentive to
> revive the effort.

Why? I'm not proposing to hack the normal path. Just splitting
hugetlbfs in two, which is reasonably easy to do, to allow drivers that
map large chunks of MMIO space to use larger page sizes.

This is the case for pretty much any discrete video card, a bunch of
RDMA-style devices, and possibly more.

It's a reasonably simple change that has zero effect on the non-hugetlb
path. I think I'll just have to bite the bullet and send a demo patch
when I'm no longer bogged down :-)

Cheers,
Ben.
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-26 10:05 UTC
To: Benjamin Herrenschmidt
Cc: Alexey Korolev, Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Wed, Aug 26, 2009 at 07:58:05PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-08-25 at 12:10 +0100, Mel Gorman wrote:
> > On Tue, Aug 25, 2009 at 09:00:54PM +1000, Benjamin Herrenschmidt wrote:
> > >
> > > [...]
> > >
> > > There are various cases of drivers that could make good use of hugetlb
> > > mappings of device memory. For example, framebuffers.
> > >
> >
> > Where is the buffer located? If it's in kernel space, then any contiguous
> > allocation will be automatically backed by huge PTEs. As framebuffer
> > allocation is probably happening early in boot, just calling alloc_pages()
> > might do?
>
> It's not a memory buffer, it's MMIO space (device memory, off your PCI
> bus for example).
>

Ah right, so you just want to set up huge PTEs within the MMIO space?

> > Adam Litke at one point posted a pagetable-abstraction that would have
> > been the first step on a path like this. It hurt the normal fastpath
> > though and was ultimately put aside.
>
> Which is why I think we should stick to just splitting hugetlb, which
> will not affect the normal path at all. Normal path for normal pages,
> hugetlb VMAs for other sizes, whether they are backed by memory or by
> anything else.
>

Yeah, in this case I see why you want a hugetlbfs VMA, a huge-PTE-backed
VMA and everything else; they are treated differently. I don't think it's
exactly what is required in this thread, though, because there is a
RAM-backed buffer. For that, hugetlbfs still makes sense just to ensure
the reservations exist so that faults do not spuriously fail. MMIO
doesn't care because the physical backing exists, and it is vaguely
similar to MAP_SHARED.

> > It's the sort of thing that has been resisted in the past, largely
> > because the only user at the time was about transparent hugepage
> > promotion/demotion. It would need to be a really strong incentive to
> > revive the effort.
>
> Why? I'm not proposing to hack the normal path. Just splitting
> hugetlbfs in two, which is reasonably easy to do, to allow drivers that
> map large chunks of MMIO space to use larger page sizes.
>

That is a bit more reasonable. It would help the case of MMIO for sure.

> This is the case for pretty much any discrete video card, a bunch of
> RDMA-style devices, and possibly more.
>
> It's a reasonably simple change that has zero effect on the non-hugetlb
> path. I think I'll just have to bite the bullet and send a demo patch
> when I'm no longer bogged down :-)
>

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-24 6:14 UTC
To: Mel Gorman; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

Mel,

> How about;
>
>   o Extend Eric's helper slightly to take a GFP mask that is
>     associated with the inode and used for allocations from
>     outside the hugepage pool
>   o A helper that returns the page at a given offset within
>     a hugetlbfs file for population before the page has been
>     faulted.
>
> I know this is a bit hand-wavy, but it would allow significant sharing
> of the existing code and remove much of the hugetlbfs-awareness from
> your current driver.
>

I'm trying to write the solution you have described. The question I
have is about the extension of the hugetlb_file_setup() function.
Is hugetlb_file_setup() supposed to allocate memory, or is it supposed
to make a reservation only?
If reservation only, then it is necessary to keep a gfp_mask for the
file somewhere. Would it be OK to keep the gfp_mask for a file in
file->private_data?

Thanks,
Alexey
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-25 10:53 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Mon, Aug 24, 2009 at 06:14:30PM +1200, Alexey Korolev wrote:
> Mel,
>
> > How about;
> >
> >   o Extend Eric's helper slightly to take a GFP mask that is
> >     associated with the inode and used for allocations from
> >     outside the hugepage pool
> >   o A helper that returns the page at a given offset within
> >     a hugetlbfs file for population before the page has been
> >     faulted.
> >
> > I know this is a bit hand-wavy, but it would allow significant sharing
> > of the existing code and remove much of the hugetlbfs-awareness from
> > your current driver.
> >
>
> I'm trying to write the solution you have described. The question I
> have is about the extension of the hugetlb_file_setup() function.
> Is hugetlb_file_setup() supposed to allocate memory, or is it supposed
> to make a reservation only?

It indirectly allocates. If there are sufficient hugepages in the static
pool, then it's reservation-only. If dynamic hugepage pool resizing is
enabled, it will allocate more hugepages if necessary and then reserve
them.

> If reservation only, then it is necessary to keep a gfp_mask for the
> file somewhere. Would it be OK to keep the gfp_mask for a file in
> file->private_data?
>

I'm not seeing where this GFP mask is coming from if you don't have zone
limitations. GFP masks don't help you get contiguity beyond the hugepage
boundary.

If you did need the GFP mask, you could store it in hugetlbfs_inode_info
as you'd expect all users of that inode to have the same GFP
requirements, right?

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
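[If the mask were stored per inode as suggested, the change might look
roughly like this. The alloc_mask field and the helper are invented
here; the rest mirrors fs/hugetlbfs/inode.c of this era.]

struct hugetlbfs_inode_info {
	struct shared_policy	policy;
	struct inode		vfs_inode;
	gfp_t			alloc_mask;	/* new: overrides htlb_alloc_mask
						 * for allocations made outside
						 * the static hugepage pool */
};

/* Pick the mask to use when a page has to come from the buddy allocator */
static inline gfp_t hugetlb_inode_gfp(struct inode *inode)
{
	return HUGETLBFS_I(inode)->alloc_mask ?: htlb_alloc_mask;
}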
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Alexey Korolev @ 2009-08-27 12:02 UTC
To: Mel Gorman; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

>
>> If reservation only, then it is necessary to keep a gfp_mask for the
>> file somewhere. Would it be OK to keep the gfp_mask for a file in
>> file->private_data?
>>
>
> I'm not seeing where this GFP mask is coming from if you don't have zone
> limitations. GFP masks don't help you get contiguity beyond the hugepage
> boundary.

Contiguity is different. It is not related to the GFP mask.
The requirement to have a large contiguous buffer is dictated by the
hardware. Since this is a very specific case it will need a very
specific solution. So if providing for it hurts the usability of the
kernel interfaces, it's better to keep the interfaces clean.
But large DMA buffers with a large number of sg regions are more common.
The DMA engine often requires a 32-bit address space, plus the memory
must be non-movable.
That raises another question: would it be correct to assume that setting
the sysctl hugepages_treat_as_movable won't make huge pages movable?

>
> If you did need the GFP mask, you could store it in hugetlbfs_inode_info
> as you'd expect all users of that inode to have the same GFP
> requirements, right?

Correct. The same GFP mask per inode is quite enough, so that way works.
I have made a rather rough implementation; after more testing and tuning
I'll send out another version.

Thanks,
Alexey
* Re: [PATCH 0/3] HTLB mapping for drivers (take 2)
From: Mel Gorman @ 2009-08-27 12:50 UTC
To: Alexey Korolev; +Cc: Eric Munson, Alexey Korolev, linux-mm, linux-kernel

On Fri, Aug 28, 2009 at 12:02:05AM +1200, Alexey Korolev wrote:
> > > If reservation only, then it is necessary to keep a gfp_mask for the
> > > file somewhere. Would it be OK to keep the gfp_mask for a file in
> > > file->private_data?
> > >
> >
> > I'm not seeing where this GFP mask is coming from if you don't have zone
> > limitations. GFP masks don't help you get contiguity beyond the hugepage
> > boundary.
>
> Contiguity is different.

Ok, then contiguity is independent of any GFP mask considerations. Why
do you need a GFP mask?

> It is not related to the GFP mask.
> The requirement to have a large contiguous buffer is dictated by the
> hardware. Since this is a very specific case it will need a very
> specific solution. So if providing for it hurts the usability of the
> kernel interfaces, it's better to keep the interfaces clean.

You are in a bit of a bind with regards to contiguous allocations that
are larger than a huge page. Neither the huge page pools nor the buddy
allocator helps you much in this regard. I think it would be worth
considering contiguous allocations larger than a huge page size as a
separate follow-on problem to huge pages being available to a driver.

> But large DMA buffers with a large number of sg regions are more common.
> The DMA engine often requires a 32-bit address space, plus the memory
> must be non-movable.
> That raises another question: would it be correct to assume that setting
> the sysctl hugepages_treat_as_movable won't make huge pages movable?

Correct, treating them as movable allows them to be allocated from
ZONE_MOVABLE. It's unlikely that swap support will be implemented for
huge pages. It's more likely that migration support would be implemented
at some point but AFAIK, there is little or no demand for that feature.

> > If you did need the GFP mask, you could store it in hugetlbfs_inode_info
> > as you'd expect all users of that inode to have the same GFP
> > requirements, right?
>
> Correct. The same GFP mask per inode is quite enough, so that way works.
> I have made a rather rough implementation; after more testing and tuning
> I'll send out another version.
>

Ok, but please keep the exposure of hugetlbfs internals to a minimum or
at least have a strong justification. As it is, I'm not understanding
why expanding Eric's helper for MAP_HUGETLB slightly and maintaining a
mapping between your driver file and the underlying hugetlbfs file does
not cover most of the problem.

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
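[For reference, "maintaining a mapping between your driver file and the
underlying hugetlbfs file" can be fairly small, along the lines of what
ipc/shm.c does. This is a simplified sketch with error handling and
lifetime management omitted; the driver-side names are invented.]

struct mydrv_buf {
	struct file *hugetlb_file;	/* created with hugetlb_file_setup()
					 * and pre-populated by the driver */
};

static int mydrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct mydrv_buf *buf = filp->private_data;
	struct file *hfile = buf->hugetlb_file;

	/* Faults resolve pages through vm_file->f_mapping, so make the
	 * driver file share the hugetlbfs file's mapping (step 4 of the
	 * cover letter; could equally be done once at open() time). */
	filp->f_mapping = hfile->f_mapping;

	/* Let hugetlbfs install hugetlb_vm_ops, VM_HUGETLB, etc. */
	return hfile->f_op->mmap(hfile, vma);
}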