* [PATCH 0/1] gigantic compound pages part 2
From: Andy Whitcroft @ 2008-10-08  9:33 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Jon Tollefson, Mel Gorman, Nick Piggin,
    Andy Whitcroft

Full stress testing of 2.6.27-rc7 with the patch below threw up some
more places where we assume the mem_map is contiguous:

    handle initialising compound pages at orders greater than MAX_ORDER

Following this email is an additional patch to fix up those places.
With this patch the libhugetlbfs functional tests pass, as do our
stress test loads.  Thanks to Jon Tollefson for his help testing
these patches.

Please consider this patch for merging into 2.6.27.

-apw
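The assumption at issue can be shown in miniature.  A minimal sketch,
mirroring the mem_map_offset() helper the patch below introduces
(nth_subpage() is an illustrative name, not a kernel function):

	/*
	 * Within a maximally aligned block of MAX_ORDER_NR_PAGES pages
	 * the buddy allocator guarantees mem_map is contiguous, so
	 * 'base + i' is safe.  A gigantic page is bigger than that
	 * block, so subpages past the boundary must be reached via pfn
	 * arithmetic, which tolerates gaps in mem_map.
	 */
	static inline struct page *nth_subpage(struct page *base, unsigned long i)
	{
		if (i < MAX_ORDER_NR_PAGES)
			return base + i;		/* contiguity guaranteed */
		return pfn_to_page(page_to_pfn(base) + i);	/* may cross a gap */
	}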
* [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Andy Whitcroft @ 2008-10-08  9:33 UTC
To: Andrew Morton
Cc: linux-mm, linux-kernel, Jon Tollefson, Mel Gorman, Nick Piggin,
    Andy Whitcroft

When working with hugepages, hugetlbfs assumes that those hugepages
are smaller than MAX_ORDER.  Specifically, it assumes that the mem_map
is contiguous and uses that to optimise access to the elements of the
mem_map that represent the hugepage.  Gigantic pages (such as 16GB pages
on powerpc) by definition are of greater order than MAX_ORDER (larger
than MAX_ORDER_NR_PAGES in size).  This means that we can no longer make
use of the buddy allocator guarantees for the contiguity of the mem_map,
which ensure that the mem_map is contiguous at least within maximally
aligned areas of MAX_ORDER_NR_PAGES pages.

This patch adds new mem_map accessors and iterator helpers which handle
any discontiguity at MAX_ORDER_NR_PAGES boundaries.  It then uses these
within copy_huge_page, clear_huge_page, and follow_hugetlb_page to allow
them to handle gigantic pages.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
 mm/hugetlb.c  |   15 ++++++++++-----
 mm/internal.h |   28 ++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67a7119..bb5cf81 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -357,11 +357,12 @@ static void clear_huge_page(struct page *page,
 			unsigned long addr, unsigned long sz)
 {
 	int i;
+	struct page *p = page;
 
 	might_sleep();
-	for (i = 0; i < sz/PAGE_SIZE; i++) {
+	for (i = 0; i < sz/PAGE_SIZE; i++, p = mem_map_next(p, page, i)) {
 		cond_resched();
-		clear_user_highpage(page + i, addr + i * PAGE_SIZE);
+		clear_user_highpage(p, addr + i * PAGE_SIZE);
 	}
 }
 
@@ -370,11 +371,15 @@ static void copy_huge_page(struct page *dst, struct page *src,
 {
 	int i;
 	struct hstate *h = hstate_vma(vma);
+	struct page *dst_base = dst;
+	struct page *src_base = src;
 
 	might_sleep();
-	for (i = 0; i < pages_per_huge_page(h); i++) {
+	for (i = 0; i < pages_per_huge_page(h); i++,
+			dst = mem_map_next(dst, dst_base, i),
+			src = mem_map_next(src, src_base, i)) {
 		cond_resched();
-		copy_user_highpage(dst + i, src + i, addr + i*PAGE_SIZE, vma);
+		copy_user_highpage(dst, src, addr + i*PAGE_SIZE, vma);
 	}
 }
 
@@ -2103,7 +2108,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 same_page:
 		if (pages) {
 			get_page(page);
-			pages[i] = page + pfn_offset;
+			pages[i] = mem_map_offset(page, pfn_offset);
 		}
 
 		if (vmas)
diff --git a/mm/internal.h b/mm/internal.h
index 1f43f74..08b8dea 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -53,6 +53,34 @@ static inline unsigned long page_order(struct page *page)
 }
 
 /*
+ * Return the mem_map entry representing the 'offset' subpage within
+ * the maximally aligned gigantic page 'base'.  Handle any discontiguity
+ * in the mem_map at MAX_ORDER_NR_PAGES boundaries.
+ */
+static inline struct page *mem_map_offset(struct page *base, int offset)
+{
+	if (unlikely(offset >= MAX_ORDER_NR_PAGES))
+		return pfn_to_page(page_to_pfn(base) + offset);
+	return base + offset;
+}
+
+/*
+ * Iterator over all subpages within the maximally aligned gigantic
+ * page 'base'.  Handle any discontiguity in the mem_map.
+ */
+static inline struct page *mem_map_next(struct page *iter,
+					struct page *base, int offset)
+{
+	if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
+		unsigned long pfn = page_to_pfn(base) + offset;
+		if (!pfn_valid(pfn))
+			return NULL;
+		return pfn_to_page(pfn);
+	}
+	return iter + 1;
+}
+
+/*
  * FLATMEM and DISCONTIGMEM configurations use alloc_bootmem_node,
  * so all functions starting at paging_init should be marked __init
  * in those cases. SPARSEMEM, however, allows for memory hotplug,
-- 
1.6.0.1.451.gc8d31
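To make the intended use of the two helpers concrete, a hypothetical
caller (illustration only; only mem_map_offset() and mem_map_next()
above are from the patch) would iterate a gigantic page like this:

	/*
	 * Hypothetical caller: visit every subpage of a possibly
	 * gigantic huge page.  mem_map_next() normally just increments
	 * the pointer; each time 'i' reaches a multiple of
	 * MAX_ORDER_NR_PAGES it re-derives the pointer from the pfn,
	 * which is safe across any mem_map discontiguity.
	 */
	static void visit_huge_page(struct page *page, unsigned long nr_pages)
	{
		unsigned long i;
		struct page *p = page;

		for (i = 0; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
			/* operate on p here; 'page + i' would be unsafe */
		}
	}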
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Nick Piggin @ 2008-10-08 12:29 UTC
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman

On Wednesday 08 October 2008 20:33, Andy Whitcroft wrote:
> When working with hugepages, hugetlbfs assumes that those hugepages
> are smaller than MAX_ORDER.  Specifically, it assumes that the mem_map
> is contiguous and uses that to optimise access to the elements of the
> mem_map that represent the hugepage.  Gigantic pages (such as 16GB pages
> on powerpc) by definition are of greater order than MAX_ORDER (larger
> than MAX_ORDER_NR_PAGES in size).  This means that we can no longer make
> use of the buddy allocator guarantees for the contiguity of the mem_map,
> which ensure that the mem_map is contiguous at least within maximally
> aligned areas of MAX_ORDER_NR_PAGES pages.
>
> This patch adds new mem_map accessors and iterator helpers which handle
> any discontiguity at MAX_ORDER_NR_PAGES boundaries.  It then uses these
> within copy_huge_page, clear_huge_page, and follow_hugetlb_page to allow
> them to handle gigantic pages.
>
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Seems good to me... but do you have to add lots of stuff into the end of
the for statements?  Why not just at the end of the block?
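For reference, the restructuring Nick is suggesting looks roughly like
the fragment below; it is the shape the copy loop takes in the
follow-up patch later in this thread (the variables are those of
copy_huge_page):

	/* Same iteration, with the iterator advances moved out of the
	 * for statement and into the end of the loop body. */
	for (i = 0; i < pages_per_huge_page(h); ) {
		cond_resched();
		copy_user_highpage(dst, src, addr + i*PAGE_SIZE, vma);

		i++;
		dst = mem_map_next(dst, dst_base, i);
		src = mem_map_next(src, src_base, i);
	}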
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Andy Whitcroft @ 2008-10-13 13:36 UTC
To: Nick Piggin
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman

On Wed, Oct 08, 2008 at 11:29:59PM +1100, Nick Piggin wrote:
> On Wednesday 08 October 2008 20:33, Andy Whitcroft wrote:
> > When working with hugepages, hugetlbfs assumes that those hugepages
> > are smaller than MAX_ORDER.  [...]
> >
> > Signed-off-by: Andy Whitcroft <apw@shadowen.org>
>
> Seems good to me... but do you have to add lots of stuff into the end of
> the for statements?  Why not just at the end of the block?

Yes, there is no particular requirement for it to be there.  In the
latest discussion patch (in a separate email) it has the long ones
moved out.

-apw
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Mel Gorman @ 2008-10-08 14:57 UTC
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Nick Piggin

On (08/10/08 10:33), Andy Whitcroft didst pronounce:
> When working with hugepages, hugetlbfs assumes that those hugepages
> are smaller than MAX_ORDER.  Specifically, it assumes that the mem_map
> is contiguous and uses that to optimise access to the elements of the
> mem_map that represent the hugepage.  Gigantic pages (such as 16GB pages
> on powerpc) by definition are of greater order than MAX_ORDER (larger
> than MAX_ORDER_NR_PAGES in size).  This means that we can no longer make
> use of the buddy allocator guarantees for the contiguity of the mem_map,
> which ensure that the mem_map is contiguous at least within maximally
> aligned areas of MAX_ORDER_NR_PAGES pages.
>
> This patch adds new mem_map accessors and iterator helpers which handle
> any discontiguity at MAX_ORDER_NR_PAGES boundaries.  It then uses these
> within copy_huge_page, clear_huge_page, and follow_hugetlb_page to allow
> them to handle gigantic pages.
>
> Signed-off-by: Andy Whitcroft <apw@shadowen.org>

Acked-by: Mel Gorman <mel@csn.ul.ie>

> ---
>  mm/hugetlb.c  |   15 ++++++++++-----
>  mm/internal.h |   28 ++++++++++++++++++++++++++++
>  2 files changed, 38 insertions(+), 5 deletions(-)
>
> [patch body quoted in full; snipped]

-- 
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Christoph Lameter @ 2008-10-08 16:17 UTC
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman,
    Nick Piggin

Andy Whitcroft wrote:
> When working with hugepages, hugetlbfs assumes that those hugepages
> are smaller than MAX_ORDER.  [...]  This means that we can no longer
> make use of the buddy allocator guarantees for the contiguity of the
> mem_map, which ensure that the mem_map is contiguous at least within
> maximally aligned areas of MAX_ORDER_NR_PAGES pages.

But the memmap is contiguous in most cases: FLATMEM, VMEMMAP, etc.  It
is only some special sparsemem configurations that could have the
issue, because they break up the vmemmap.  x86_64 uses VMEMMAP by
default.  Is this for i386?
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Andi Kleen @ 2008-10-08 17:36 UTC
To: Christoph Lameter
Cc: Andy Whitcroft, Andrew Morton, linux-mm, linux-kernel,
    Jon Tollefson, Mel Gorman, Nick Piggin

Christoph Lameter <cl@linux-foundation.org> writes:
>
> But the memmap is contiguous in most cases: FLATMEM, VMEMMAP, etc.  It
> is only some special sparsemem configurations that could have the
> issue, because they break up the vmemmap.  x86_64 uses VMEMMAP by
> default.  Is this for i386?

i386 doesn't support huge pages > MAX_ORDER.  I guess it's for ppc64,
but they should probably just use vmemmap there if they don't already.

-Andi

-- 
ak@linux.intel.com
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Andy Whitcroft @ 2008-10-08 18:55 UTC
To: Christoph Lameter
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman,
    Nick Piggin

On Wed, Oct 08, 2008 at 11:17:59AM -0500, Christoph Lameter wrote:
> Andy Whitcroft wrote:
> > When working with hugepages, hugetlbfs assumes that those hugepages
> > are smaller than MAX_ORDER.  [...]
>
> But the memmap is contiguous in most cases: FLATMEM, VMEMMAP, etc.  It
> is only some special sparsemem configurations that could have the
> issue, because they break up the vmemmap.  x86_64 uses VMEMMAP by
> default.  Is this for i386?

SPARSEMEM enabled with VMEMMAP disabled is a valid combination, and
with it we will end up scribbling all over memory, which is pretty
serious, so for that reason we should handle this case.  There are
certain combinations of features which require SPARSEMEM but preclude
VMEMMAP, and those trigger this.

-apw
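The model difference at issue can be sketched as follows (simplified
from the kernel's pfn_to_page() variants; the two macro names here are
illustrative, not the exact kernel definitions):

	/* SPARSEMEM_VMEMMAP: mem_map is one virtually contiguous
	 * array, so translation is plain pointer arithmetic over the
	 * whole pfn space. */
	#define vmemmap_pfn_to_page(pfn)	(vmemmap + (pfn))

	/* Classic SPARSEMEM: each section carries its own mem_map, so
	 * translation goes via the section table and contiguity is
	 * only guaranteed within a section, that is, only within
	 * MAX_ORDER_NR_PAGES. */
	#define sparse_pfn_to_page(pfn) \
		(__section_mem_map_addr(__pfn_to_section(pfn)) + (pfn))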
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Christoph Lameter @ 2008-10-08 19:35 UTC
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman,
    Nick Piggin

Andy Whitcroft wrote:

> SPARSEMEM enabled with VMEMMAP disabled is a valid combination, and
> with it we will end up scribbling all over memory, which is pretty
> serious, so for that reason we should handle this case.  There are
> certain combinations of features which require SPARSEMEM but preclude
> VMEMMAP, and those trigger this.

Which configurations are we talking about?  64-bit configs may
generally be able to use VMEMMAP since they have lots of virtual
address space.
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Andy Whitcroft @ 2008-10-13 13:34 UTC
To: Christoph Lameter
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman,
    Nick Piggin

On Wed, Oct 08, 2008 at 02:35:04PM -0500, Christoph Lameter wrote:
> Andy Whitcroft wrote:
>
> > SPARSEMEM enabled with VMEMMAP disabled is a valid combination, and
> > with it we will end up scribbling all over memory, which is pretty
> > serious, so for that reason we should handle this case.  There are
> > certain combinations of features which require SPARSEMEM but
> > preclude VMEMMAP, and those trigger this.
>
> Which configurations are we talking about?  64-bit configs may
> generally be able to use VMEMMAP since they have lots of virtual
> address space.

Currently memory hot remove is not supported with VMEMMAP.  Obviously
that should be fixed overall, and I am assuming it will be.  But the
fact remains that the buddy guarantee is that the mem_map is contiguous
only out to MAX_ORDER-1 order pages; beyond that we may not assume
contiguity.  This code is broken under the guarantees that are set out
by buddy.  Yes, it is true that we currently have only one memory model
combination in which the stronger within-node contiguity is violated,
but right now this code violates the current guarantees.

I assume the objection here is the injection of the additional branch
into these loops.  The later rejig patch removes this branch from the
non-gigantic use cases.  Are we worried about these same branches in
the huge cases?  If so, we could make this support dependent on a new
configuration option, or perhaps have two loops, chosen based on the
order of the page.

Something like the patch below?  This patch is not tested as yet, but
if this form is acceptable we can get the pair of patches (this plus
the prep compound update) tested together, and I can repost them once
that is done.  This is against 2.6.27.

-apw

Author: Andy Whitcroft <apw@shadowen.org>
Date:   Mon Oct 13 14:28:44 2008 +0100

    hugetlbfs: handle pages higher order than MAX_ORDER

    When working with hugepages, hugetlbfs assumes that those hugepages
    are smaller than MAX_ORDER.  Specifically, it assumes that the
    mem_map is contiguous and uses that to optimise access to the
    elements of the mem_map that represent the hugepage.  Gigantic
    pages (such as 16GB pages on powerpc) by definition are of greater
    order than MAX_ORDER (larger than MAX_ORDER_NR_PAGES in size).
    This means that we can no longer make use of the buddy allocator
    guarantees for the contiguity of the mem_map, which ensure that the
    mem_map is contiguous at least within maximally aligned areas of
    MAX_ORDER_NR_PAGES pages.

    This patch adds new mem_map accessors and iterator helpers which
    handle any discontiguity at MAX_ORDER_NR_PAGES boundaries.  It then
    uses these to implement gigantic page versions of copy_huge_page
    and clear_huge_page, and to allow follow_hugetlb_page to handle
    gigantic pages.

    Signed-off-by: Andy Whitcroft <apw@shadowen.org>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67a7119..793f52e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -353,11 +353,26 @@ static int vma_has_reserves(struct vm_area_struct *vma)
 	return 0;
 }
 
+static void clear_gigantic_page(struct page *page,
+			unsigned long addr, unsigned long sz)
+{
+	int i;
+	struct page *p = page;
+
+	might_sleep();
+	for (i = 0; i < sz/PAGE_SIZE; i++, p = mem_map_next(p, page, i)) {
+		cond_resched();
+		clear_user_highpage(p, addr + i * PAGE_SIZE);
+	}
+}
 static void clear_huge_page(struct page *page,
 			unsigned long addr, unsigned long sz)
 {
 	int i;
 
+	if (unlikely(sz > MAX_ORDER_NR_PAGES))
+		return clear_gigantic_page(page, addr, sz);
+
 	might_sleep();
 	for (i = 0; i < sz/PAGE_SIZE; i++) {
 		cond_resched();
@@ -365,12 +380,32 @@ static void clear_huge_page(struct page *page,
 	}
 }
 
+static void copy_gigantic_page(struct page *dst, struct page *src,
+			unsigned long addr, struct vm_area_struct *vma)
+{
+	int i;
+	struct hstate *h = hstate_vma(vma);
+	struct page *dst_base = dst;
+	struct page *src_base = src;
+	might_sleep();
+	for (i = 0; i < pages_per_huge_page(h); ) {
+		cond_resched();
+		copy_user_highpage(dst, src, addr + i*PAGE_SIZE, vma);
+
+		i++;
+		dst = mem_map_next(dst, dst_base, i);
+		src = mem_map_next(src, src_base, i);
+	}
+}
 static void copy_huge_page(struct page *dst, struct page *src,
 			   unsigned long addr, struct vm_area_struct *vma)
 {
 	int i;
 	struct hstate *h = hstate_vma(vma);
 
+	if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES))
+		return copy_gigantic_page(dst, src, addr, vma);
+
 	might_sleep();
 	for (i = 0; i < pages_per_huge_page(h); i++) {
 		cond_resched();
@@ -2103,7 +2138,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 same_page:
 		if (pages) {
 			get_page(page);
-			pages[i] = page + pfn_offset;
+			pages[i] = mem_map_offset(page, pfn_offset);
 		}
 
 		if (vmas)
diff --git a/mm/internal.h b/mm/internal.h
index 1f43f74..08b8dea 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -53,6 +53,34 @@ static inline unsigned long page_order(struct page *page)
 }
 
 /*
+ * Return the mem_map entry representing the 'offset' subpage within
+ * the maximally aligned gigantic page 'base'.  Handle any discontiguity
+ * in the mem_map at MAX_ORDER_NR_PAGES boundaries.
+ */
+static inline struct page *mem_map_offset(struct page *base, int offset)
+{
+	if (unlikely(offset >= MAX_ORDER_NR_PAGES))
+		return pfn_to_page(page_to_pfn(base) + offset);
+	return base + offset;
+}
+
+/*
+ * Iterator over all subpages within the maximally aligned gigantic
+ * page 'base'.  Handle any discontiguity in the mem_map.
+ */
+static inline struct page *mem_map_next(struct page *iter,
+					struct page *base, int offset)
+{
+	if (unlikely((offset & (MAX_ORDER_NR_PAGES - 1)) == 0)) {
+		unsigned long pfn = page_to_pfn(base) + offset;
+		if (!pfn_valid(pfn))
+			return NULL;
+		return pfn_to_page(pfn);
+	}
+	return iter + 1;
+}
+
+/*
  * FLATMEM and DISCONTIGMEM configurations use alloc_bootmem_node,
  * so all functions starting at paging_init should be marked __init
  * in those cases. SPARSEMEM, however, allows for memory hotplug,
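The design point of this version is that the gigantic test is made once
at function entry, so the common small-hugepage loops keep their simple
contiguous arithmetic with no per-iteration branch.  Schematically (a
sketch only; the size comparison is written in units of pages here for
clarity, whereas the patch above compares sz directly):

	static void clear_huge_page_sketch(struct page *page,
				unsigned long addr, unsigned long sz)
	{
		int i;

		/* rare gigantic case: take the pfn-safe out-of-line path */
		if (unlikely(sz / PAGE_SIZE > MAX_ORDER_NR_PAGES)) {
			clear_gigantic_page(page, addr, sz);
			return;
		}

		/* common case: mem_map contiguous, branch-free inner loop */
		might_sleep();
		for (i = 0; i < sz / PAGE_SIZE; i++) {
			cond_resched();
			clear_user_highpage(page + i, addr + i * PAGE_SIZE);
		}
	}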
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Christoph Lameter @ 2008-10-13 16:04 UTC
To: Andy Whitcroft
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman,
    Nick Piggin

Andy Whitcroft wrote:
> Currently memory hot remove is not supported with VMEMMAP.  Obviously
> that should be fixed overall, and I am assuming it will be.  But the
> fact remains that the buddy guarantee is that the mem_map is
> contiguous only out to MAX_ORDER-1 order pages; beyond that we may not
> assume contiguity.  This code is broken under the guarantees that are
> set out by buddy.  Yes, it is true that we currently have only one
> memory model combination in which the stronger within-node contiguity
> is violated, but right now this code violates the current guarantees.
>
> I assume the objection here is the injection of the additional branch
> into these loops.  The later rejig patch removes this branch from the
> non-gigantic use cases.  Are we worried about these same branches in
> the huge cases?  If so, we could make this support dependent on a new
> configuration option, or perhaps have two loops, chosen based on the
> order of the page.

I think we are worried about these additional checks spreading further,
because there may be assumptions of contiguity elsewhere (in particular
when new code is added), since the traditional nature of the memmap is
to be linear and not spread out over memory.

A fix for this particular situation may be as simple as making gigantic
pages depend on SPARSE_VMEMMAP?  For x86_64 this is certainly
sufficient.

> Something like the patch below?  This patch is not tested as yet, but
> if this form is acceptable we can get the pair of patches (this plus
> the prep compound update) tested together, and I can repost them once
> that is done.  This is against 2.6.27.

What is the difference here to the earlier versions?
* Re: [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER
From: Andy Whitcroft @ 2008-10-14  7:00 UTC
To: Christoph Lameter
Cc: Andrew Morton, linux-mm, linux-kernel, Jon Tollefson, Mel Gorman,
    Nick Piggin

On Mon, Oct 13, 2008 at 09:04:32AM -0700, Christoph Lameter wrote:
> Andy Whitcroft wrote:
>> Currently memory hot remove is not supported with VMEMMAP.  [...]
>> I assume the objection here is the injection of the additional branch
>> into these loops.  The later rejig patch removes this branch from the
>> non-gigantic use cases.  Are we worried about these same branches in
>> the huge cases?  If so, we could make this support dependent on a new
>> configuration option, or perhaps have two loops, chosen based on the
>> order of the page.
>
> I think we are worried about these additional checks spreading
> further, because there may be assumptions of contiguity elsewhere (in
> particular when new code is added), since the traditional nature of
> the memmap is to be linear and not spread out over memory.

Yes, but it is guaranteed to be contiguous in all models only out to
order MAX_ORDER-1, and only gigantic pages are larger than this.  We
already have to cope with discontiguity at the MAX_ORDER boundaries in
paths which scan over the mem_map in more general terms, as SPARSEMEM
introduced that long ago, and it only gained a contiguous mode when we
added your VMEMMAP mode to it.

I thought that the approach recommended by Nick, which led to the other
patch in this series pulling compound page preparation out into a
specific gigantic initialiser, helped a lot with this worry, as it
removed any change from the regular case and helped limit gigantic page
support to hugetlb only.  The only reason that initialiser was placed
with the normal form was to ensure they were maintained together.
Would it help if I posted these two together, or perhaps even merged
them as a single patch?

> A fix for this particular situation may be as simple as making
> gigantic pages depend on SPARSE_VMEMMAP?  For x86_64 this is certainly
> sufficient.

Well, that is only true if it doesn't support memory hotplug.

>> Something like the patch below?  This patch is not tested as yet, but
>> if this form is acceptable we can get the pair of patches (this plus
>> the prep compound update) tested together, and I can repost them once
>> that is done.  This is against 2.6.27.
>
> What is the difference here to the earlier versions?

This was a move towards the model I felt Nick preferred in
prep_compound_page, where the gigantic support is pulled out of line
and made very explicit, minimising the impact on the normal case, which
I felt was part of the objection to these changes.  The plan here is to
only fix up gigantic pages within the context of hugetlbfs.

-apw
Thread overview: 12+ messages

2008-10-08  9:33 [PATCH 0/1] gigantic compound pages part 2 Andy Whitcroft
2008-10-08  9:33 ` [PATCH 1/1] hugetlbfs: handle pages higher order than MAX_ORDER Andy Whitcroft
2008-10-08 12:29   ` Nick Piggin
2008-10-13 13:36     ` Andy Whitcroft
2008-10-08 14:57   ` Mel Gorman
2008-10-08 16:17   ` Christoph Lameter
2008-10-08 17:36     ` Andi Kleen
2008-10-08 18:55     ` Andy Whitcroft
2008-10-08 19:35       ` Christoph Lameter
2008-10-13 13:34         ` Andy Whitcroft
2008-10-13 16:04           ` Christoph Lameter
2008-10-14  7:00             ` Andy Whitcroft