* [hugepage] Fix unmap_and_free_vma backout path @ 2006-11-13 5:13 'David Gibson' 2006-11-13 5:29 ` Christoph Lameter 0 siblings, 1 reply; 14+ messages in thread From: 'David Gibson' @ 2006-11-13 5:13 UTC (permalink / raw) To: Andrew Morton Cc: Chen, Kenneth W, 'Christoph Lameter', Hugh Dickins, bill.irwin, Adam Litke, linux-mm Andrew, please apply: If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example, because the given file offset is not hugepage aligned - then do_mmap_pgoff will go to the unmap_and_free_vma backout path. But at this stage the vma hasn't been marked as hugepage, and the backout path will call unmap_region() on it. That will eventually call down to the non-hugepage version of unmap_page_range(). On ppc64, at least, that will cause serious problems if there are any existing hugepage pagetable entries in the vicinity - for example if there are any other hugepage mappings under the same PUD. unmap_page_range() will trigger a bad_pud() on the hugepage pud entries. I suspect this will also cause bad problems on ia64, though I don't have a machine to test it on. This patch addresses the problem by having hugetlbfs_file_mmap() mark the vma as hugepage before it does anything else, thus ensuring we use the right path for any subsequent backout. This may not be all we want. Even with this patch, performing such a failing map on to of an existing mapping will clobber (unmap) that pre-existing mapping. This is in contrast to the analogous situation with normal page mappings - mapping on top with a misaligned offset will fail early enough not to clobber the pre-existing mapping. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Index: working-2.6/fs/hugetlbfs/inode.c =================================================================== --- working-2.6.orig/fs/hugetlbfs/inode.c 2006-11-13 15:49:14.000000000 +1100 +++ working-2.6/fs/hugetlbfs/inode.c 2006-11-13 15:49:29.000000000 +1100 @@ -62,6 +62,9 @@ static int hugetlbfs_file_mmap(struct fi loff_t len, vma_len; int ret; + vma->vm_flags |= VM_HUGETLB | VM_RESERVED; + vma->vm_ops = &hugetlb_vm_ops; + if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1)) return -EINVAL; @@ -78,8 +81,6 @@ static int hugetlbfs_file_mmap(struct fi mutex_lock(&inode->i_mutex); file_accessed(file); - vma->vm_flags |= VM_HUGETLB | VM_RESERVED; - vma->vm_ops = &hugetlb_vm_ops; ret = -ENOMEM; len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 5:13 [hugepage] Fix unmap_and_free_vma backout path 'David Gibson' @ 2006-11-13 5:29 ` Christoph Lameter 2006-11-13 5:57 ` 'David Gibson' 0 siblings, 1 reply; 14+ messages in thread From: Christoph Lameter @ 2006-11-13 5:29 UTC (permalink / raw) To: 'David Gibson' Cc: Andrew Morton, Chen, Kenneth W, Hugh Dickins, bill.irwin, Adam Litke, linux-mm On Mon, 13 Nov 2006, 'David Gibson' wrote: > This may not be all we want. Even with this patch, performing such a > failing map on to of an existing mapping will clobber (unmap) that > pre-existing mapping. This is in contrast to the analogous situation > with normal page mappings - mapping on top with a misaligned offset > will fail early enough not to clobber the pre-existing mapping. Then it is best to check the huge page alignment at the same place as regular alignment. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 5:29 ` Christoph Lameter @ 2006-11-13 5:57 ` 'David Gibson' 2006-11-13 6:03 ` Chen, Kenneth W 0 siblings, 1 reply; 14+ messages in thread From: 'David Gibson' @ 2006-11-13 5:57 UTC (permalink / raw) To: Christoph Lameter Cc: Andrew Morton, Chen, Kenneth W, Hugh Dickins, bill.irwin, Adam Litke, linux-mm On Sun, Nov 12, 2006 at 09:29:48PM -0800, Christoph Lameter wrote: > On Mon, 13 Nov 2006, 'David Gibson' wrote: > > > This may not be all we want. Even with this patch, performing such a > > failing map on to of an existing mapping will clobber (unmap) that > > pre-existing mapping. This is in contrast to the analogous situation > > with normal page mappings - mapping on top with a misaligned offset > > will fail early enough not to clobber the pre-existing mapping. > > Then it is best to check the huge page alignment at the > same place as regular alignment. Probably, yes, although it's yet another "if (hugepage) specialcase()". But I still think we want the above patch as well. It will make sure we correctly back out from any other possible failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or which get added later. As far as I can tell, there is in general no guarantee that a failing MAP_FIXED() mmap() *won't* clobber what was there before. I believe there are (admittedly rare) possible late failure cases in pure normalpage paths which will result in a failed mmap() after clobbering the prior mapping. Any failure from the filesystem or device's f_ops->mmap callback will do this, for example. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 5:57 ` 'David Gibson' @ 2006-11-13 6:03 ` Chen, Kenneth W 2006-11-13 6:22 ` 'David Gibson' 0 siblings, 1 reply; 14+ messages in thread From: Chen, Kenneth W @ 2006-11-13 6:03 UTC (permalink / raw) To: 'David Gibson', Christoph Lameter Cc: Andrew Morton, Hugh Dickins, bill.irwin, Adam Litke, linux-mm David Gibson wrote on Sunday, November 12, 2006 9:57 PM > On Sun, Nov 12, 2006 at 09:29:48PM -0800, Christoph Lameter wrote: > > On Mon, 13 Nov 2006, 'David Gibson' wrote: > > > > > This may not be all we want. Even with this patch, performing such a > > > failing map on to of an existing mapping will clobber (unmap) that > > > pre-existing mapping. This is in contrast to the analogous situation > > > with normal page mappings - mapping on top with a misaligned offset > > > will fail early enough not to clobber the pre-existing mapping. > > > > Then it is best to check the huge page alignment at the > > same place as regular alignment. > > Probably, yes, although it's yet another "if (hugepage) > specialcase()". But I still think we want the above patch as well. > It will make sure we correctly back out from any other possible > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or > which get added later. Something like this? I haven't tested it yet. But looks plausible because we already have if is_file_hugepages() in the generic path. --- ./mm/mmap.c.orig 2006-11-12 22:43:10.000000000 -0800 +++ ./mm/mmap.c 2006-11-12 22:56:23.000000000 -0800 @@ -1375,11 +1375,14 @@ get_unmapped_area(struct file *file, uns if (addr & ~PAGE_MASK) return -EINVAL; if (file && is_file_hugepages(file)) { + ret = 0; /* * Check if the given range is hugepage aligned, and * can be made suitable for hugepages. */ - ret = prepare_hugepage_range(addr, len); + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT) || + prepare_hugepage_range(addr, len)) + ret = -EINVAL; } else { /* * Ensure that a normal request is not falling in a -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 6:03 ` Chen, Kenneth W @ 2006-11-13 6:22 ` 'David Gibson' 2006-11-13 7:35 ` Chen, Kenneth W 0 siblings, 1 reply; 14+ messages in thread From: 'David Gibson' @ 2006-11-13 6:22 UTC (permalink / raw) To: Chen, Kenneth W Cc: Christoph Lameter, Andrew Morton, Hugh Dickins, bill.irwin, Adam Litke, linux-mm On Sun, Nov 12, 2006 at 10:03:31PM -0800, Chen, Kenneth W wrote: > David Gibson wrote on Sunday, November 12, 2006 9:57 PM > > On Sun, Nov 12, 2006 at 09:29:48PM -0800, Christoph Lameter wrote: > > > On Mon, 13 Nov 2006, 'David Gibson' wrote: > > > > > > > This may not be all we want. Even with this patch, performing such a > > > > failing map on to of an existing mapping will clobber (unmap) that > > > > pre-existing mapping. This is in contrast to the analogous situation > > > > with normal page mappings - mapping on top with a misaligned offset > > > > will fail early enough not to clobber the pre-existing mapping. > > > > > > Then it is best to check the huge page alignment at the > > > same place as regular alignment. > > > > Probably, yes, although it's yet another "if (hugepage) > > specialcase()". But I still think we want the above patch as well. > > It will make sure we correctly back out from any other possible > > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or > > which get added later. > > > Something like this? I haven't tested it yet. But looks plausible > because we already have if is_file_hugepages() in the generic path. Um.. if you're going to test pgoff here, you should also test the address. Oh, and that point is too late to catch MAP_FIXED mappings. I was thinking more of testing it in the same place we test the page alignment for normal mappings. Except that's in arch code. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 6:22 ` 'David Gibson' @ 2006-11-13 7:35 ` Chen, Kenneth W 2006-11-13 17:00 ` Hugh Dickins 2006-11-14 0:19 ` 'David Gibson' 0 siblings, 2 replies; 14+ messages in thread From: Chen, Kenneth W @ 2006-11-13 7:35 UTC (permalink / raw) To: 'David Gibson' Cc: 'Christoph Lameter', 'Andrew Morton', 'Hugh Dickins', bill.irwin, 'Adam Litke', linux-mm David Gibson wrote on Sunday, November 12, 2006 10:23 PM > > > Probably, yes, although it's yet another "if (hugepage) > > > specialcase()". But I still think we want the above patch as well. > > > It will make sure we correctly back out from any other possible > > > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or > > > which get added later. > > > > > > Something like this? I haven't tested it yet. But looks plausible > > because we already have if is_file_hugepages() in the generic path. > > Um.. if you're going to test pgoff here, you should also test the > address. prepare_hugepage_range() should catch misaligned memory address, right? What more does get_unmapped_area() need to test? > Oh, and that point is too late to catch MAP_FIXED mappings. I don't understand what you mean by that. In do_mmap_pgoff(), very early in the code it tries to get an valid virtual address: addr = get_unmapped_area(file, addr, len, pgoff, flags); if (addr & ~PAGE_MASK) return addr; We don't even have a vma at this point, there is no error to recover. If get_unmapped_area() tests the validity of pgoff and return an error code, the immediate two lines of code will catch that and everything stops there. I don't see where the unmap gets called here. Did I miss something? - Ken -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 7:35 ` Chen, Kenneth W @ 2006-11-13 17:00 ` Hugh Dickins 2006-11-13 17:38 ` Chen, Kenneth W 2006-11-13 20:34 ` Adam Litke 2006-11-14 0:19 ` 'David Gibson' 1 sibling, 2 replies; 14+ messages in thread From: Hugh Dickins @ 2006-11-13 17:00 UTC (permalink / raw) To: Chen, Kenneth W Cc: 'David Gibson', 'Christoph Lameter', 'Andrew Morton', bill.irwin, 'Adam Litke', linux-mm On Sun, 12 Nov 2006, Chen, Kenneth W wrote: > David Gibson wrote on Sunday, November 12, 2006 10:23 PM > > > > > > Something like this? I haven't tested it yet. But looks plausible > > > because we already have if is_file_hugepages() in the generic path. > > > > Um.. if you're going to test pgoff here, you should also test the > > address. > > prepare_hugepage_range() should catch misaligned memory address, right? > What more does get_unmapped_area() need to test? David made that remark, I see now, because PowerPC alone omits to check address and length alignment in its prepare_hugepage_range: it should be checking those as ia64 and generic do. > > > Oh, and that point is too late to catch MAP_FIXED mappings. > > I don't understand what you mean by that. > In do_mmap_pgoff(), very early in the code it tries to get an valid > virtual address: > > addr = get_unmapped_area(file, addr, len, pgoff, flags); > if (addr & ~PAGE_MASK) > return addr; > > We don't even have a vma at this point, there is no error to recover. > If get_unmapped_area() tests the validity of pgoff and return an error > code, the immediate two lines of code will catch that and everything > stops there. I don't see where the unmap gets called here. Did I > miss something? I agree with Ken on that. I agree with just about everything said by people so far. But I think the check looks nicer tucked away with the other alignment checks in prepare_hugepage_range: how about this version? (Perhaps, in another mood, I've have chosen BUG_ONs instead of just deleting all the redundant tests - another deleted in the ppc case.) [PATCH] hugetlb: prepare_hugepage_range check offset too prepare_hugepage_range should check file offset alignment when it checks virtual address and length, to stop MAP_FIXED with a bad huge offset from unmapping before it fails further down. PowerPC should apply the same prepare_hugepage_range alignment checks as ia64 and all the others do. Then none of the alignment checks in hugetlbfs_file_mmap are required (nor is the check for too small a mapping); but even so, move up setting of VM_HUGETLB and add a comment to warn of what David Gibson discovered - if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region when unwinding from error will go the non-huge way, which may cause bad behaviour on architectures (powerpc and ia64) which segregate their huge mappings into a separate region of the address space. Signed-off-by: Hugh Dickins <hugh@veritas.com> --- arch/ia64/mm/hugetlbpage.c | 4 +++- arch/powerpc/mm/hugetlbpage.c | 8 ++++++-- fs/hugetlbfs/inode.c | 21 ++++++++------------- include/linux/hugetlb.h | 10 +++++++--- mm/mmap.c | 2 +- 5 files changed, 25 insertions(+), 20 deletions(-) --- 2.6.19-rc5/arch/ia64/mm/hugetlbpage.c 2006-09-20 04:42:06.000000000 +0100 +++ linux/arch/ia64/mm/hugetlbpage.c 2006-11-13 15:37:04.000000000 +0000 @@ -70,8 +70,10 @@ huge_pte_offset (struct mm_struct *mm, u * Don't actually need to do any preparation, but need to make sure * the address is in the right region. */ -int prepare_hugepage_range(unsigned long addr, unsigned long len) +int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff) { + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT)) + return -EINVAL; if (len & ~HPAGE_MASK) return -EINVAL; if (addr & ~HPAGE_MASK) --- 2.6.19-rc5/arch/powerpc/mm/hugetlbpage.c 2006-11-08 08:30:56.000000000 +0000 +++ linux/arch/powerpc/mm/hugetlbpage.c 2006-11-13 15:37:04.000000000 +0000 @@ -491,11 +491,15 @@ static int open_high_hpage_areas(struct return 0; } -int prepare_hugepage_range(unsigned long addr, unsigned long len) +int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff) { int err = 0; - if ( (addr+len) < addr ) + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT)) + return -EINVAL; + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) return -EINVAL; if (addr < 0x100000000UL) --- 2.6.19-rc5/fs/hugetlbfs/inode.c 2006-11-08 08:31:14.000000000 +0000 +++ linux/fs/hugetlbfs/inode.c 2006-11-13 15:37:04.000000000 +0000 @@ -62,24 +62,19 @@ static int hugetlbfs_file_mmap(struct fi loff_t len, vma_len; int ret; - if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1)) - return -EINVAL; - - if (vma->vm_start & ~HPAGE_MASK) - return -EINVAL; - - if (vma->vm_end & ~HPAGE_MASK) - return -EINVAL; - - if (vma->vm_end - vma->vm_start < HPAGE_SIZE) - return -EINVAL; + /* + * vma alignment has already been checked by prepare_hugepage_range. + * If you add any error returns here, do so after setting VM_HUGETLB, + * so is_vm_huge_tlb_page tests below unmap_region go the right way + * when do_mmap_pgoff unwinds (may be important on powerpc and ia64). + */ + vma->vm_flags |= VM_HUGETLB | VM_RESERVED; + vma->vm_ops = &hugetlb_vm_ops; vma_len = (loff_t)(vma->vm_end - vma->vm_start); mutex_lock(&inode->i_mutex); file_accessed(file); - vma->vm_flags |= VM_HUGETLB | VM_RESERVED; - vma->vm_ops = &hugetlb_vm_ops; ret = -ENOMEM; len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); --- 2.6.19-rc5/include/linux/hugetlb.h 2006-11-08 08:31:21.000000000 +0000 +++ linux/include/linux/hugetlb.h 2006-11-13 15:37:04.000000000 +0000 @@ -60,8 +60,11 @@ void hugetlb_free_pgd_range(struct mmu_g * If the arch doesn't supply something else, assume that hugepage * size aligned regions are ok without further preparation. */ -static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) +static inline int prepare_hugepage_range(unsigned long addr, unsigned long len, + pgoff_t pgoff) { + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT)) + return -EINVAL; if (len & ~HPAGE_MASK) return -EINVAL; if (addr & ~HPAGE_MASK) @@ -69,7 +72,8 @@ static inline int prepare_hugepage_range return 0; } #else -int prepare_hugepage_range(unsigned long addr, unsigned long len); +int prepare_hugepage_range(unsigned long addr, unsigned long len, + pgoff_t pgoff); #endif #ifndef ARCH_HAS_SETCLEAR_HUGE_PTE @@ -107,7 +111,7 @@ static inline unsigned long hugetlb_tota #define hugetlb_report_meminfo(buf) 0 #define hugetlb_report_node_meminfo(n, buf) 0 #define follow_huge_pmd(mm, addr, pmd, write) NULL -#define prepare_hugepage_range(addr, len) (-EINVAL) +#define prepare_hugepage_range(addr,len,pgoff) (-EINVAL) #define pmd_huge(x) 0 #define is_hugepage_only_range(mm, addr, len) 0 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; }) --- 2.6.19-rc5/mm/mmap.c 2006-11-08 08:31:23.000000000 +0000 +++ linux/mm/mmap.c 2006-11-13 15:37:04.000000000 +0000 @@ -1379,7 +1379,7 @@ get_unmapped_area(struct file *file, uns * Check if the given range is hugepage aligned, and * can be made suitable for hugepages. */ - ret = prepare_hugepage_range(addr, len); + ret = prepare_hugepage_range(addr, len, pgoff); } else { /* * Ensure that a normal request is not falling in a -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 17:00 ` Hugh Dickins @ 2006-11-13 17:38 ` Chen, Kenneth W 2006-11-13 20:34 ` Adam Litke 1 sibling, 0 replies; 14+ messages in thread From: Chen, Kenneth W @ 2006-11-13 17:38 UTC (permalink / raw) To: 'Hugh Dickins' Cc: 'David Gibson', 'Christoph Lameter', 'Andrew Morton', bill.irwin, 'Adam Litke', linux-mm Hugh Dickins wrote on Monday, November 13, 2006 9:00 AM > On Sun, 12 Nov 2006, Chen, Kenneth W wrote: > > David Gibson wrote on Sunday, November 12, 2006 10:23 PM > > > > > > > > Something like this? I haven't tested it yet. But looks plausible > > > > because we already have if is_file_hugepages() in the generic path. > > > > > > Um.. if you're going to test pgoff here, you should also test the > > > address. > > > > prepare_hugepage_range() should catch misaligned memory address, right? > > What more does get_unmapped_area() need to test? > > David made that remark, I see now, because PowerPC alone omits to > check address and length alignment in its prepare_hugepage_range: > it should be checking those as ia64 and generic do. Ah, I see. I only looked at the generic and ia64 version of prepare_hugepage_range() and wasn't aware that powerpc doesn't check address alignment in there :-P > > > Oh, and that point is too late to catch MAP_FIXED mappings. > > > > I don't understand what you mean by that. > > In do_mmap_pgoff(), very early in the code it tries to get an valid > > virtual address: > > > > addr = get_unmapped_area(file, addr, len, pgoff, flags); > > if (addr & ~PAGE_MASK) > > return addr; > > > > We don't even have a vma at this point, there is no error to recover. > > If get_unmapped_area() tests the validity of pgoff and return an error > > code, the immediate two lines of code will catch that and everything > > stops there. I don't see where the unmap gets called here. Did I > > miss something? > > I agree with Ken on that. I agree with just about everything said by > people so far. But I think the check looks nicer tucked away with the > other alignment checks in prepare_hugepage_range: how about this version? Looks nice with all the extra clean up. > (Perhaps, in another mood, I've have chosen BUG_ONs instead of just > deleting all the redundant tests - another deleted in the ppc case.) > > > [PATCH] hugetlb: prepare_hugepage_range check offset too > > prepare_hugepage_range should check file offset alignment when it checks > virtual address and length, to stop MAP_FIXED with a bad huge offset from > unmapping before it fails further down. PowerPC should apply the same > prepare_hugepage_range alignment checks as ia64 and all the others do. > > Then none of the alignment checks in hugetlbfs_file_mmap are required > (nor is the check for too small a mapping); but even so, move up setting > of VM_HUGETLB and add a comment to warn of what David Gibson discovered - > if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region > when unwinding from error will go the non-huge way, which may cause bad > behaviour on architectures (powerpc and ia64) which segregate their huge > mappings into a separate region of the address space. > > Signed-off-by: Hugh Dickins <hugh@veritas.com> > --- > > arch/ia64/mm/hugetlbpage.c | 4 +++- > arch/powerpc/mm/hugetlbpage.c | 8 ++++++-- > fs/hugetlbfs/inode.c | 21 ++++++++------------- > include/linux/hugetlb.h | 10 +++++++--- > mm/mmap.c | 2 +- > 5 files changed, 25 insertions(+), 20 deletions(-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 17:00 ` Hugh Dickins 2006-11-13 17:38 ` Chen, Kenneth W @ 2006-11-13 20:34 ` Adam Litke 2006-11-13 20:41 ` Hugh Dickins 1 sibling, 1 reply; 14+ messages in thread From: Adam Litke @ 2006-11-13 20:34 UTC (permalink / raw) To: Hugh Dickins Cc: Chen, Kenneth W, 'David Gibson', 'Christoph Lameter', 'Andrew Morton', bill.irwin, linux-mm Looks good to me, notwithstanding the nano-nit below. On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote: > + /* > + * vma alignment has already been checked by prepare_hugepage_range. > + * If you add any error returns here, do so after setting VM_HUGETLB, > + * so is_vm_huge_tlb_page tests below unmap_region go the right way > + * when do_mmap_pgoff unwinds (may be important on powerpc and ia64). > + */ Sorry. This is hardly worth it, but the function referred to by this comment is actually is_vm_hugetlb_page() :-/ -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 20:34 ` Adam Litke @ 2006-11-13 20:41 ` Hugh Dickins 2006-11-13 22:07 ` Adam Litke ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Hugh Dickins @ 2006-11-13 20:41 UTC (permalink / raw) To: Adam Litke Cc: Chen, Kenneth W, 'David Gibson', 'Christoph Lameter', 'Andrew Morton', bill.irwin, linux-mm On Mon, 13 Nov 2006, Adam Litke wrote: > Looks good to me, notwithstanding the nano-nit below. > > On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote: > > + /* > > + * vma alignment has already been checked by prepare_hugepage_range. > > + * If you add any error returns here, do so after setting VM_HUGETLB, > > + * so is_vm_huge_tlb_page tests below unmap_region go the right way > > + * when do_mmap_pgoff unwinds (may be important on powerpc and ia64). > > + */ > > Sorry. This is hardly worth it, but the function referred to by this > comment is actually is_vm_hugetlb_page() :-/ Indeed, thanks! [PATCH] hugetlb: prepare_hugepage_range check offset too prepare_hugepage_range should check file offset alignment when it checks virtual address and length, to stop MAP_FIXED with a bad huge offset from unmapping before it fails further down. PowerPC should apply the same prepare_hugepage_range alignment checks as ia64 and all the others do. Then none of the alignment checks in hugetlbfs_file_mmap are required (nor is the check for too small a mapping); but even so, move up setting of VM_HUGETLB and add a comment to warn of what David Gibson discovered - if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region when unwinding from error will go the non-huge way, which may cause bad behaviour on architectures (powerpc and ia64) which segregate their huge mappings into a separate region of the address space. Signed-off-by: Hugh Dickins <hugh@veritas.com> --- arch/ia64/mm/hugetlbpage.c | 4 +++- arch/powerpc/mm/hugetlbpage.c | 8 ++++++-- fs/hugetlbfs/inode.c | 21 ++++++++------------- include/linux/hugetlb.h | 10 +++++++--- mm/mmap.c | 2 +- 5 files changed, 25 insertions(+), 20 deletions(-) --- 2.6.19-rc5/arch/ia64/mm/hugetlbpage.c 2006-09-20 04:42:06.000000000 +0100 +++ linux/arch/ia64/mm/hugetlbpage.c 2006-11-13 15:37:04.000000000 +0000 @@ -70,8 +70,10 @@ huge_pte_offset (struct mm_struct *mm, u * Don't actually need to do any preparation, but need to make sure * the address is in the right region. */ -int prepare_hugepage_range(unsigned long addr, unsigned long len) +int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff) { + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT)) + return -EINVAL; if (len & ~HPAGE_MASK) return -EINVAL; if (addr & ~HPAGE_MASK) --- 2.6.19-rc5/arch/powerpc/mm/hugetlbpage.c 2006-11-08 08:30:56.000000000 +0000 +++ linux/arch/powerpc/mm/hugetlbpage.c 2006-11-13 15:37:04.000000000 +0000 @@ -491,11 +491,15 @@ static int open_high_hpage_areas(struct return 0; } -int prepare_hugepage_range(unsigned long addr, unsigned long len) +int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff) { int err = 0; - if ( (addr+len) < addr ) + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT)) + return -EINVAL; + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) return -EINVAL; if (addr < 0x100000000UL) --- 2.6.19-rc5/fs/hugetlbfs/inode.c 2006-11-08 08:31:14.000000000 +0000 +++ linux/fs/hugetlbfs/inode.c 2006-11-13 15:37:04.000000000 +0000 @@ -62,24 +62,19 @@ static int hugetlbfs_file_mmap(struct fi loff_t len, vma_len; int ret; - if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1)) - return -EINVAL; - - if (vma->vm_start & ~HPAGE_MASK) - return -EINVAL; - - if (vma->vm_end & ~HPAGE_MASK) - return -EINVAL; - - if (vma->vm_end - vma->vm_start < HPAGE_SIZE) - return -EINVAL; + /* + * vma alignment has already been checked by prepare_hugepage_range. + * If you add any error returns here, do so after setting VM_HUGETLB, + * so is_vm_hugetlb_page tests below unmap_region go the right way + * when do_mmap_pgoff unwinds (may be important on powerpc and ia64). + */ + vma->vm_flags |= VM_HUGETLB | VM_RESERVED; + vma->vm_ops = &hugetlb_vm_ops; vma_len = (loff_t)(vma->vm_end - vma->vm_start); mutex_lock(&inode->i_mutex); file_accessed(file); - vma->vm_flags |= VM_HUGETLB | VM_RESERVED; - vma->vm_ops = &hugetlb_vm_ops; ret = -ENOMEM; len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT); --- 2.6.19-rc5/include/linux/hugetlb.h 2006-11-08 08:31:21.000000000 +0000 +++ linux/include/linux/hugetlb.h 2006-11-13 15:37:04.000000000 +0000 @@ -60,8 +60,11 @@ void hugetlb_free_pgd_range(struct mmu_g * If the arch doesn't supply something else, assume that hugepage * size aligned regions are ok without further preparation. */ -static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) +static inline int prepare_hugepage_range(unsigned long addr, unsigned long len, + pgoff_t pgoff) { + if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT)) + return -EINVAL; if (len & ~HPAGE_MASK) return -EINVAL; if (addr & ~HPAGE_MASK) @@ -69,7 +72,8 @@ static inline int prepare_hugepage_range return 0; } #else -int prepare_hugepage_range(unsigned long addr, unsigned long len); +int prepare_hugepage_range(unsigned long addr, unsigned long len, + pgoff_t pgoff); #endif #ifndef ARCH_HAS_SETCLEAR_HUGE_PTE @@ -107,7 +111,7 @@ static inline unsigned long hugetlb_tota #define hugetlb_report_meminfo(buf) 0 #define hugetlb_report_node_meminfo(n, buf) 0 #define follow_huge_pmd(mm, addr, pmd, write) NULL -#define prepare_hugepage_range(addr, len) (-EINVAL) +#define prepare_hugepage_range(addr,len,pgoff) (-EINVAL) #define pmd_huge(x) 0 #define is_hugepage_only_range(mm, addr, len) 0 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; }) --- 2.6.19-rc5/mm/mmap.c 2006-11-08 08:31:23.000000000 +0000 +++ linux/mm/mmap.c 2006-11-13 15:37:04.000000000 +0000 @@ -1379,7 +1379,7 @@ get_unmapped_area(struct file *file, uns * Check if the given range is hugepage aligned, and * can be made suitable for hugepages. */ - ret = prepare_hugepage_range(addr, len); + ret = prepare_hugepage_range(addr, len, pgoff); } else { /* * Ensure that a normal request is not falling in a -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* RE: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 20:41 ` Hugh Dickins @ 2006-11-13 22:07 ` Adam Litke 2006-11-13 23:53 ` 'David Gibson' 2006-11-14 23:48 ` Bill Irwin 2 siblings, 0 replies; 14+ messages in thread From: Adam Litke @ 2006-11-13 22:07 UTC (permalink / raw) To: Hugh Dickins Cc: Chen, Kenneth W, 'David Gibson', 'Christoph Lameter', 'Andrew Morton', bill.irwin, linux-mm On Mon, 2006-11-13 at 20:41 +0000, Hugh Dickins wrote: > On Mon, 13 Nov 2006, Adam Litke wrote: > > > Looks good to me, notwithstanding the nano-nit below. > > > > On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote: > > > + /* > > > + * vma alignment has already been checked by prepare_hugepage_range. > > > + * If you add any error returns here, do so after setting VM_HUGETLB, > > > + * so is_vm_huge_tlb_page tests below unmap_region go the right way > > > + * when do_mmap_pgoff unwinds (may be important on powerpc and ia64). > > > + */ > > > > Sorry. This is hardly worth it, but the function referred to by this > > comment is actually is_vm_hugetlb_page() :-/ > > Indeed, thanks! > > [PATCH] hugetlb: prepare_hugepage_range check offset too > > prepare_hugepage_range should check file offset alignment when it checks > virtual address and length, to stop MAP_FIXED with a bad huge offset from > unmapping before it fails further down. PowerPC should apply the same > prepare_hugepage_range alignment checks as ia64 and all the others do. > > Then none of the alignment checks in hugetlbfs_file_mmap are required > (nor is the check for too small a mapping); but even so, move up setting > of VM_HUGETLB and add a comment to warn of what David Gibson discovered - > if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region > when unwinding from error will go the non-huge way, which may cause bad > behaviour on architectures (powerpc and ia64) which segregate their huge > mappings into a separate region of the address space. > > Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Adam Litke <agl@us.ibm.com> -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 20:41 ` Hugh Dickins 2006-11-13 22:07 ` Adam Litke @ 2006-11-13 23:53 ` 'David Gibson' 2006-11-14 23:48 ` Bill Irwin 2 siblings, 0 replies; 14+ messages in thread From: 'David Gibson' @ 2006-11-13 23:53 UTC (permalink / raw) To: Hugh Dickins Cc: Adam Litke, Chen, Kenneth W, 'Christoph Lameter', 'Andrew Morton', bill.irwin, linux-mm On Mon, Nov 13, 2006 at 08:41:49PM +0000, Hugh Dickins wrote: > On Mon, 13 Nov 2006, Adam Litke wrote: > > > Looks good to me, notwithstanding the nano-nit below. > > > > On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote: > > > + /* > > > + * vma alignment has already been checked by prepare_hugepage_range. > > > + * If you add any error returns here, do so after setting VM_HUGETLB, > > > + * so is_vm_huge_tlb_page tests below unmap_region go the right way > > > + * when do_mmap_pgoff unwinds (may be important on powerpc and ia64). > > > + */ > > > > Sorry. This is hardly worth it, but the function referred to by this > > comment is actually is_vm_hugetlb_page() :-/ > > Indeed, thanks! > > [PATCH] hugetlb: prepare_hugepage_range check offset too > > prepare_hugepage_range should check file offset alignment when it checks > virtual address and length, to stop MAP_FIXED with a bad huge offset from > unmapping before it fails further down. PowerPC should apply the same > prepare_hugepage_range alignment checks as ia64 and all the others do. > > Then none of the alignment checks in hugetlbfs_file_mmap are required > (nor is the check for too small a mapping); but even so, move up setting > of VM_HUGETLB and add a comment to warn of what David Gibson discovered - > if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region > when unwinding from error will go the non-huge way, which may cause bad > behaviour on architectures (powerpc and ia64) which segregate their huge > mappings into a separate region of the address space. Looks pretty good. There's still a certain amount of wierdness on powerpc: we can back out after (irreversibly) switching chunks of the address space over to use for hugepages. I'll try to fix that up later, but in any case it's certainly more comprehensive than my original patch. Acked-by: David Gibson <david@gibson.dropbear.id.au> -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 20:41 ` Hugh Dickins 2006-11-13 22:07 ` Adam Litke 2006-11-13 23:53 ` 'David Gibson' @ 2006-11-14 23:48 ` Bill Irwin 2 siblings, 0 replies; 14+ messages in thread From: Bill Irwin @ 2006-11-14 23:48 UTC (permalink / raw) To: Hugh Dickins Cc: Adam Litke, Chen, Kenneth W, 'David Gibson', 'Christoph Lameter', 'Andrew Morton', bill.irwin, linux-mm On Mon, Nov 13, 2006 at 08:41:49PM +0000, Hugh Dickins wrote: > [PATCH] hugetlb: prepare_hugepage_range check offset too > prepare_hugepage_range should check file offset alignment when it checks > virtual address and length, to stop MAP_FIXED with a bad huge offset from > unmapping before it fails further down. PowerPC should apply the same > prepare_hugepage_range alignment checks as ia64 and all the others do. > Then none of the alignment checks in hugetlbfs_file_mmap are required > (nor is the check for too small a mapping); but even so, move up setting > of VM_HUGETLB and add a comment to warn of what David Gibson discovered - > if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region > when unwinding from error will go the non-huge way, which may cause bad > behaviour on architectures (powerpc and ia64) which segregate their huge > mappings into a separate region of the address space. > Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: William Irwin <wli@holomorphy.com> -- wli -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [hugepage] Fix unmap_and_free_vma backout path 2006-11-13 7:35 ` Chen, Kenneth W 2006-11-13 17:00 ` Hugh Dickins @ 2006-11-14 0:19 ` 'David Gibson' 1 sibling, 0 replies; 14+ messages in thread From: 'David Gibson' @ 2006-11-14 0:19 UTC (permalink / raw) To: Chen, Kenneth W Cc: 'Christoph Lameter', 'Andrew Morton', 'Hugh Dickins', bill.irwin, 'Adam Litke', linux-mm On Sun, Nov 12, 2006 at 11:35:28PM -0800, Chen, Kenneth W wrote: > David Gibson wrote on Sunday, November 12, 2006 10:23 PM > > > > Probably, yes, although it's yet another "if (hugepage) > > > > specialcase()". But I still think we want the above patch as well. > > > > It will make sure we correctly back out from any other possible > > > > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or > > > > which get added later. > > > > > > > > > Something like this? I haven't tested it yet. But looks plausible > > > because we already have if is_file_hugepages() in the generic path. > > > > Um.. if you're going to test pgoff here, you should also test the > > address. > > prepare_hugepage_range() should catch misaligned memory address, right? > What more does get_unmapped_area() need to test? > > > > Oh, and that point is too late to catch MAP_FIXED mappings. > > I don't understand what you mean by that. Sorry, old data. I was thinking of the old get_unmapped_area() which had entirely separate paths for MAP_FIXED and otherwise. And I had my sense inverted as well. It used to be that prepare_hugepage_range() was called *only* in the MAP_FIXED case (it being assumed that the hugetlb specific get_unmapped_area call would do any necessary preparation). -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-11-14 23:48 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2006-11-13 5:13 [hugepage] Fix unmap_and_free_vma backout path 'David Gibson' 2006-11-13 5:29 ` Christoph Lameter 2006-11-13 5:57 ` 'David Gibson' 2006-11-13 6:03 ` Chen, Kenneth W 2006-11-13 6:22 ` 'David Gibson' 2006-11-13 7:35 ` Chen, Kenneth W 2006-11-13 17:00 ` Hugh Dickins 2006-11-13 17:38 ` Chen, Kenneth W 2006-11-13 20:34 ` Adam Litke 2006-11-13 20:41 ` Hugh Dickins 2006-11-13 22:07 ` Adam Litke 2006-11-13 23:53 ` 'David Gibson' 2006-11-14 23:48 ` Bill Irwin 2006-11-14 0:19 ` 'David Gibson'
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox