linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [hugepage] Fix unmap_and_free_vma backout path
@ 2006-11-13  5:13 'David Gibson'
  2006-11-13  5:29 ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: 'David Gibson' @ 2006-11-13  5:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chen, Kenneth W, 'Christoph Lameter',
	Hugh Dickins, bill.irwin, Adam Litke, linux-mm

Andrew, please apply:

If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for
example, because the given file offset is not hugepage aligned - then
do_mmap_pgoff will go to the unmap_and_free_vma backout path.

But at this stage the vma hasn't been marked as hugepage, and the
backout path will call unmap_region() on it.  That will eventually
call down to the non-hugepage version of unmap_page_range().  On
ppc64, at least, that will cause serious problems if there are any
existing hugepage pagetable entries in the vicinity - for example if
there are any other hugepage mappings under the same PUD.
unmap_page_range() will trigger a bad_pud() on the hugepage pud
entries.  I suspect this will also cause bad problems on ia64, though
I don't have a machine to test it on.

This patch addresses the problem by having hugetlbfs_file_mmap() mark
the vma as hugepage before it does anything else, thus ensuring we use
the right path for any subsequent backout.

This may not be all we want.  Even with this patch, performing such a
failing map on to of an existing mapping will clobber (unmap) that
pre-existing mapping.  This is in contrast to the analogous situation
with normal page mappings - mapping on top with a misaligned offset
will fail early enough not to clobber the pre-existing mapping.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Index: working-2.6/fs/hugetlbfs/inode.c
===================================================================
--- working-2.6.orig/fs/hugetlbfs/inode.c	2006-11-13 15:49:14.000000000 +1100
+++ working-2.6/fs/hugetlbfs/inode.c	2006-11-13 15:49:29.000000000 +1100
@@ -62,6 +62,9 @@ static int hugetlbfs_file_mmap(struct fi
 	loff_t len, vma_len;
 	int ret;
 
+	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
+	vma->vm_ops = &hugetlb_vm_ops;
+
 	if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1))
 		return -EINVAL;
 
@@ -78,8 +81,6 @@ static int hugetlbfs_file_mmap(struct fi
 
 	mutex_lock(&inode->i_mutex);
 	file_accessed(file);
-	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
-	vma->vm_ops = &hugetlb_vm_ops;
 
 	ret = -ENOMEM;
 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  5:13 [hugepage] Fix unmap_and_free_vma backout path 'David Gibson'
@ 2006-11-13  5:29 ` Christoph Lameter
  2006-11-13  5:57   ` 'David Gibson'
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2006-11-13  5:29 UTC (permalink / raw)
  To: 'David Gibson'
  Cc: Andrew Morton, Chen, Kenneth W, Hugh Dickins, bill.irwin,
	Adam Litke, linux-mm

On Mon, 13 Nov 2006, 'David Gibson' wrote:

> This may not be all we want.  Even with this patch, performing such a
> failing map on to of an existing mapping will clobber (unmap) that
> pre-existing mapping.  This is in contrast to the analogous situation
> with normal page mappings - mapping on top with a misaligned offset
> will fail early enough not to clobber the pre-existing mapping.

Then it is best to check the huge page alignment at the 
same place as regular alignment.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  5:29 ` Christoph Lameter
@ 2006-11-13  5:57   ` 'David Gibson'
  2006-11-13  6:03     ` Chen, Kenneth W
  0 siblings, 1 reply; 14+ messages in thread
From: 'David Gibson' @ 2006-11-13  5:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Chen, Kenneth W, Hugh Dickins, bill.irwin,
	Adam Litke, linux-mm

On Sun, Nov 12, 2006 at 09:29:48PM -0800, Christoph Lameter wrote:
> On Mon, 13 Nov 2006, 'David Gibson' wrote:
> 
> > This may not be all we want.  Even with this patch, performing such a
> > failing map on to of an existing mapping will clobber (unmap) that
> > pre-existing mapping.  This is in contrast to the analogous situation
> > with normal page mappings - mapping on top with a misaligned offset
> > will fail early enough not to clobber the pre-existing mapping.
> 
> Then it is best to check the huge page alignment at the 
> same place as regular alignment.

Probably, yes, although it's yet another "if (hugepage)
specialcase()".  But I still think we want the above patch as well.
It will make sure we correctly back out from any other possible
failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or
which get added later.

As far as I can tell, there is in general no guarantee that a failing
MAP_FIXED() mmap() *won't* clobber what was there before.  I believe
there are (admittedly rare) possible late failure cases in pure
normalpage paths which will result in a failed mmap() after clobbering
the prior mapping.  Any failure from the filesystem or device's
f_ops->mmap callback will do this, for example.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  5:57   ` 'David Gibson'
@ 2006-11-13  6:03     ` Chen, Kenneth W
  2006-11-13  6:22       ` 'David Gibson'
  0 siblings, 1 reply; 14+ messages in thread
From: Chen, Kenneth W @ 2006-11-13  6:03 UTC (permalink / raw)
  To: 'David Gibson', Christoph Lameter
  Cc: Andrew Morton, Hugh Dickins, bill.irwin, Adam Litke, linux-mm

David Gibson wrote on Sunday, November 12, 2006 9:57 PM
> On Sun, Nov 12, 2006 at 09:29:48PM -0800, Christoph Lameter wrote:
> > On Mon, 13 Nov 2006, 'David Gibson' wrote:
> > 
> > > This may not be all we want.  Even with this patch, performing such a
> > > failing map on to of an existing mapping will clobber (unmap) that
> > > pre-existing mapping.  This is in contrast to the analogous situation
> > > with normal page mappings - mapping on top with a misaligned offset
> > > will fail early enough not to clobber the pre-existing mapping.
> > 
> > Then it is best to check the huge page alignment at the 
> > same place as regular alignment.
> 
> Probably, yes, although it's yet another "if (hugepage)
> specialcase()".  But I still think we want the above patch as well.
> It will make sure we correctly back out from any other possible
> failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or
> which get added later.


Something like this?  I haven't tested it yet.  But looks plausible
because we already have if is_file_hugepages() in the generic path.




--- ./mm/mmap.c.orig	2006-11-12 22:43:10.000000000 -0800
+++ ./mm/mmap.c	2006-11-12 22:56:23.000000000 -0800
@@ -1375,11 +1375,14 @@ get_unmapped_area(struct file *file, uns
 	if (addr & ~PAGE_MASK)
 		return -EINVAL;
 	if (file && is_file_hugepages(file))  {
+		ret = 0;
 		/*
 		 * Check if the given range is hugepage aligned, and
 		 * can be made suitable for hugepages.
 		 */
-		ret = prepare_hugepage_range(addr, len);
+		if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT) ||
+		    prepare_hugepage_range(addr, len))
+			ret = -EINVAL;
 	} else {
 		/*
 		 * Ensure that a normal request is not falling in a

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  6:03     ` Chen, Kenneth W
@ 2006-11-13  6:22       ` 'David Gibson'
  2006-11-13  7:35         ` Chen, Kenneth W
  0 siblings, 1 reply; 14+ messages in thread
From: 'David Gibson' @ 2006-11-13  6:22 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: Christoph Lameter, Andrew Morton, Hugh Dickins, bill.irwin,
	Adam Litke, linux-mm

On Sun, Nov 12, 2006 at 10:03:31PM -0800, Chen, Kenneth W wrote:
> David Gibson wrote on Sunday, November 12, 2006 9:57 PM
> > On Sun, Nov 12, 2006 at 09:29:48PM -0800, Christoph Lameter wrote:
> > > On Mon, 13 Nov 2006, 'David Gibson' wrote:
> > > 
> > > > This may not be all we want.  Even with this patch, performing such a
> > > > failing map on to of an existing mapping will clobber (unmap) that
> > > > pre-existing mapping.  This is in contrast to the analogous situation
> > > > with normal page mappings - mapping on top with a misaligned offset
> > > > will fail early enough not to clobber the pre-existing mapping.
> > > 
> > > Then it is best to check the huge page alignment at the 
> > > same place as regular alignment.
> > 
> > Probably, yes, although it's yet another "if (hugepage)
> > specialcase()".  But I still think we want the above patch as well.
> > It will make sure we correctly back out from any other possible
> > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or
> > which get added later.
> 
> 
> Something like this?  I haven't tested it yet.  But looks plausible
> because we already have if is_file_hugepages() in the generic path.

Um.. if you're going to test pgoff here, you should also test the
address.  Oh, and that point is too late to catch MAP_FIXED mappings.

I was thinking more of testing it in the same place we test the page
alignment for normal mappings.  Except that's in arch code.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  6:22       ` 'David Gibson'
@ 2006-11-13  7:35         ` Chen, Kenneth W
  2006-11-13 17:00           ` Hugh Dickins
  2006-11-14  0:19           ` 'David Gibson'
  0 siblings, 2 replies; 14+ messages in thread
From: Chen, Kenneth W @ 2006-11-13  7:35 UTC (permalink / raw)
  To: 'David Gibson'
  Cc: 'Christoph Lameter', 'Andrew Morton',
	'Hugh Dickins', bill.irwin, 'Adam Litke',
	linux-mm

David Gibson wrote on Sunday, November 12, 2006 10:23 PM
> > > Probably, yes, although it's yet another "if (hugepage)
> > > specialcase()".  But I still think we want the above patch as well.
> > > It will make sure we correctly back out from any other possible
> > > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or
> > > which get added later.
> > 
> > 
> > Something like this?  I haven't tested it yet.  But looks plausible
> > because we already have if is_file_hugepages() in the generic path.
> 
> Um.. if you're going to test pgoff here, you should also test the
> address.

prepare_hugepage_range() should catch misaligned memory address, right?
What more does get_unmapped_area() need to test?


> Oh, and that point is too late to catch MAP_FIXED mappings.

I don't understand what you mean by that.
In do_mmap_pgoff(), very early in the code it tries to get an valid
virtual address:

        addr = get_unmapped_area(file, addr, len, pgoff, flags);
        if (addr & ~PAGE_MASK)
                return addr;

We don't even have a vma at this point, there is no error to recover.
If get_unmapped_area() tests the validity of pgoff and return an error
code, the immediate two lines of code will catch that and everything
stops there.  I don't see where the unmap gets called here.  Did I
miss something?


- Ken

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  7:35         ` Chen, Kenneth W
@ 2006-11-13 17:00           ` Hugh Dickins
  2006-11-13 17:38             ` Chen, Kenneth W
  2006-11-13 20:34             ` Adam Litke
  2006-11-14  0:19           ` 'David Gibson'
  1 sibling, 2 replies; 14+ messages in thread
From: Hugh Dickins @ 2006-11-13 17:00 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'David Gibson', 'Christoph Lameter',
	'Andrew Morton', bill.irwin, 'Adam Litke',
	linux-mm

On Sun, 12 Nov 2006, Chen, Kenneth W wrote:
> David Gibson wrote on Sunday, November 12, 2006 10:23 PM
> > > 
> > > Something like this?  I haven't tested it yet.  But looks plausible
> > > because we already have if is_file_hugepages() in the generic path.
> > 
> > Um.. if you're going to test pgoff here, you should also test the
> > address.
> 
> prepare_hugepage_range() should catch misaligned memory address, right?
> What more does get_unmapped_area() need to test?

David made that remark, I see now, because PowerPC alone omits to
check address and length alignment in its prepare_hugepage_range:
it should be checking those as ia64 and generic do.

> 
> > Oh, and that point is too late to catch MAP_FIXED mappings.
> 
> I don't understand what you mean by that.
> In do_mmap_pgoff(), very early in the code it tries to get an valid
> virtual address:
> 
>         addr = get_unmapped_area(file, addr, len, pgoff, flags);
>         if (addr & ~PAGE_MASK)
>                 return addr;
> 
> We don't even have a vma at this point, there is no error to recover.
> If get_unmapped_area() tests the validity of pgoff and return an error
> code, the immediate two lines of code will catch that and everything
> stops there.  I don't see where the unmap gets called here.  Did I
> miss something?

I agree with Ken on that.  I agree with just about everything said by
people so far.  But I think the check looks nicer tucked away with the
other alignment checks in prepare_hugepage_range: how about this version?
(Perhaps, in another mood, I've have chosen BUG_ONs instead of just
deleting all the redundant tests - another deleted in the ppc case.)


[PATCH] hugetlb: prepare_hugepage_range check offset too

prepare_hugepage_range should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down.  PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.

Then none of the alignment checks in hugetlbfs_file_mmap are required
(nor is the check for too small a mapping); but even so, move up setting
of VM_HUGETLB and add a comment to warn of what David Gibson discovered -
if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 arch/ia64/mm/hugetlbpage.c    |    4 +++-
 arch/powerpc/mm/hugetlbpage.c |    8 ++++++--
 fs/hugetlbfs/inode.c          |   21 ++++++++-------------
 include/linux/hugetlb.h       |   10 +++++++---
 mm/mmap.c                     |    2 +-
 5 files changed, 25 insertions(+), 20 deletions(-)

--- 2.6.19-rc5/arch/ia64/mm/hugetlbpage.c	2006-09-20 04:42:06.000000000 +0100
+++ linux/arch/ia64/mm/hugetlbpage.c	2006-11-13 15:37:04.000000000 +0000
@@ -70,8 +70,10 @@ huge_pte_offset (struct mm_struct *mm, u
  * Don't actually need to do any preparation, but need to make sure
  * the address is in the right region.
  */
-int prepare_hugepage_range(unsigned long addr, unsigned long len)
+int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff)
 {
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
 	if (addr & ~HPAGE_MASK)
--- 2.6.19-rc5/arch/powerpc/mm/hugetlbpage.c	2006-11-08 08:30:56.000000000 +0000
+++ linux/arch/powerpc/mm/hugetlbpage.c	2006-11-13 15:37:04.000000000 +0000
@@ -491,11 +491,15 @@ static int open_high_hpage_areas(struct 
 	return 0;
 }
 
-int prepare_hugepage_range(unsigned long addr, unsigned long len)
+int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff)
 {
 	int err = 0;
 
-	if ( (addr+len) < addr )
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
 		return -EINVAL;
 
 	if (addr < 0x100000000UL)
--- 2.6.19-rc5/fs/hugetlbfs/inode.c	2006-11-08 08:31:14.000000000 +0000
+++ linux/fs/hugetlbfs/inode.c	2006-11-13 15:37:04.000000000 +0000
@@ -62,24 +62,19 @@ static int hugetlbfs_file_mmap(struct fi
 	loff_t len, vma_len;
 	int ret;
 
-	if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1))
-		return -EINVAL;
-
-	if (vma->vm_start & ~HPAGE_MASK)
-		return -EINVAL;
-
-	if (vma->vm_end & ~HPAGE_MASK)
-		return -EINVAL;
-
-	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
-		return -EINVAL;
+	/*
+	 * vma alignment has already been checked by prepare_hugepage_range.
+	 * If you add any error returns here, do so after setting VM_HUGETLB,
+	 * so is_vm_huge_tlb_page tests below unmap_region go the right way
+	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
+	 */
+	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
+	vma->vm_ops = &hugetlb_vm_ops;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
 
 	mutex_lock(&inode->i_mutex);
 	file_accessed(file);
-	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
-	vma->vm_ops = &hugetlb_vm_ops;
 
 	ret = -ENOMEM;
 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
--- 2.6.19-rc5/include/linux/hugetlb.h	2006-11-08 08:31:21.000000000 +0000
+++ linux/include/linux/hugetlb.h	2006-11-13 15:37:04.000000000 +0000
@@ -60,8 +60,11 @@ void hugetlb_free_pgd_range(struct mmu_g
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
  */
-static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len,
+						pgoff_t pgoff)
 {
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
 	if (addr & ~HPAGE_MASK)
@@ -69,7 +72,8 @@ static inline int prepare_hugepage_range
 	return 0;
 }
 #else
-int prepare_hugepage_range(unsigned long addr, unsigned long len);
+int prepare_hugepage_range(unsigned long addr, unsigned long len,
+						pgoff_t pgoff);
 #endif
 
 #ifndef ARCH_HAS_SETCLEAR_HUGE_PTE
@@ -107,7 +111,7 @@ static inline unsigned long hugetlb_tota
 #define hugetlb_report_meminfo(buf)		0
 #define hugetlb_report_node_meminfo(n, buf)	0
 #define follow_huge_pmd(mm, addr, pmd, write)	NULL
-#define prepare_hugepage_range(addr, len)	(-EINVAL)
+#define prepare_hugepage_range(addr,len,pgoff)	(-EINVAL)
 #define pmd_huge(x)	0
 #define is_hugepage_only_range(mm, addr, len)	0
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
--- 2.6.19-rc5/mm/mmap.c	2006-11-08 08:31:23.000000000 +0000
+++ linux/mm/mmap.c	2006-11-13 15:37:04.000000000 +0000
@@ -1379,7 +1379,7 @@ get_unmapped_area(struct file *file, uns
 		 * Check if the given range is hugepage aligned, and
 		 * can be made suitable for hugepages.
 		 */
-		ret = prepare_hugepage_range(addr, len);
+		ret = prepare_hugepage_range(addr, len, pgoff);
 	} else {
 		/*
 		 * Ensure that a normal request is not falling in a

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13 17:00           ` Hugh Dickins
@ 2006-11-13 17:38             ` Chen, Kenneth W
  2006-11-13 20:34             ` Adam Litke
  1 sibling, 0 replies; 14+ messages in thread
From: Chen, Kenneth W @ 2006-11-13 17:38 UTC (permalink / raw)
  To: 'Hugh Dickins'
  Cc: 'David Gibson', 'Christoph Lameter',
	'Andrew Morton', bill.irwin, 'Adam Litke',
	linux-mm

Hugh Dickins wrote on Monday, November 13, 2006 9:00 AM
> On Sun, 12 Nov 2006, Chen, Kenneth W wrote:
> > David Gibson wrote on Sunday, November 12, 2006 10:23 PM
> > > > 
> > > > Something like this?  I haven't tested it yet.  But looks plausible
> > > > because we already have if is_file_hugepages() in the generic path.
> > > 
> > > Um.. if you're going to test pgoff here, you should also test the
> > > address.
> > 
> > prepare_hugepage_range() should catch misaligned memory address, right?
> > What more does get_unmapped_area() need to test?
> 
> David made that remark, I see now, because PowerPC alone omits to
> check address and length alignment in its prepare_hugepage_range:
> it should be checking those as ia64 and generic do.

Ah, I see. I only looked at the generic and ia64 version of
prepare_hugepage_range() and wasn't aware that powerpc doesn't check
address alignment in there :-P


> > > Oh, and that point is too late to catch MAP_FIXED mappings.
> > 
> > I don't understand what you mean by that.
> > In do_mmap_pgoff(), very early in the code it tries to get an valid
> > virtual address:
> > 
> >         addr = get_unmapped_area(file, addr, len, pgoff, flags);
> >         if (addr & ~PAGE_MASK)
> >                 return addr;
> > 
> > We don't even have a vma at this point, there is no error to recover.
> > If get_unmapped_area() tests the validity of pgoff and return an error
> > code, the immediate two lines of code will catch that and everything
> > stops there.  I don't see where the unmap gets called here.  Did I
> > miss something?
> 
> I agree with Ken on that.  I agree with just about everything said by
> people so far.  But I think the check looks nicer tucked away with the
> other alignment checks in prepare_hugepage_range: how about this version?


Looks nice with all the extra clean up.


> (Perhaps, in another mood, I've have chosen BUG_ONs instead of just
> deleting all the redundant tests - another deleted in the ppc case.)
> 
> 
> [PATCH] hugetlb: prepare_hugepage_range check offset too
> 
> prepare_hugepage_range should check file offset alignment when it checks
> virtual address and length, to stop MAP_FIXED with a bad huge offset from
> unmapping before it fails further down.  PowerPC should apply the same
> prepare_hugepage_range alignment checks as ia64 and all the others do.
> 
> Then none of the alignment checks in hugetlbfs_file_mmap are required
> (nor is the check for too small a mapping); but even so, move up setting
> of VM_HUGETLB and add a comment to warn of what David Gibson discovered -
> if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
> when unwinding from error will go the non-huge way, which may cause bad
> behaviour on architectures (powerpc and ia64) which segregate their huge
> mappings into a separate region of the address space.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> 
>  arch/ia64/mm/hugetlbpage.c    |    4 +++-
>  arch/powerpc/mm/hugetlbpage.c |    8 ++++++--
>  fs/hugetlbfs/inode.c          |   21 ++++++++-------------
>  include/linux/hugetlb.h       |   10 +++++++---
>  mm/mmap.c                     |    2 +-
>  5 files changed, 25 insertions(+), 20 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13 17:00           ` Hugh Dickins
  2006-11-13 17:38             ` Chen, Kenneth W
@ 2006-11-13 20:34             ` Adam Litke
  2006-11-13 20:41               ` Hugh Dickins
  1 sibling, 1 reply; 14+ messages in thread
From: Adam Litke @ 2006-11-13 20:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chen, Kenneth W, 'David Gibson',
	'Christoph Lameter', 'Andrew Morton',
	bill.irwin, linux-mm

Looks good to me, notwithstanding the nano-nit below.

On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote:
> +	/*
> +	 * vma alignment has already been checked by prepare_hugepage_range.
> +	 * If you add any error returns here, do so after setting VM_HUGETLB,
> +	 * so is_vm_huge_tlb_page tests below unmap_region go the right way
> +	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
> +	 */

Sorry.  This is hardly worth it, but the function referred to by this
comment is actually is_vm_hugetlb_page() :-/

-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13 20:34             ` Adam Litke
@ 2006-11-13 20:41               ` Hugh Dickins
  2006-11-13 22:07                 ` Adam Litke
                                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Hugh Dickins @ 2006-11-13 20:41 UTC (permalink / raw)
  To: Adam Litke
  Cc: Chen, Kenneth W, 'David Gibson',
	'Christoph Lameter', 'Andrew Morton',
	bill.irwin, linux-mm

On Mon, 13 Nov 2006, Adam Litke wrote:

> Looks good to me, notwithstanding the nano-nit below.
> 
> On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote:
> > +	/*
> > +	 * vma alignment has already been checked by prepare_hugepage_range.
> > +	 * If you add any error returns here, do so after setting VM_HUGETLB,
> > +	 * so is_vm_huge_tlb_page tests below unmap_region go the right way
> > +	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
> > +	 */
> 
> Sorry.  This is hardly worth it, but the function referred to by this
> comment is actually is_vm_hugetlb_page() :-/

Indeed, thanks!

[PATCH] hugetlb: prepare_hugepage_range check offset too

prepare_hugepage_range should check file offset alignment when it checks
virtual address and length, to stop MAP_FIXED with a bad huge offset from
unmapping before it fails further down.  PowerPC should apply the same
prepare_hugepage_range alignment checks as ia64 and all the others do.

Then none of the alignment checks in hugetlbfs_file_mmap are required
(nor is the check for too small a mapping); but even so, move up setting
of VM_HUGETLB and add a comment to warn of what David Gibson discovered -
if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
when unwinding from error will go the non-huge way, which may cause bad
behaviour on architectures (powerpc and ia64) which segregate their huge
mappings into a separate region of the address space.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 arch/ia64/mm/hugetlbpage.c    |    4 +++-
 arch/powerpc/mm/hugetlbpage.c |    8 ++++++--
 fs/hugetlbfs/inode.c          |   21 ++++++++-------------
 include/linux/hugetlb.h       |   10 +++++++---
 mm/mmap.c                     |    2 +-
 5 files changed, 25 insertions(+), 20 deletions(-)

--- 2.6.19-rc5/arch/ia64/mm/hugetlbpage.c	2006-09-20 04:42:06.000000000 +0100
+++ linux/arch/ia64/mm/hugetlbpage.c	2006-11-13 15:37:04.000000000 +0000
@@ -70,8 +70,10 @@ huge_pte_offset (struct mm_struct *mm, u
  * Don't actually need to do any preparation, but need to make sure
  * the address is in the right region.
  */
-int prepare_hugepage_range(unsigned long addr, unsigned long len)
+int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff)
 {
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
 	if (addr & ~HPAGE_MASK)
--- 2.6.19-rc5/arch/powerpc/mm/hugetlbpage.c	2006-11-08 08:30:56.000000000 +0000
+++ linux/arch/powerpc/mm/hugetlbpage.c	2006-11-13 15:37:04.000000000 +0000
@@ -491,11 +491,15 @@ static int open_high_hpage_areas(struct 
 	return 0;
 }
 
-int prepare_hugepage_range(unsigned long addr, unsigned long len)
+int prepare_hugepage_range(unsigned long addr, unsigned long len, pgoff_t pgoff)
 {
 	int err = 0;
 
-	if ( (addr+len) < addr )
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
+	if (len & ~HPAGE_MASK)
+		return -EINVAL;
+	if (addr & ~HPAGE_MASK)
 		return -EINVAL;
 
 	if (addr < 0x100000000UL)
--- 2.6.19-rc5/fs/hugetlbfs/inode.c	2006-11-08 08:31:14.000000000 +0000
+++ linux/fs/hugetlbfs/inode.c	2006-11-13 15:37:04.000000000 +0000
@@ -62,24 +62,19 @@ static int hugetlbfs_file_mmap(struct fi
 	loff_t len, vma_len;
 	int ret;
 
-	if (vma->vm_pgoff & (HPAGE_SIZE / PAGE_SIZE - 1))
-		return -EINVAL;
-
-	if (vma->vm_start & ~HPAGE_MASK)
-		return -EINVAL;
-
-	if (vma->vm_end & ~HPAGE_MASK)
-		return -EINVAL;
-
-	if (vma->vm_end - vma->vm_start < HPAGE_SIZE)
-		return -EINVAL;
+	/*
+	 * vma alignment has already been checked by prepare_hugepage_range.
+	 * If you add any error returns here, do so after setting VM_HUGETLB,
+	 * so is_vm_hugetlb_page tests below unmap_region go the right way
+	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
+	 */
+	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
+	vma->vm_ops = &hugetlb_vm_ops;
 
 	vma_len = (loff_t)(vma->vm_end - vma->vm_start);
 
 	mutex_lock(&inode->i_mutex);
 	file_accessed(file);
-	vma->vm_flags |= VM_HUGETLB | VM_RESERVED;
-	vma->vm_ops = &hugetlb_vm_ops;
 
 	ret = -ENOMEM;
 	len = vma_len + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
--- 2.6.19-rc5/include/linux/hugetlb.h	2006-11-08 08:31:21.000000000 +0000
+++ linux/include/linux/hugetlb.h	2006-11-13 15:37:04.000000000 +0000
@@ -60,8 +60,11 @@ void hugetlb_free_pgd_range(struct mmu_g
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
  */
-static inline int prepare_hugepage_range(unsigned long addr, unsigned long len)
+static inline int prepare_hugepage_range(unsigned long addr, unsigned long len,
+						pgoff_t pgoff)
 {
+	if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
+		return -EINVAL;
 	if (len & ~HPAGE_MASK)
 		return -EINVAL;
 	if (addr & ~HPAGE_MASK)
@@ -69,7 +72,8 @@ static inline int prepare_hugepage_range
 	return 0;
 }
 #else
-int prepare_hugepage_range(unsigned long addr, unsigned long len);
+int prepare_hugepage_range(unsigned long addr, unsigned long len,
+						pgoff_t pgoff);
 #endif
 
 #ifndef ARCH_HAS_SETCLEAR_HUGE_PTE
@@ -107,7 +111,7 @@ static inline unsigned long hugetlb_tota
 #define hugetlb_report_meminfo(buf)		0
 #define hugetlb_report_node_meminfo(n, buf)	0
 #define follow_huge_pmd(mm, addr, pmd, write)	NULL
-#define prepare_hugepage_range(addr, len)	(-EINVAL)
+#define prepare_hugepage_range(addr,len,pgoff)	(-EINVAL)
 #define pmd_huge(x)	0
 #define is_hugepage_only_range(mm, addr, len)	0
 #define hugetlb_free_pgd_range(tlb, addr, end, floor, ceiling) ({BUG(); 0; })
--- 2.6.19-rc5/mm/mmap.c	2006-11-08 08:31:23.000000000 +0000
+++ linux/mm/mmap.c	2006-11-13 15:37:04.000000000 +0000
@@ -1379,7 +1379,7 @@ get_unmapped_area(struct file *file, uns
 		 * Check if the given range is hugepage aligned, and
 		 * can be made suitable for hugepages.
 		 */
-		ret = prepare_hugepage_range(addr, len);
+		ret = prepare_hugepage_range(addr, len, pgoff);
 	} else {
 		/*
 		 * Ensure that a normal request is not falling in a

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13 20:41               ` Hugh Dickins
@ 2006-11-13 22:07                 ` Adam Litke
  2006-11-13 23:53                 ` 'David Gibson'
  2006-11-14 23:48                 ` Bill Irwin
  2 siblings, 0 replies; 14+ messages in thread
From: Adam Litke @ 2006-11-13 22:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chen, Kenneth W, 'David Gibson',
	'Christoph Lameter', 'Andrew Morton',
	bill.irwin, linux-mm

On Mon, 2006-11-13 at 20:41 +0000, Hugh Dickins wrote:
> On Mon, 13 Nov 2006, Adam Litke wrote:
> 
> > Looks good to me, notwithstanding the nano-nit below.
> > 
> > On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote:
> > > +	/*
> > > +	 * vma alignment has already been checked by prepare_hugepage_range.
> > > +	 * If you add any error returns here, do so after setting VM_HUGETLB,
> > > +	 * so is_vm_huge_tlb_page tests below unmap_region go the right way
> > > +	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
> > > +	 */
> > 
> > Sorry.  This is hardly worth it, but the function referred to by this
> > comment is actually is_vm_hugetlb_page() :-/
> 
> Indeed, thanks!
> 
> [PATCH] hugetlb: prepare_hugepage_range check offset too
> 
> prepare_hugepage_range should check file offset alignment when it checks
> virtual address and length, to stop MAP_FIXED with a bad huge offset from
> unmapping before it fails further down.  PowerPC should apply the same
> prepare_hugepage_range alignment checks as ia64 and all the others do.
> 
> Then none of the alignment checks in hugetlbfs_file_mmap are required
> (nor is the check for too small a mapping); but even so, move up setting
> of VM_HUGETLB and add a comment to warn of what David Gibson discovered -
> if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
> when unwinding from error will go the non-huge way, which may cause bad
> behaviour on architectures (powerpc and ia64) which segregate their huge
> mappings into a separate region of the address space.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Adam Litke <agl@us.ibm.com>
-- 
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13 20:41               ` Hugh Dickins
  2006-11-13 22:07                 ` Adam Litke
@ 2006-11-13 23:53                 ` 'David Gibson'
  2006-11-14 23:48                 ` Bill Irwin
  2 siblings, 0 replies; 14+ messages in thread
From: 'David Gibson' @ 2006-11-13 23:53 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Adam Litke, Chen, Kenneth W, 'Christoph Lameter',
	'Andrew Morton',
	bill.irwin, linux-mm

On Mon, Nov 13, 2006 at 08:41:49PM +0000, Hugh Dickins wrote:
> On Mon, 13 Nov 2006, Adam Litke wrote:
> 
> > Looks good to me, notwithstanding the nano-nit below.
> > 
> > On Mon, 2006-11-13 at 17:00 +0000, Hugh Dickins wrote:
> > > +	/*
> > > +	 * vma alignment has already been checked by prepare_hugepage_range.
> > > +	 * If you add any error returns here, do so after setting VM_HUGETLB,
> > > +	 * so is_vm_huge_tlb_page tests below unmap_region go the right way
> > > +	 * when do_mmap_pgoff unwinds (may be important on powerpc and ia64).
> > > +	 */
> > 
> > Sorry.  This is hardly worth it, but the function referred to by this
> > comment is actually is_vm_hugetlb_page() :-/
> 
> Indeed, thanks!
> 
> [PATCH] hugetlb: prepare_hugepage_range check offset too
> 
> prepare_hugepage_range should check file offset alignment when it checks
> virtual address and length, to stop MAP_FIXED with a bad huge offset from
> unmapping before it fails further down.  PowerPC should apply the same
> prepare_hugepage_range alignment checks as ia64 and all the others do.
> 
> Then none of the alignment checks in hugetlbfs_file_mmap are required
> (nor is the check for too small a mapping); but even so, move up setting
> of VM_HUGETLB and add a comment to warn of what David Gibson discovered -
> if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
> when unwinding from error will go the non-huge way, which may cause bad
> behaviour on architectures (powerpc and ia64) which segregate their huge
> mappings into a separate region of the address space.

Looks pretty good.  There's still a certain amount of wierdness on
powerpc: we can back out after (irreversibly) switching chunks of the
address space over to use for hugepages.  I'll try to fix that up
later, but in any case it's certainly more comprehensive than my
original patch.

Acked-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13  7:35         ` Chen, Kenneth W
  2006-11-13 17:00           ` Hugh Dickins
@ 2006-11-14  0:19           ` 'David Gibson'
  1 sibling, 0 replies; 14+ messages in thread
From: 'David Gibson' @ 2006-11-14  0:19 UTC (permalink / raw)
  To: Chen, Kenneth W
  Cc: 'Christoph Lameter', 'Andrew Morton',
	'Hugh Dickins', bill.irwin, 'Adam Litke',
	linux-mm

On Sun, Nov 12, 2006 at 11:35:28PM -0800, Chen, Kenneth W wrote:
> David Gibson wrote on Sunday, November 12, 2006 10:23 PM
> > > > Probably, yes, although it's yet another "if (hugepage)
> > > > specialcase()".  But I still think we want the above patch as well.
> > > > It will make sure we correctly back out from any other possible
> > > > failure cases in hugetlbfs_file_mmap() - ones I haven't thought of, or
> > > > which get added later.
> > > 
> > > 
> > > Something like this?  I haven't tested it yet.  But looks plausible
> > > because we already have if is_file_hugepages() in the generic path.
> > 
> > Um.. if you're going to test pgoff here, you should also test the
> > address.
> 
> prepare_hugepage_range() should catch misaligned memory address, right?
> What more does get_unmapped_area() need to test?
> 
> 
> > Oh, and that point is too late to catch MAP_FIXED mappings.
> 
> I don't understand what you mean by that.

Sorry, old data.  I was thinking of the old get_unmapped_area() which
had entirely separate paths for MAP_FIXED and otherwise.  And I had my
sense inverted as well.

It used to be that prepare_hugepage_range() was called *only* in the
MAP_FIXED case (it being assumed that the hugetlb specific
get_unmapped_area call would do any necessary preparation).

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [hugepage] Fix unmap_and_free_vma backout path
  2006-11-13 20:41               ` Hugh Dickins
  2006-11-13 22:07                 ` Adam Litke
  2006-11-13 23:53                 ` 'David Gibson'
@ 2006-11-14 23:48                 ` Bill Irwin
  2 siblings, 0 replies; 14+ messages in thread
From: Bill Irwin @ 2006-11-14 23:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Adam Litke, Chen, Kenneth W, 'David Gibson',
	'Christoph Lameter', 'Andrew Morton',
	bill.irwin, linux-mm

On Mon, Nov 13, 2006 at 08:41:49PM +0000, Hugh Dickins wrote:
> [PATCH] hugetlb: prepare_hugepage_range check offset too
> prepare_hugepage_range should check file offset alignment when it checks
> virtual address and length, to stop MAP_FIXED with a bad huge offset from
> unmapping before it fails further down.  PowerPC should apply the same
> prepare_hugepage_range alignment checks as ia64 and all the others do.
> Then none of the alignment checks in hugetlbfs_file_mmap are required
> (nor is the check for too small a mapping); but even so, move up setting
> of VM_HUGETLB and add a comment to warn of what David Gibson discovered -
> if hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
> when unwinding from error will go the non-huge way, which may cause bad
> behaviour on architectures (powerpc and ia64) which segregate their huge
> mappings into a separate region of the address space.
> Signed-off-by: Hugh Dickins <hugh@veritas.com>

Acked-by: William Irwin <wli@holomorphy.com>


-- wli

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2006-11-14 23:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-11-13  5:13 [hugepage] Fix unmap_and_free_vma backout path 'David Gibson'
2006-11-13  5:29 ` Christoph Lameter
2006-11-13  5:57   ` 'David Gibson'
2006-11-13  6:03     ` Chen, Kenneth W
2006-11-13  6:22       ` 'David Gibson'
2006-11-13  7:35         ` Chen, Kenneth W
2006-11-13 17:00           ` Hugh Dickins
2006-11-13 17:38             ` Chen, Kenneth W
2006-11-13 20:34             ` Adam Litke
2006-11-13 20:41               ` Hugh Dickins
2006-11-13 22:07                 ` Adam Litke
2006-11-13 23:53                 ` 'David Gibson'
2006-11-14 23:48                 ` Bill Irwin
2006-11-14  0:19           ` 'David Gibson'

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox