linux-mm.kvack.org archive mirror
* [Patch 0/6] Mlock:  doc, patch grouping and error return cleanups
@ 2008-08-19 21:05 Lee Schermerhorn
  2008-08-19 21:05 ` [PATCH 1/6] Mlock: fix __mlock_vma_pages_range comment block Lee Schermerhorn, KOSAKI Motohiro
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Lee Schermerhorn @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

The six patches introduced by this message are against:

	2.6.27-rc3-mmotm-080819-0259

These patches replace the series of 5 RFC patches posted by Kosaki
Motohiro at:

	http://marc.info/?l=linux-mm&m=121843816412096&w=4


Patch 1/6 is a rework of Kosaki-san's cleanup of the __mlock_vma_pages_range()
comment block.  I tried to follow kerneldoc format.  Randy will tell me if
I made a mistake :)

Patch 2/6 is a rework of Kosaki-san's patch to remove the locked_vm 
adjustments for "special vmas" during mmap() processing.  Kosaki-san
wanted to "kill" this adjustment.  After discussion, he requested that
it be resubmitted as a separate patch.  This is the first step in providing
the separate patch [even tho' I consider this part of correctly "handling
mlocked pages during mmap()..."].

Patch 3/6 resubmits the locked_vm adjustment during mmap(MAP_LOCKED) to
match the explicit mlock() behavior.

Patch 4/6 is Kosaki-san's patch to change the error return for mlock
when, after downgrading the mmap semaphore to read during population of
the vma and switching back to the write lock as our callers expect, the
vma that we just locked no longer covers the range we expected.  See
the patch description.

Patch 5/6 backs out a mainline patch to make_pages_present() that adjusted
the error return to match the Posix specification for mlock error
returns.  make_pages_present() is used by callers other than mlock, so it
isn't really the appropriate place to make the change, even tho'
apparently only mlock() looks at the return value from make_pages_present().

Patch 6/6 fixes the mlock error return to be Posixly Correct in the
appropriate [IMO] paths in mlock.c.  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] Mlock:  fix __mlock_vma_pages_range comment block
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
@ 2008-08-19 21:05 ` Lee Schermerhorn, KOSAKI Motohiro
  2008-08-19 21:05 ` [PATCH 2/6] Mlock: backout locked_vm adjustment during mmap() Lee Schermerhorn, KOSAKI Motohiro
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Lee Schermerhorn, KOSAKI Motohiro @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

Against:  2.6.27-rc3-mmotm-080819-0259

fix to mmap-handle-mlocked-pages-during-map-remap-unmap.patch

The __mlock_vma_pages_range() comment block needs updating:
 - it fails to mention the mlock parameter
 - the function no longer requires that mmap_sem be held for write.

The following patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 11:41:19.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 11:48:13.000000000 -0400
@@ -112,12 +112,20 @@ static void munlock_vma_page(struct page
 	}
 }
 
-/*
- * mlock a range of pages in the vma.
+/**
+ * __mlock_vma_pages_range() -  mlock/munlock a range of pages in the vma.
+ * @vma:   target vma
+ * @start: start address
+ * @end:   end address
+ * @mlock: 0 indicate munlock, otherwise mlock.
+ *
+ * If @mlock == 0, unlock an mlocked range;
+ * else mlock the range of pages.  This takes care of making the pages present ,
+ * too.
  *
- * This takes care of making the pages present too.
+ * return 0 on success, negative error code on error.
  *
- * vma->vm_mm->mmap_sem must be held for write.
+ * vma->vm_mm->mmap_sem must be held for at least read.
  */
 static int __mlock_vma_pages_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,


* [PATCH 2/6] Mlock: backout locked_vm adjustment during mmap()
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
  2008-08-19 21:05 ` [PATCH 1/6] Mlock: fix __mlock_vma_pages_range comment block Lee Schermerhorn, KOSAKI Motohiro
@ 2008-08-19 21:05 ` Lee Schermerhorn, KOSAKI Motohiro
  2008-08-19 21:05 ` [PATCH 3/6] Mlock: resubmit locked_vm adjustment as separate patch Lee Schermerhorn, Lee Schermerhorn
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Lee Schermerhorn, KOSAKI Motohiro @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

Against: 2.6.27-rc3-mmotm-080819-0259

can/should be folded into:
	mmap-handle-mlocked-pages-during-map-remap-unmap.patch

Backout mmap() path locked_vm accounting adjustment from the "handle
mlocked pages during map/remap/unmap" patch.  Will resubmit as separate
patch with its own description.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c |   19 ++++++-------------
 mm/mmap.c  |   19 ++++++++-----------
 2 files changed, 14 insertions(+), 24 deletions(-)

Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 12:38:20.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 12:44:16.000000000 -0400
@@ -246,7 +246,7 @@ int mlock_vma_pages_range(struct vm_area
 			unsigned long start, unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	int nr_pages = (end - start) / PAGE_SIZE;
+	int error = 0;
 	BUG_ON(!(vma->vm_flags & VM_LOCKED));
 
 	/*
@@ -259,8 +259,7 @@ int mlock_vma_pages_range(struct vm_area
 			is_vm_hugetlb_page(vma) ||
 			vma == get_gate_vma(current))) {
 		downgrade_write(&mm->mmap_sem);
-		nr_pages = __mlock_vma_pages_range(vma, start, end, 1);
-
+		error = __mlock_vma_pages_range(vma, start, end, 1);
 		up_read(&mm->mmap_sem);
 		/* vma can change or disappear */
 		down_write(&mm->mmap_sem);
@@ -268,22 +267,20 @@ int mlock_vma_pages_range(struct vm_area
 		/* non-NULL vma must contain @start, but need to check @end */
 		if (!vma ||  end > vma->vm_end)
 			return -EAGAIN;
-		return nr_pages;
+		return error;
 	}
 
 	/*
 	 * User mapped kernel pages or huge pages:
 	 * make these pages present to populate the ptes, but
-	 * fall thru' to reset VM_LOCKED--no need to unlock, and
-	 * return nr_pages so these don't get counted against task's
-	 * locked limit.  huge pages are already counted against
-	 * locked vm limit.
+	 * fall thru' to reset VM_LOCKED so we don't try to munlock
+	 * this vma during munmap()/munlock().
 	 */
 	make_pages_present(start, end);
 
 no_mlock:
 	vma->vm_flags &= ~VM_LOCKED;	/* and don't come back! */
-	return nr_pages;		/* pages NOT mlocked */
+	return error;
 }
 
 
@@ -372,10 +369,6 @@ success:
 		downgrade_write(&mm->mmap_sem);
 
 		ret = __mlock_vma_pages_range(vma, start, end, 1);
-		if (ret > 0) {
-			mm->locked_vm -= ret;
-			ret = 0;
-		}
 		/*
 		 * Need to reacquire mmap sem in write mode, as our callers
 		 * expect this.  We have no support for atomically upgrading
Index: linux-2.6.27-rc3-mmotm/mm/mmap.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mmap.c	2008-08-18 12:38:20.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mmap.c	2008-08-18 12:52:21.000000000 -0400
@@ -1224,10 +1224,10 @@ out:
 		/*
 		 * makes pages present; downgrades, drops, reacquires mmap_sem
 		 */
-		int nr_pages = mlock_vma_pages_range(vma, addr, addr + len);
-		if (nr_pages < 0)
-			return nr_pages;	/* vma gone! */
-		mm->locked_vm += (len >> PAGE_SHIFT) - nr_pages;
+		int error = mlock_vma_pages_range(vma, addr, addr + len);
+		if (error < 0)
+			return error;	/* vma gone! */
+		mm->locked_vm += (len >> PAGE_SHIFT);
 	} else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK))
 		make_pages_present(addr, addr + len);
 	return addr;
@@ -1702,8 +1702,7 @@ find_extend_vma(struct mm_struct *mm, un
 	if (!prev || expand_stack(prev, addr))
 		return NULL;
 	if (prev->vm_flags & VM_LOCKED) {
-		int nr_pages = mlock_vma_pages_range(prev, addr, prev->vm_end);
-		if (nr_pages < 0)
+		if (mlock_vma_pages_range(prev, addr, prev->vm_end) < 0)
 			return NULL;	/* vma gone! */
 	}
 	return prev;
@@ -1732,8 +1731,7 @@ find_extend_vma(struct mm_struct * mm, u
 	if (expand_stack(vma, addr))
 		return NULL;
 	if (vma->vm_flags & VM_LOCKED) {
-		int nr_pages = mlock_vma_pages_range(vma, addr, start);
-		if (nr_pages < 0)
+		if (mlock_vma_pages_range(vma, addr, start) < 0)
 			return NULL;	/* vma gone! */
 	}
 	return vma;
@@ -2068,9 +2066,8 @@ unsigned long do_brk(unsigned long addr,
 out:
 	mm->total_vm += len >> PAGE_SHIFT;
 	if (flags & VM_LOCKED) {
-		int nr_pages = mlock_vma_pages_range(vma, addr, addr + len);
-		if (nr_pages >= 0)
-			mm->locked_vm += (len >> PAGE_SHIFT) - nr_pages;
+		if (mlock_vma_pages_range(vma, addr, addr + len) >= 0)
+			mm->locked_vm += (len >> PAGE_SHIFT);
 	}
 	return addr;
 }


* [PATCH 3/6] Mlock: resubmit locked_vm adjustment as separate patch
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
  2008-08-19 21:05 ` [PATCH 1/6] Mlock: fix __mlock_vma_pages_range comment block Lee Schermerhorn, KOSAKI Motohiro
  2008-08-19 21:05 ` [PATCH 2/6] Mlock: backout locked_vm adjustment during mmap() Lee Schermerhorn, KOSAKI Motohiro
@ 2008-08-19 21:05 ` Lee Schermerhorn, Lee Schermerhorn
  2008-08-19 21:05 ` [PATCH 4/6] Mlock: fix return value for munmap/mlock vma race Lee Schermerhorn, KOSAKI Motohiro
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Lee Schermerhorn, Lee Schermerhorn @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

Against:  2.6.27-rc3-mmotm-080816-0202

atop patch:
	mmap-handle-mlocked-pages-during-map-remap-unmap.patch
with locked_vm adjustment backout patch.

Adjust mm->locked_vm in the mmap(MAP_LOCKED) path to match mlock()
behavior and VM_LOCKED flag setting.

Broken out as separate patch.

During mlock*(), mlock_fixup() adjusts locked_vm as appropriate,
based on the type of vma.  For the "special" vmas--those whose 
pages we don't actually mark as PageMlocked()--VM_LOCKED is not
set, so that we don't attempt to munlock the pages during munmap
or munlock, and so we don't need to duplicate the vma type filtering
there.  These vmas are not included in locked_vm by mlock_fixup().

During mmap() and vma extension, locked_vm is adjusted outside of the
mlock functions.  This patch enhances those paths to match the behavior
of mlock for the special vmas.  Return the number of pages NOT mlocked
from mlock_vma_pages_range() [0 or positive], sharing the return value
with possible error codes [negative].  The caller adjusts locked_vm by
the non-negative return value.  For "special" vmas, this will include
all pages mapped by the vma.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/internal.h |    2 +-
 mm/mlock.c    |   26 +++++++++++++++++---------
 mm/mmap.c     |    8 ++++----
 3 files changed, 22 insertions(+), 14 deletions(-)

Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 13:59:22.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 14:02:43.000000000 -0400
@@ -127,7 +127,7 @@ static void munlock_vma_page(struct page
  *
  * vma->vm_mm->mmap_sem must be held for at least read.
  */
-static int __mlock_vma_pages_range(struct vm_area_struct *vma,
+static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,
 				   int mlock)
 {
@@ -229,7 +229,7 @@ static int __mlock_vma_pages_range(struc
 /*
  * Just make pages present if VM_LOCKED.  No-op if unlocking.
  */
-static int __mlock_vma_pages_range(struct vm_area_struct *vma,
+static long __mlock_vma_pages_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,
 				   int mlock)
 {
@@ -242,11 +242,11 @@ static int __mlock_vma_pages_range(struc
 /*
  * mlock all pages in this vma range.  For mmap()/mremap()/...
  */
-int mlock_vma_pages_range(struct vm_area_struct *vma,
+long mlock_vma_pages_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end)
 {
 	struct mm_struct *mm = vma->vm_mm;
-	int error = 0;
+	int nr_pages = (end - start) / PAGE_SIZE;
 	BUG_ON(!(vma->vm_flags & VM_LOCKED));
 
 	/*
@@ -259,7 +259,9 @@ int mlock_vma_pages_range(struct vm_area
 			is_vm_hugetlb_page(vma) ||
 			vma == get_gate_vma(current))) {
 		downgrade_write(&mm->mmap_sem);
-		error = __mlock_vma_pages_range(vma, start, end, 1);
+
+		nr_pages = __mlock_vma_pages_range(vma, start, end, 1);
+
 		up_read(&mm->mmap_sem);
 		/* vma can change or disappear */
 		down_write(&mm->mmap_sem);
@@ -267,20 +269,22 @@ int mlock_vma_pages_range(struct vm_area
 		/* non-NULL vma must contain @start, but need to check @end */
 		if (!vma ||  end > vma->vm_end)
 			return -EAGAIN;
-		return error;
+		return nr_pages;
 	}
 
 	/*
 	 * User mapped kernel pages or huge pages:
 	 * make these pages present to populate the ptes, but
-	 * fall thru' to reset VM_LOCKED so we don't try to munlock
-	 * this vma during munmap()/munlock().
+	 * fall thru' to reset VM_LOCKED--no need to unlock, and
+	 * return nr_pages so these don't get counted against task's
+	 * locked limit.  huge pages are already counted against
+	 * locked vm limit.
 	 */
 	make_pages_present(start, end);
 
 no_mlock:
 	vma->vm_flags &= ~VM_LOCKED;	/* and don't come back! */
-	return error;
+	return nr_pages;		/* error or pages NOT mlocked */
 }
 
 
@@ -369,6 +373,10 @@ success:
 		downgrade_write(&mm->mmap_sem);
 
 		ret = __mlock_vma_pages_range(vma, start, end, 1);
+		if (ret > 0) {
+			mm->locked_vm -= ret;
+			ret = 0;
+		}
 		/*
 		 * Need to reacquire mmap sem in write mode, as our callers
 		 * expect this.  We have no support for atomically upgrading
Index: linux-2.6.27-rc3-mmotm/mm/mmap.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mmap.c	2008-08-18 13:59:22.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mmap.c	2008-08-18 14:01:36.000000000 -0400
@@ -1224,10 +1224,10 @@ out:
 		/*
 		 * makes pages present; downgrades, drops, reacquires mmap_sem
 		 */
-		int error = mlock_vma_pages_range(vma, addr, addr + len);
-		if (error < 0)
-			return error;	/* vma gone! */
-		mm->locked_vm += (len >> PAGE_SHIFT);
+		long nr_pages = mlock_vma_pages_range(vma, addr, addr + len);
+		if (nr_pages < 0)
+			return nr_pages;	/* vma gone! */
+		mm->locked_vm += (len >> PAGE_SHIFT) - nr_pages;
 	} else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK))
 		make_pages_present(addr, addr + len);
 	return addr;
Index: linux-2.6.27-rc3-mmotm/mm/internal.h
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/internal.h	2008-08-18 13:59:22.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/internal.h	2008-08-18 14:01:36.000000000 -0400
@@ -61,7 +61,7 @@ static inline unsigned long page_order(s
 	return page_private(page);
 }
 
-extern int mlock_vma_pages_range(struct vm_area_struct *vma,
+extern long mlock_vma_pages_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
 extern void munlock_vma_pages_range(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);


* [PATCH 4/6] Mlock:  fix return value for munmap/mlock vma race
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
                   ` (2 preceding siblings ...)
  2008-08-19 21:05 ` [PATCH 3/6] Mlock: resubmit locked_vm adjustment as separate patch Lee Schermerhorn, Lee Schermerhorn
@ 2008-08-19 21:05 ` Lee Schermerhorn, KOSAKI Motohiro
  2008-08-20  8:31   ` KOSAKI Motohiro
  2008-08-19 21:05 ` [PATCH 5/6] Mlock: revert mainline handling of mlock error return Lee Schermerhorn, Lee Schermerhorn
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Lee Schermerhorn, KOSAKI Motohiro @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

Against 2.6.27-rc3-mmotm-080819-0259

We now call downgrade_write(&mm->mmap_sem) at the beginning of mlock,
which improves mlock scalability.

But if mlock and munmap race, we can find that the vma is gone when we
reacquire the semaphore.  At that time, the kernel should return ENOMEM,
because mlock after munmap returns ENOMEM.  (In addition, EAGAIN
indicates "please try again", but calling mlock() again would simply
fail again.)

This is a theoretical issue: I can't reproduce the vma disappearing on
my box, but fixing it is better.


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/mlock.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 14:50:05.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 14:50:26.000000000 -0400
@@ -268,7 +268,7 @@ long mlock_vma_pages_range(struct vm_are
 		vma = find_vma(mm, start);
 		/* non-NULL vma must contain @start, but need to check @end */
 		if (!vma ||  end > vma->vm_end)
-			return -EAGAIN;
+			return -ENOMEM;
 		return nr_pages;
 	}
 
@@ -389,7 +389,7 @@ success:
 		*prev = find_vma(mm, start);
 		/* non-NULL *prev must contain @start, but need to check @end */
 		if (!(*prev) || end > (*prev)->vm_end)
-			ret = -EAGAIN;
+			ret = -ENOMEM;
 	} else {
 		/*
 		 * TODO:  for unlocking, pages will already be resident, so


* [PATCH 5/6] Mlock:  revert mainline handling of mlock error return
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
                   ` (3 preceding siblings ...)
  2008-08-19 21:05 ` [PATCH 4/6] Mlock: fix return value for munmap/mlock vma race Lee Schermerhorn, KOSAKI Motohiro
@ 2008-08-19 21:05 ` Lee Schermerhorn, Lee Schermerhorn
  2008-08-20  7:20   ` KOSAKI Motohiro
  2008-08-19 21:05 ` [PATCH 6/6] Mlock: make mlock error return Posixly Correct Lee Schermerhorn, KOSAKI Motohiro
  2008-08-20  7:21 ` [Patch 0/6] Mlock: doc, patch grouping and error return cleanups KOSAKI Motohiro
  6 siblings, 1 reply; 18+ messages in thread
From: Lee Schermerhorn, Lee Schermerhorn @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

Against:  2.6.27-rc3-mmotm-080819-0259

This can apply atop the mmotm series or anywhere after, say,
mlock-count-attempts-to-free-mlocked-page-2.patch.


Revert the change to the make_pages_present() error return.

That change was intended to make mlock() error returns correct, but
make_pages_present() is a lower-level function used by more than
mlock(), although only mlock() currently examines the return
value.

Subsequent patch[es] will add this error-return fixup in an
mlock-specific path.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/memory.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Index: linux-2.6.27-rc3-mmotm/mm/memory.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/memory.c	2008-08-18 14:50:36.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/memory.c	2008-08-18 14:53:15.000000000 -0400
@@ -2819,19 +2819,9 @@ int make_pages_present(unsigned long add
 	len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;
 	ret = get_user_pages(current, current->mm, addr,
 			len, write, 0, NULL, NULL);
-	if (ret < 0) {
-		/*
-		   SUS require strange return value to mlock
-		    - invalid addr generate to ENOMEM.
-		    - out of memory should generate EAGAIN.
-		*/
-		if (ret == -EFAULT)
-			ret = -ENOMEM;
-		else if (ret == -ENOMEM)
-			ret = -EAGAIN;
+	if (ret < 0)
 		return ret;
-	}
-	return ret == len ? 0 : -ENOMEM;
+	return ret == len ? 0 : -1;
 }
 
 #if !defined(__HAVE_ARCH_GATE_AREA)


* [PATCH 6/6] Mlock:  make mlock error return Posixly Correct
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
                   ` (4 preceding siblings ...)
  2008-08-19 21:05 ` [PATCH 5/6] Mlock: revert mainline handling of mlock error return Lee Schermerhorn, Lee Schermerhorn
@ 2008-08-19 21:05 ` Lee Schermerhorn, KOSAKI Motohiro
  2008-08-20  8:35   ` KOSAKI Motohiro
  2008-08-20 10:17   ` Pekka Enberg
  2008-08-20  7:21 ` [Patch 0/6] Mlock: doc, patch grouping and error return cleanups KOSAKI Motohiro
  6 siblings, 2 replies; 18+ messages in thread
From: Lee Schermerhorn, KOSAKI Motohiro @ 2008-08-19 21:05 UTC (permalink / raw)
  To: akpm; +Cc: riel, linux-mm, kosaki.motohiro

Against:  2.6.27-rc3-mmotm-080816-0202

Rework the Posix error return for mlock().

Translate get_user_pages() errors to the Posix-specified error codes.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 mm/memory.c |    2 +-
 mm/mlock.c  |   27 ++++++++++++++++++++++++---
 2 files changed, 25 insertions(+), 4 deletions(-)

Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 15:57:11.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 15:57:39.000000000 -0400
@@ -143,6 +143,18 @@ static void munlock_vma_page(struct page
 	}
 }
 
+/*
+ * convert get_user_pages() return value to posix mlock() error
+ */
+static int __mlock_posix_error_return(long retval)
+{
+	if (retval == -EFAULT)
+		retval = -ENOMEM;
+	else if (retval == -ENOMEM)
+		retval = -EAGAIN;
+	return retval;
+}
+
 /**
  * __mlock_vma_pages_range() -  mlock/munlock a range of pages in the vma.
  * @vma:   target vma
@@ -209,8 +221,13 @@ static long __mlock_vma_pages_range(stru
 		 * or for addresses that map beyond end of a file.
 		 * We'll mlock the the pages if/when they get faulted in.
 		 */
-		if (ret < 0)
+		if (ret < 0) {
+			if (vma->vm_flags & VM_NONLINEAR)
+				ret = 0;
+			else
+				ret = __mlock_posix_error_return(ret);
 			break;
+		}
 		if (ret == 0) {
 			/*
 			 * We know the vma is there, so the only time
@@ -248,6 +265,7 @@ static long __mlock_vma_pages_range(stru
 			addr += PAGE_SIZE;	/* for next get_user_pages() */
 			nr_pages--;
 		}
+		ret = 0;
 	}
 
 	lru_add_drain_all();	/* to update stats */
@@ -264,8 +282,11 @@ static long __mlock_vma_pages_range(stru
 				   unsigned long start, unsigned long end,
 				   int mlock)
 {
-	if (mlock && (vma->vm_flags & VM_LOCKED))
-		make_pages_present(start, end);
+	if (mlock && (vma->vm_flags & VM_LOCKED)) {
+		long retval = make_pages_present(start, end);
+		if (retval < 0)
+			return  __mlock_posix_error_return(retval);
+	}
 	return 0;
 }
 #endif /* CONFIG_UNEVICTABLE_LRU */
Index: linux-2.6.27-rc3-mmotm/mm/memory.c
===================================================================
--- linux-2.6.27-rc3-mmotm.orig/mm/memory.c	2008-08-18 15:57:11.000000000 -0400
+++ linux-2.6.27-rc3-mmotm/mm/memory.c	2008-08-18 15:57:39.000000000 -0400
@@ -2821,7 +2821,7 @@ int make_pages_present(unsigned long add
 			len, write, 0, NULL, NULL);
 	if (ret < 0)
 		return ret;
-	return ret == len ? 0 : -1;
+	return ret == len ? 0 : -EFAULT;
 }
 
 #if !defined(__HAVE_ARCH_GATE_AREA)


* Re: [PATCH 5/6] Mlock:  revert mainline handling of mlock error return
  2008-08-19 21:05 ` [PATCH 5/6] Mlock: revert mainline handling of mlock error return Lee Schermerhorn, Lee Schermerhorn
@ 2008-08-20  7:20   ` KOSAKI Motohiro
  2008-08-20  7:24     ` KOSAKI Motohiro
  0 siblings, 1 reply; 18+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20  7:20 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: kosaki.motohiro, akpm, riel, linux-mm

Hi

> +	if (ret < 0)
>  		return ret;
> -	}
> -	return ret == len ? 0 : -ENOMEM;
> +	return ret == len ? 0 : -1;

Please don't use "-1".
A user process will interpret -1 as EPERM.

Yes, I know it wasn't introduced by you;
it exists in the original make_pages_present().




* Re: [Patch 0/6] Mlock:  doc, patch grouping and error return cleanups
  2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
                   ` (5 preceding siblings ...)
  2008-08-19 21:05 ` [PATCH 6/6] Mlock: make mlock error return Posixly Correct Lee Schermerhorn, KOSAKI Motohiro
@ 2008-08-20  7:21 ` KOSAKI Motohiro
  6 siblings, 0 replies; 18+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20  7:21 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: kosaki.motohiro, akpm, riel, linux-mm

Hi

Thank you for the great help!
I'll review it soon!

> The six patches introduced by this message are against:
> 
> 	2.6.27-rc3-mmotm-080819-0259
> 
> These patches replace the series of 5 RFC patches posted by Kosaki
> Motohiro at:
> 
> 	http://marc.info/?l=linux-mm&m=121843816412096&w=4




* Re: [PATCH 5/6] Mlock:  revert mainline handling of mlock error return
  2008-08-20  7:20   ` KOSAKI Motohiro
@ 2008-08-20  7:24     ` KOSAKI Motohiro
  0 siblings, 0 replies; 18+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20  7:24 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Lee Schermerhorn, akpm, riel, linux-mm

> Hi
> 
> > +	if (ret < 0)
> >  		return ret;
> > -	}
> > -	return ret == len ? 0 : -ENOMEM;
> > +	return ret == len ? 0 : -1;
> 
> Please don't use "-1".
> A user process will interpret -1 as EPERM.

Oops, sorry.
It is fixed by [6/6].


> Yes, I know it wasn't introduced by you;
> it exists in the original make_pages_present().
> 




* Re: [PATCH 4/6] Mlock:  fix return value for munmap/mlock vma race
  2008-08-19 21:05 ` [PATCH 4/6] Mlock: fix return value for munmap/mlock vma race Lee Schermerhorn, KOSAKI Motohiro
@ 2008-08-20  8:31   ` KOSAKI Motohiro
  0 siblings, 0 replies; 18+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20  8:31 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: kosaki.motohiro, akpm, riel, linux-mm

> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> Against 2.6.27-rc3-mmotm-080819-0259

Just a note: this applies atop the patch:

	mlock-downgrade-mmap-sem-while-populating-mlocked-regions.patch


> 
> We now call downgrade_write(&mm->mmap_sem) at the beginning of mlock,
> which improves mlock scalability.
>
> But if mlock and munmap race, we can find that the vma is gone when we
> reacquire the semaphore.  At that time, the kernel should return ENOMEM,
> because mlock after munmap returns ENOMEM.  (In addition, EAGAIN
> indicates "please try again", but calling mlock() again would simply
> fail again.)
>
> This is a theoretical issue: I can't reproduce the vma disappearing on
> my box, but fixing it is better.
> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>




* Re: [PATCH 6/6] Mlock:  make mlock error return Posixly Correct
  2008-08-19 21:05 ` [PATCH 6/6] Mlock: make mlock error return Posixly Correct Lee Schermerhorn, KOSAKI Motohiro
@ 2008-08-20  8:35   ` KOSAKI Motohiro
  2008-08-20 16:24     ` Lee Schermerhorn
  2008-08-20 10:17   ` Pekka Enberg
  1 sibling, 1 reply; 18+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20  8:35 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: kosaki.motohiro, akpm, riel, linux-mm

> From:  KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> 
> Against:  2.6.27-rc3-mmotm-080816-0202
> 
> Rework Posix error return for mlock().
> 
> Translate get_user_pages() error to posix specified error codes.
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  mm/memory.c |    2 +-
>  mm/mlock.c  |   27 ++++++++++++++++++++++++---
>  2 files changed, 25 insertions(+), 4 deletions(-)
> 
> Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
> ===================================================================
> --- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 15:57:11.000000000 -0400
> +++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 15:57:39.000000000 -0400
> @@ -143,6 +143,18 @@ static void munlock_vma_page(struct page
>  	}
>  }
>  
> +/*
> + * convert get_user_pages() return value to posix mlock() error
> + */
> +static int __mlock_posix_error_return(long retval)
> +{
> +	if (retval == -EFAULT)
> +		retval = -ENOMEM;
> +	else if (retval == -ENOMEM)
> +		retval = -EAGAIN;
> +	return retval;
> +}
> +
>  /**
>   * __mlock_vma_pages_range() -  mlock/munlock a range of pages in the vma.
>   * @vma:   target vma
> @@ -209,8 +221,13 @@ static long __mlock_vma_pages_range(stru
>  		 * or for addresses that map beyond end of a file.
>  		 * We'll mlock the the pages if/when they get faulted in.
>  		 */
> -		if (ret < 0)
> +		if (ret < 0) {
> +			if (vma->vm_flags & VM_NONLINEAR)
> +				ret = 0;
> +			else
> +				ret = __mlock_posix_error_return(ret);
>  			break;
> +		}
>  		if (ret == 0) {
>  			/*
>  			 * We know the vma is there, so the only time

__mlock_vma_pages_range is used by mmap() and mlock().

mlock case 

	sys_mlock
		do_mlock
			mlock_fixup
				__mlock_vma_pages_range

mmap case

	do_mmap_pgoff
		mmap_region
			mlock_vma_pages_range
				__mlock_vma_pages_range


mlock() needs an error code if a vma permission failure happened,
but mmap() (and remap_pages_range(), etc.) should ignore it.

So, mlock_vma_pages_range() should ignore __mlock_vma_pages_range()'s error code.




* Re: [PATCH 6/6] Mlock: make mlock error return Posixly Correct
  2008-08-19 21:05 ` [PATCH 6/6] Mlock: make mlock error return Posixly Correct Lee Schermerhorn, KOSAKI Motohiro
  2008-08-20  8:35   ` KOSAKI Motohiro
@ 2008-08-20 10:17   ` Pekka Enberg
  2008-08-20 16:26     ` Lee Schermerhorn
  1 sibling, 1 reply; 18+ messages in thread
From: Pekka Enberg @ 2008-08-20 10:17 UTC (permalink / raw)
  To: Lee Schermerhorn, KOSAKI Motohiro; +Cc: akpm, riel, linux-mm

Hi Lee,

On Wed, Aug 20, 2008 at 12:05 AM, Lee Schermerhorn
<lee.schermerhorn@hp.com> wrote:
> Against:  2.6.27-rc3-mmotm-080816-0202
>
> Rework Posix error return for mlock().
>
> Translate get_user_pages() error to posix specified error codes.

It would be nice if the changelog explained why this matters (i.e. why
we need this).


* Re: [PATCH 6/6] Mlock:  make mlock error return Posixly Correct
  2008-08-20  8:35   ` KOSAKI Motohiro
@ 2008-08-20 16:24     ` Lee Schermerhorn
  2008-08-20 17:58       ` KOSAKI Motohiro
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Schermerhorn @ 2008-08-20 16:24 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: akpm, riel, linux-mm

On Wed, 2008-08-20 at 17:35 +0900, KOSAKI Motohiro wrote:
> > From:  KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > 
> > Against:  2.6.27-rc3-mmotm-080816-0202
> > 
> > Rework Posix error return for mlock().
> > 
> > Translate get_user_pages() error to posix specified error codes.
> > 
> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
> > 
> >  mm/memory.c |    2 +-
> >  mm/mlock.c  |   27 ++++++++++++++++++++++++---
> >  2 files changed, 25 insertions(+), 4 deletions(-)
> > 
> > Index: linux-2.6.27-rc3-mmotm/mm/mlock.c
> > ===================================================================
> > --- linux-2.6.27-rc3-mmotm.orig/mm/mlock.c	2008-08-18 15:57:11.000000000 -0400
> > +++ linux-2.6.27-rc3-mmotm/mm/mlock.c	2008-08-18 15:57:39.000000000 -0400
> > @@ -143,6 +143,18 @@ static void munlock_vma_page(struct page
> >  	}
> >  }
> >  
> > +/*
> > + * convert get_user_pages() return value to posix mlock() error
> > + */
> > +static int __mlock_posix_error_return(long retval)
> > +{
> > +	if (retval == -EFAULT)
> > +		retval = -ENOMEM;
> > +	else if (retval == -ENOMEM)
> > +		retval = -EAGAIN;
> > +	return retval;
> > +}
> > +
> >  /**
> >   * __mlock_vma_pages_range() -  mlock/munlock a range of pages in the vma.
> >   * @vma:   target vma
> > @@ -209,8 +221,13 @@ static long __mlock_vma_pages_range(stru
> >  		 * or for addresses that map beyond end of a file.
> >  		 * We'll mlock the the pages if/when they get faulted in.
> >  		 */
> > -		if (ret < 0)
> > +		if (ret < 0) {
> > +			if (vma->vm_flags & VM_NONLINEAR)
> > +				ret = 0;
> > +			else
> > +				ret = __mlock_posix_error_return(ret);
> >  			break;
> > +		}
> >  		if (ret == 0) {
> >  			/*
> >  			 * We know the vma is there, so the only time
> 
> __mlock_vma_pages_range is used by mmap() and mlock().
> 
> mlock case 
> 
> 	sys_mlock
> 		do_mlock
> 			mlock_fixup
> 				__mlock_vma_pages_range
> 
> mmap case
> 
> 	do_mmap_pgoff
> 		mmap_region
> 			mlock_vma_pages_range
> 				__mlock_vma_pages_range
> 
> 
> mlock() needs an error code if a vma permission failure happened,
> but mmap() (and remap_pages_range(), etc.) should ignore it.
> 
> So, mlock_vma_pages_range() should ignore __mlock_vma_pages_range()'s error code.

Well, I don't know whether we can trigger a vma permission failure
during mmap(MAP_LOCKED) or a remap within a VM_LOCKED vma, either of
which will end up calling mlock_vma_pages_range().  However, [after
rereading the man page] looks like we DO want to return any ENOMEM w/o
translating to EAGAIN.  Guess that means I should do the translation
for mlock() from within mlock_fixup().  remap_pages_range()
probably wants to explicitly ignore any error from the mlock callout.

Will resend.

Thanks,
Lee


* Re: [PATCH 6/6] Mlock: make mlock error return Posixly Correct
  2008-08-20 10:17   ` Pekka Enberg
@ 2008-08-20 16:26     ` Lee Schermerhorn
  0 siblings, 0 replies; 18+ messages in thread
From: Lee Schermerhorn @ 2008-08-20 16:26 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: KOSAKI Motohiro, akpm, riel, linux-mm

On Wed, 2008-08-20 at 13:17 +0300, Pekka Enberg wrote:
> Hi Lee,
> 
> On Wed, Aug 20, 2008 at 12:05 AM, Lee Schermerhorn
> <lee.schermerhorn@hp.com> wrote:
> > Against:  2.6.27-rc3-mmotm-080816-0202
> >
> > Rework Posix error return for mlock().
> >
> > Translate get_user_pages() error to posix specified error codes.
> 
> It would be nice if the changelog explained why this matters (i.e. why
> we need this).

OK.  This patch is actually moving code that was introduced upstream in
another earlier patch that explained the rationale.  I'll include a
pointer to that and a summary of why.

I need to respin this patch anyway.  I'll update the description when I
resend.

Thanks,
Lee


* Re: [PATCH 6/6] Mlock: make mlock error return Posixly Correct
  2008-08-20 16:24     ` Lee Schermerhorn
@ 2008-08-20 17:58       ` KOSAKI Motohiro
  2008-08-20 19:04         ` Lee Schermerhorn
  0 siblings, 1 reply; 18+ messages in thread
From: KOSAKI Motohiro @ 2008-08-20 17:58 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: akpm, riel, linux-mm

>> mlock() needs an error code if a vma permission failure happened,
>> but mmap() (and remap_pages_range(), etc.) should ignore it.
>>
>> So, mlock_vma_pages_range() should ignore __mlock_vma_pages_range()'s error code.
>
> Well, I don't know whether we can trigger a vma permission failure
> during mmap(MAP_LOCKED) or a remap within a VM_LOCKED vma, either of
> which will end up calling mlock_vma_pages_range().  However, [after
> rereading the man page] looks like we DO want to return any ENOMEM w/o
> translating to EAGAIN.

Does the Linus-tree implementation do it?
Can we make a reproducer program?

So, I think implementation compatibility is more important than man pages,
because many people think incompatibility is a bug ;-)


> Guess that means I should do the translation
> for mlock() from within mlock_fixup().  remap_pages_range()
> probably wants to explicitly ignore any error from the mlock callout.
>
> Will resend.


* Re: [PATCH 6/6] Mlock: make mlock error return Posixly Correct
  2008-08-20 17:58       ` KOSAKI Motohiro
@ 2008-08-20 19:04         ` Lee Schermerhorn
  2008-08-22 20:48           ` Lee Schermerhorn
  0 siblings, 1 reply; 18+ messages in thread
From: Lee Schermerhorn @ 2008-08-20 19:04 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: akpm, riel, linux-mm

On Thu, 2008-08-21 at 02:58 +0900, KOSAKI Motohiro wrote:
> >> mlock() needs an error code if a vma permission failure happened,
> >> but mmap() (and remap_pages_range(), etc.) should ignore it.
> >>
> >> So, mlock_vma_pages_range() should ignore __mlock_vma_pages_range()'s error code.
> >
> > Well, I don't know whether we can trigger a vma permission failure
> > during mmap(MAP_LOCKED) or a remap within a VM_LOCKED vma, either of
> > which will end up calling mlock_vma_pages_range().  However, [after
> > rereading the man page] looks like we DO want to return any ENOMEM w/o
> > translating to EAGAIN.
> 
> Does the Linus-tree implementation do it?
> Can we make a reproducer program?

> So, I think implementation compatibility is more important than man pages,
> because many people think incompatibility is a bug ;-)

Currently, the upstream kernel uses make_pages_present() and ignores the
return value.  However, the mmap(2) man page does say that it can return
ENOMEM for "no memory is available"--which is what get_user_pages() will
return in that situation and which I propose we pass on untranslated.

To make a reproducer, we'd need to call mmap() with MAP_LOCKED from a
program running on a system or in a mem control group with insufficient
available memory to satisfy the mapping.  [I think that's really the
only get_user_pages() error we should get back from
mlock_vma_pages_range().  We shouldn't get the 'EFAULT' as we're
mlocking known good vma addresses and we filter out VM_IO|VM_PFNMAP
vmas.]  If the system/container is close enough to its memory capacity
that mmap(MAP_LOCKED) is returning ENOMEM, the application probably has
other more important problems to deal with.

In any case, we'd only return an error when one occurs.  This might be
different from today's behavior which is to not tell the application
about the condition, even tho' it has occurred--e.g., insufficient
memory to satisfy MAP_LOCKED--but I think it's better for the
application to know about it than to pretend it didn't happen.  

So, I think we're OK, after I move the posix error return translation to
mlock_fixup().  [I'm testing the patch now.]

Lee
> 

> 
> > Guess that means I should do the translation
> > for mlock() from within mlock_fixup().  remap_pages_range()
> > probably wants to explicitly ignore any error from the mlock callout.
> >
> > Will resend.


* Re: [PATCH 6/6] Mlock: make mlock error return Posixly Correct
  2008-08-20 19:04         ` Lee Schermerhorn
@ 2008-08-22 20:48           ` Lee Schermerhorn
  0 siblings, 0 replies; 18+ messages in thread
From: Lee Schermerhorn @ 2008-08-22 20:48 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: akpm, riel, linux-mm

On Wed, 2008-08-20 at 15:04 -0400, Lee Schermerhorn wrote:
> On Thu, 2008-08-21 at 02:58 +0900, KOSAKI Motohiro wrote:
> > >> mlock() needs an error code if a vma permission failure happened,
> > >> but mmap() (and remap_pages_range(), etc.) should ignore it.
> > >>
> > >> So, mlock_vma_pages_range() should ignore __mlock_vma_pages_range()'s error code.
> > >
> > > Well, I don't know whether we can trigger a vma permission failure
> > > during mmap(MAP_LOCKED) or a remap within a VM_LOCKED vma, either of
> > > which will end up calling mlock_vma_pages_range().  However, [after
> > > rereading the man page] looks like we DO want to return any ENOMEM w/o
> > > translating to EAGAIN.
> > 
> > Does the Linus-tree implementation do it?
> > Can we make a reproducer program?
> 
> > So, I think implementation compatibility is more important than man pages,
> > because many people think incompatibility is a bug ;-)
> 
> Currently, the upstream kernel uses make_pages_present() and ignores the
> return value.  However, the mmap(2) man page does say that it can return
> ENOMEM for "no memory is available"--which is what get_user_pages() will
> return in that situation and which I propose we pass on untranslated.
> 
> To make a reproducer, we'd need to call mmap() with MAP_LOCKED from a
> program running on a system or in a mem control group with insufficient
> available memory to satisfy the mapping.  [I think that's really the
> only get_user_pages() error we should get back from
> mlock_vma_pages_range().  We shouldn't get the 'EFAULT' as we're
> mlocking known good vma addresses and we filter out VM_IO|VM_PFNMAP
> vmas.]  If the system/container is close enough to its memory capacity
> that mmap(MAP_LOCKED) is returning ENOMEM, the application probably has
> other more important problems to deal with.
> 
> In any case, we'd only return an error when one occurs.  This might be
> different from today's behavior which is to not tell the application
> about the condition, even tho' it has occurred--e.g., insufficient
> memory to satisfy MAP_LOCKED--but I think it's better for the
> application to know about it than to pretend it didn't happen.  
> 
> So, I think we're OK, after I move the posix error return translation to
> mlock_fixup().  [I'm testing the patch now.]
> 

After further examination of the code and thinking about separating
issues, I agree with you that we should hide any pte population error
[returned by get_user_pages()] from mmap() callers and just return the
address altho' the range may not be fully populated.  

I reread the Single Unix Specification mmap() page:

	http://www.opengroup.org/onlinepubs/007908799/xsh/mmap.html

It appears that, technically, we should be returning the error code when
we can't mlock the pages, at least in the case where mlockall() has
previously been called for the process--the MAP_LOCKED flag is not
defined in the standard.  Since the current upstream code doesn't do
this, it's not a regression nor otherwise directly related to the
unevictable mlocked pages patches, so I'd like to handle it as a
separate patch.  It will require backing out the already mmap()ed vma.

mlock_vma_pages_range() still returns the error in the case where the
vma goes away when it switches the mmap_sem back to write.  This should
be VERY unlikely, and I suppose we could return the virtual address of a
missing vma to the application and let it find out via SIGSEGV that the
address is invalid when it tries to access the memory.  I chose to
return the error.

I'm sending out the revised series shortly.

Lee



end of thread, other threads:[~2008-08-22 20:48 UTC | newest]

Thread overview: 18+ messages
2008-08-19 21:05 [Patch 0/6] Mlock: doc, patch grouping and error return cleanups Lee Schermerhorn
2008-08-19 21:05 ` [PATCH 1/6] Mlock: fix __mlock_vma_pages_range comment block Lee Schermerhorn, KOSAKI Motohiro
2008-08-19 21:05 ` [PATCH 2/6] Mlock: backout locked_vm adjustment during mmap() Lee Schermerhorn, KOSAKI Motohiro
2008-08-19 21:05 ` [PATCH 3/6] Mlock: resubmit locked_vm adjustment as separate patch Lee Schermerhorn, Lee Schermerhorn
2008-08-19 21:05 ` [PATCH 4/6] Mlock: fix return value for munmap/mlock vma race Lee Schermerhorn, KOSAKI Motohiro
2008-08-20  8:31   ` KOSAKI Motohiro
2008-08-19 21:05 ` [PATCH 5/6] Mlock: revert mainline handling of mlock error return Lee Schermerhorn, Lee Schermerhorn
2008-08-20  7:20   ` KOSAKI Motohiro
2008-08-20  7:24     ` KOSAKI Motohiro
2008-08-19 21:05 ` [PATCH 6/6] Mlock: make mlock error return Posixly Correct Lee Schermerhorn, KOSAKI Motohiro
2008-08-20  8:35   ` KOSAKI Motohiro
2008-08-20 16:24     ` Lee Schermerhorn
2008-08-20 17:58       ` KOSAKI Motohiro
2008-08-20 19:04         ` Lee Schermerhorn
2008-08-22 20:48           ` Lee Schermerhorn
2008-08-20 10:17   ` Pekka Enberg
2008-08-20 16:26     ` Lee Schermerhorn
2008-08-20  7:21 ` [Patch 0/6] Mlock: doc, patch grouping and error return cleanups KOSAKI Motohiro
