linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Question:  mmotm-081207 - Should i_mmap_writable go negative?
@ 2008-12-09 20:47 Lee Schermerhorn
  2008-12-10 13:27 ` Hugh Dickins
  0 siblings, 1 reply; 5+ messages in thread
From: Lee Schermerhorn @ 2008-12-09 20:47 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, linux-mm

I've been trying to figure out how to determine when the last shared
mapping [vma with VM_SHARED] is removed from a mmap()ed file.  Looking
at the sources, it appears that vma->vm_file->f_mapping->i_mmap_writable
counts the number of vmas with VM_SHARED mapping the file.  However, in
2.6.28-rc7-mmotm-081207, it appears that i_mmap_writable doesn't get
incremented when a task fork()s, and can go negative when the parent and
child both unmap the file.

I instrumented a couple of functions in mmap.c that increment and
decrement i_mmap_writable and then mapped a file sharable, fork(), and
unmapped the file in the parent and child, using my memtoy ad hoc test
program.  Here's what I see [again, on 2.6.28-rc7-mmotm-081207]--right
after boot, so my test file should have zero mappers:

--------

root@dropout(root):memtoy
memtoy pid:  3301
memtoy>file /tmp/zf1
memtoy>map zf1 shared

console:__vma_link_file:  vma: ffff8803fdc090b8 - mapping->i_mmap_writable: 0 -> 1

memtoy>child c1
memtoy:  child c1 - pid 3302

me: I would have expected to see i_mmap_writable incremented again here, but
me: saw no console output from my instrumentation.

memtoy>unmap zf1	# unmap in parent

console:__remove_shared_vm_struct:  vma: ffff8803fdc090b8 - mapping->i_mmap_writable: 1 -> 0
console:__remove_shared_vm_struct:  vma: ffff8803fdc090b8 - removed last shared mapping

memtoy>/c1 show
  _____address______ ____length____ ____offset____ prot  share  name
f 0x00007f000ae68000 0x000001000000 0x000000000000  rw- shared  /tmp/zf1

me:  child still has zf1 mapped

memtoy>/c1 unmap zf1	# unmap in child

console:__remove_shared_vm_struct:  vma: ffff8803fe5d3170 - mapping->i_mmap_writable: 0 -> -1

--------

So, the file's i_mmap_writable goes negative.  Is this expected?

If I remap the file, whether or not I restart memtoy, I see that it's
i_mmap_writable has remained negative:

-------
memtoy>map zf1		# map private [!shared] - no change in i_mmap_writable

console:__vma_link_file:  vma: ffff8805fd0590b8 - mapping->i_mmap_writable: -1 -> -1

memtoy>unmap zf1	# unmap:  no change in i_mmap_writable

console:__remove_shared_vm_struct:  vma: ffff8805fd0590b8 - mapping->i_mmap_writable: -1 -> -1

memtoy>map zf1 shared	# mmap shared, again

console:__vma_link_file:  vma: ffff8805fd0590b8 - mapping->i_mmap_writable: -1 -> 0

--------

I recall that this used to work [as I expected] at one time.

Lee




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question:  mmotm-081207 - Should i_mmap_writable go negative?
  2008-12-09 20:47 Question: mmotm-081207 - Should i_mmap_writable go negative? Lee Schermerhorn
@ 2008-12-10 13:27 ` Hugh Dickins
  2008-12-10 15:48   ` Lee Schermerhorn
  2008-12-10 20:48   ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
  0 siblings, 2 replies; 5+ messages in thread
From: Hugh Dickins @ 2008-12-10 13:27 UTC (permalink / raw)
  To: Lee Schermerhorn; +Cc: Andrew Morton, linux-mm

On Tue, 9 Dec 2008, Lee Schermerhorn wrote:
> I've been trying to figure out how to determine when the last shared
> mapping [vma with VM_SHARED] is removed from a mmap()ed file.  Looking
> at the sources, it appears that vma->vm_file->f_mapping->i_mmap_writable
> counts the number of vmas with VM_SHARED mapping the file.  However, in
> 2.6.28-rc7-mmotm-081207, it appears that i_mmap_writable doesn't get
> incremented when a task fork()s, and can go negative when the parent and
> child both unmap the file.

Wow, that's what I call a good find!

> I recall that this used to work [as I expected] at one time.

Are you sure?  It looks to me to have been wrong ever since I added
i_mmap_writable.  Not that I've tested yet at all.  Here's the patch
I intend to send Linus a.s.a.p: but I do need to test it, and also
extend the comment to say just what's been done wrong all this time.

Big thank you from all our users!
Hugh


Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
test in 2.6.7!  Bad bad bug, good good find.

The i_mmap_writable count must be incremented for VM_SHARED (just as
i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
when dup_mmap() copies the vma for fork: it has its own more optimal
version of __vma_link_file() and I missed this out.  So the count
was later going down to 0 (dangerous) when one end unmapped, then
wrapping negative (inefficient) when the other end unmapped.

Fix would be a two-liner, but mapping variable added and comment moved.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Not-Quite-Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 kernel/fork.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- 2.6.28-rc7/kernel/fork.c	2008-11-15 23:09:30.000000000 +0000
+++ linux/kernel/fork.c	2008-12-10 12:49:13.000000000 +0000
@@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm
 		file = tmp->vm_file;
 		if (file) {
 			struct inode *inode = file->f_path.dentry->d_inode;
+			struct address_space *mapping = file->f_mapping;
+
 			get_file(file);
 			if (tmp->vm_flags & VM_DENYWRITE)
 				atomic_dec(&inode->i_writecount);
-
-			/* insert tmp into the share list, just after mpnt */
-			spin_lock(&file->f_mapping->i_mmap_lock);
+			spin_lock(&mapping->i_mmap_lock);
+			if (tmp->vm_flags & VM_SHARED)
+				mapping->i_mmap_writable++;
 			tmp->vm_truncate_count = mpnt->vm_truncate_count;
-			flush_dcache_mmap_lock(file->f_mapping);
+			flush_dcache_mmap_lock(mapping);
+			/* insert tmp into the share list, just after mpnt */
 			vma_prio_tree_add(tmp, mpnt);
-			flush_dcache_mmap_unlock(file->f_mapping);
-			spin_unlock(&file->f_mapping->i_mmap_lock);
+			flush_dcache_mmap_unlock(mapping);
+			spin_unlock(&mapping->i_mmap_lock);
 		}
 
 		/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Question:  mmotm-081207 - Should i_mmap_writable go negative?
  2008-12-10 13:27 ` Hugh Dickins
@ 2008-12-10 15:48   ` Lee Schermerhorn
  2008-12-10 20:48   ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
  1 sibling, 0 replies; 5+ messages in thread
From: Lee Schermerhorn @ 2008-12-10 15:48 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-mm

On Wed, 2008-12-10 at 13:27 +0000, Hugh Dickins wrote:
> On Tue, 9 Dec 2008, Lee Schermerhorn wrote:
> > I've been trying to figure out how to determine when the last shared
> > mapping [vma with VM_SHARED] is removed from a mmap()ed file.  Looking
> > at the sources, it appears that vma->vm_file->f_mapping->i_mmap_writable
> > counts the number of vmas with VM_SHARED mapping the file.  However, in
> > 2.6.28-rc7-mmotm-081207, it appears that i_mmap_writable doesn't get
> > incremented when a task fork()s, and can go negative when the parent and
> > child both unmap the file.
> 
> Wow, that's what I call a good find!
> 
> > I recall that this used to work [as I expected] at one time.
> 
> Are you sure?  It looks to me to have been wrong ever since I added
> i_mmap_writable.  Not that I've tested yet at all.  Here's the patch
> I intend to send Linus a.s.a.p: but I do need to test it, and also
> extend the comment to say just what's been done wrong all this time.

Well, no, not sure.  A test that I had done at one time seemed to work
then.  I recently started testing this area again, and noticed
"unexpected behavior" which I tracked down to this, uh, feature.  I may
have been doing the mmap() after the fork before, while this time I
forked after mmap().  Anyway, I'll add in your patch at test with that.

Thanks,
Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] fix mapping_writably_mapped()
  2008-12-10 13:27 ` Hugh Dickins
  2008-12-10 15:48   ` Lee Schermerhorn
@ 2008-12-10 20:48   ` Hugh Dickins
  2008-12-10 22:22     ` Lee Schermerhorn
  1 sibling, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2008-12-10 20:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Lee Schermerhorn, linux-mm, linux-kernel,
	linux-arch, stable

Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
test in 2.6.7!  Bad bad bug, good good find.

The i_mmap_writable count must be incremented for VM_SHARED (just as
i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
when dup_mmap() copies the vma for fork: it has its own more optimal
version of __vma_link_file(), and I missed this out.  So the count
was later going down to 0 (dangerous) when one end unmapped, then
wrapping negative (inefficient) when the other end unmapped.

The only impact on x86 would have been that setting a mandatory lock on
a file which has at some time been opened O_RDWR and mapped MAP_SHARED
(but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN
when it should succeed, or succeed when it should fail.

But those architectures which rely on flush_dcache_page() to flush
userspace modifications back into the page before the kernel reads it,
may in some cases have skipped the flush after such a fork - though any
repetitive test will soon wrap the count negative, in which case it will
flush_dcache_page() unnecessarily.

Fix would be a two-liner, but mapping variable added, and comment moved.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---

 kernel/fork.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

--- 2.6.28-rc7/kernel/fork.c	2008-11-15 23:09:30.000000000 +0000
+++ linux/kernel/fork.c	2008-12-10 12:49:13.000000000 +0000
@@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm
 		file = tmp->vm_file;
 		if (file) {
 			struct inode *inode = file->f_path.dentry->d_inode;
+			struct address_space *mapping = file->f_mapping;
+
 			get_file(file);
 			if (tmp->vm_flags & VM_DENYWRITE)
 				atomic_dec(&inode->i_writecount);
-
-			/* insert tmp into the share list, just after mpnt */
-			spin_lock(&file->f_mapping->i_mmap_lock);
+			spin_lock(&mapping->i_mmap_lock);
+			if (tmp->vm_flags & VM_SHARED)
+				mapping->i_mmap_writable++;
 			tmp->vm_truncate_count = mpnt->vm_truncate_count;
-			flush_dcache_mmap_lock(file->f_mapping);
+			flush_dcache_mmap_lock(mapping);
+			/* insert tmp into the share list, just after mpnt */
 			vma_prio_tree_add(tmp, mpnt);
-			flush_dcache_mmap_unlock(file->f_mapping);
-			spin_unlock(&file->f_mapping->i_mmap_lock);
+			flush_dcache_mmap_unlock(mapping);
+			spin_unlock(&mapping->i_mmap_lock);
 		}
 
 		/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] fix mapping_writably_mapped()
  2008-12-10 20:48   ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
@ 2008-12-10 22:22     ` Lee Schermerhorn
  0 siblings, 0 replies; 5+ messages in thread
From: Lee Schermerhorn @ 2008-12-10 22:22 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Linus Torvalds, Andrew Morton, linux-mm, linux-kernel,
	linux-arch, stable

On Wed, 2008-12-10 at 20:48 +0000, Hugh Dickins wrote:
> Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
> test in 2.6.7!  Bad bad bug, good good find.
> 
> The i_mmap_writable count must be incremented for VM_SHARED (just as
> i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
> when dup_mmap() copies the vma for fork: it has its own more optimal
> version of __vma_link_file(), and I missed this out.  So the count
> was later going down to 0 (dangerous) when one end unmapped, then
> wrapping negative (inefficient) when the other end unmapped.
> 
> The only impact on x86 would have been that setting a mandatory lock on
> a file which has at some time been opened O_RDWR and mapped MAP_SHARED
> (but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN
> when it should succeed, or succeed when it should fail.
> 
> But those architectures which rely on flush_dcache_page() to flush
> userspace modifications back into the page before the kernel reads it,
> may in some cases have skipped the flush after such a fork - though any
> repetitive test will soon wrap the count negative, in which case it will
> flush_dcache_page() unnecessarily.
> 
> Fix would be a two-liner, but mapping variable added, and comment moved.
> 
> Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>

I tested the version you sent out earlier atop 28-rc7-mmotm-081208 and
it appears to be working as expected.  Not very stressful testing, tho'.
Just a few ad hoc shared file mmap()ing, fork()ing, unmap()ing, ... with
some printk()s to verify the values of i_mmap_writable.

Lee

> ---
> 
>  kernel/fork.c |   15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> --- 2.6.28-rc7/kernel/fork.c	2008-11-15 23:09:30.000000000 +0000
> +++ linux/kernel/fork.c	2008-12-10 12:49:13.000000000 +0000
> @@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm
>  		file = tmp->vm_file;
>  		if (file) {
>  			struct inode *inode = file->f_path.dentry->d_inode;
> +			struct address_space *mapping = file->f_mapping;
> +
>  			get_file(file);
>  			if (tmp->vm_flags & VM_DENYWRITE)
>  				atomic_dec(&inode->i_writecount);
> -
> -			/* insert tmp into the share list, just after mpnt */
> -			spin_lock(&file->f_mapping->i_mmap_lock);
> +			spin_lock(&mapping->i_mmap_lock);
> +			if (tmp->vm_flags & VM_SHARED)
> +				mapping->i_mmap_writable++;
>  			tmp->vm_truncate_count = mpnt->vm_truncate_count;
> -			flush_dcache_mmap_lock(file->f_mapping);
> +			flush_dcache_mmap_lock(mapping);
> +			/* insert tmp into the share list, just after mpnt */
>  			vma_prio_tree_add(tmp, mpnt);
> -			flush_dcache_mmap_unlock(file->f_mapping);
> -			spin_unlock(&file->f_mapping->i_mmap_lock);
> +			flush_dcache_mmap_unlock(mapping);
> +			spin_unlock(&mapping->i_mmap_lock);
>  		}
>  
>  		/*

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2008-12-10 22:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-09 20:47 Question: mmotm-081207 - Should i_mmap_writable go negative? Lee Schermerhorn
2008-12-10 13:27 ` Hugh Dickins
2008-12-10 15:48   ` Lee Schermerhorn
2008-12-10 20:48   ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
2008-12-10 22:22     ` Lee Schermerhorn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox