* Question: mmotm-081207 - Should i_mmap_writable go negative?
@ 2008-12-09 20:47 Lee Schermerhorn
2008-12-10 13:27 ` Hugh Dickins
0 siblings, 1 reply; 5+ messages in thread
From: Lee Schermerhorn @ 2008-12-09 20:47 UTC (permalink / raw)
To: Andrew Morton, Hugh Dickins, linux-mm
I've been trying to figure out how to determine when the last shared
mapping [vma with VM_SHARED] is removed from a mmap()ed file. Looking
at the sources, it appears that vma->vm_file->f_mapping->i_mmap_writable
counts the number of vmas with VM_SHARED mapping the file. However, in
2.6.28-rc7-mmotm-081207, it appears that i_mmap_writable doesn't get
incremented when a task fork()s, and can go negative when the parent and
child both unmap the file.
I instrumented a couple of functions in mmap.c that increment and
decrement i_mmap_writable and then mapped a file sharable, fork(), and
unmapped the file in the parent and child, using my memtoy ad hoc test
program. Here's what I see [again, on 2.6.28-rc7-mmotm-081207]--right
after boot, so my test file should have zero mappers:
--------
root@dropout(root):memtoy
memtoy pid: 3301
memtoy>file /tmp/zf1
memtoy>map zf1 shared
console:__vma_link_file: vma: ffff8803fdc090b8 - mapping->i_mmap_writable: 0 -> 1
memtoy>child c1
memtoy: child c1 - pid 3302
me: I would have expected to see i_mmap_writable incremented again here, but
me: saw no console output from my instrumentation.
memtoy>unmap zf1 # unmap in parent
console:__remove_shared_vm_struct: vma: ffff8803fdc090b8 - mapping->i_mmap_writable: 1 -> 0
console:__remove_shared_vm_struct: vma: ffff8803fdc090b8 - removed last shared mapping
memtoy>/c1 show
_____address______ ____length____ ____offset____ prot share name
f 0x00007f000ae68000 0x000001000000 0x000000000000 rw- shared /tmp/zf1
me: child still has zf1 mapped
memtoy>/c1 unmap zf1 # unmap in child
console:__remove_shared_vm_struct: vma: ffff8803fe5d3170 - mapping->i_mmap_writable: 0 -> -1
--------
So, the file's i_mmap_writable goes negative. Is this expected?
If I remap the file, whether or not I restart memtoy, I see that it's
i_mmap_writable has remained negative:
-------
memtoy>map zf1 # map private [!shared] - no change in i_mmap_writable
console:__vma_link_file: vma: ffff8805fd0590b8 - mapping->i_mmap_writable: -1 -> -1
memtoy>unmap zf1 # unmap: no change in i_mmap_writable
console:__remove_shared_vm_struct: vma: ffff8805fd0590b8 - mapping->i_mmap_writable: -1 -> -1
memtoy>map zf1 shared # mmap shared, again
console:__vma_link_file: vma: ffff8805fd0590b8 - mapping->i_mmap_writable: -1 -> 0
--------
I recall that this used to work [as I expected] at one time.
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question: mmotm-081207 - Should i_mmap_writable go negative?
2008-12-09 20:47 Question: mmotm-081207 - Should i_mmap_writable go negative? Lee Schermerhorn
@ 2008-12-10 13:27 ` Hugh Dickins
2008-12-10 15:48 ` Lee Schermerhorn
2008-12-10 20:48 ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
0 siblings, 2 replies; 5+ messages in thread
From: Hugh Dickins @ 2008-12-10 13:27 UTC (permalink / raw)
To: Lee Schermerhorn; +Cc: Andrew Morton, linux-mm
On Tue, 9 Dec 2008, Lee Schermerhorn wrote:
> I've been trying to figure out how to determine when the last shared
> mapping [vma with VM_SHARED] is removed from a mmap()ed file. Looking
> at the sources, it appears that vma->vm_file->f_mapping->i_mmap_writable
> counts the number of vmas with VM_SHARED mapping the file. However, in
> 2.6.28-rc7-mmotm-081207, it appears that i_mmap_writable doesn't get
> incremented when a task fork()s, and can go negative when the parent and
> child both unmap the file.
Wow, that's what I call a good find!
> I recall that this used to work [as I expected] at one time.
Are you sure? It looks to me to have been wrong ever since I added
i_mmap_writable. Not that I've tested yet at all. Here's the patch
I intend to send Linus a.s.a.p: but I do need to test it, and also
extend the comment to say just what's been done wrong all this time.
Big thank you from all our users!
Hugh
Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
test in 2.6.7! Bad bad bug, good good find.
The i_mmap_writable count must be incremented for VM_SHARED (just as
i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
when dup_mmap() copies the vma for fork: it has its own more optimal
version of __vma_link_file() and I missed this out. So the count
was later going down to 0 (dangerous) when one end unmapped, then
wrapping negative (inefficient) when the other end unmapped.
Fix would be a two-liner, but mapping variable added and comment moved.
Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Not-Quite-Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
kernel/fork.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- 2.6.28-rc7/kernel/fork.c 2008-11-15 23:09:30.000000000 +0000
+++ linux/kernel/fork.c 2008-12-10 12:49:13.000000000 +0000
@@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm
file = tmp->vm_file;
if (file) {
struct inode *inode = file->f_path.dentry->d_inode;
+ struct address_space *mapping = file->f_mapping;
+
get_file(file);
if (tmp->vm_flags & VM_DENYWRITE)
atomic_dec(&inode->i_writecount);
-
- /* insert tmp into the share list, just after mpnt */
- spin_lock(&file->f_mapping->i_mmap_lock);
+ spin_lock(&mapping->i_mmap_lock);
+ if (tmp->vm_flags & VM_SHARED)
+ mapping->i_mmap_writable++;
tmp->vm_truncate_count = mpnt->vm_truncate_count;
- flush_dcache_mmap_lock(file->f_mapping);
+ flush_dcache_mmap_lock(mapping);
+ /* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
- flush_dcache_mmap_unlock(file->f_mapping);
- spin_unlock(&file->f_mapping->i_mmap_lock);
+ flush_dcache_mmap_unlock(mapping);
+ spin_unlock(&mapping->i_mmap_lock);
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Question: mmotm-081207 - Should i_mmap_writable go negative?
2008-12-10 13:27 ` Hugh Dickins
@ 2008-12-10 15:48 ` Lee Schermerhorn
2008-12-10 20:48 ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
1 sibling, 0 replies; 5+ messages in thread
From: Lee Schermerhorn @ 2008-12-10 15:48 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, linux-mm
On Wed, 2008-12-10 at 13:27 +0000, Hugh Dickins wrote:
> On Tue, 9 Dec 2008, Lee Schermerhorn wrote:
> > I've been trying to figure out how to determine when the last shared
> > mapping [vma with VM_SHARED] is removed from a mmap()ed file. Looking
> > at the sources, it appears that vma->vm_file->f_mapping->i_mmap_writable
> > counts the number of vmas with VM_SHARED mapping the file. However, in
> > 2.6.28-rc7-mmotm-081207, it appears that i_mmap_writable doesn't get
> > incremented when a task fork()s, and can go negative when the parent and
> > child both unmap the file.
>
> Wow, that's what I call a good find!
>
> > I recall that this used to work [as I expected] at one time.
>
> Are you sure? It looks to me to have been wrong ever since I added
> i_mmap_writable. Not that I've tested yet at all. Here's the patch
> I intend to send Linus a.s.a.p: but I do need to test it, and also
> extend the comment to say just what's been done wrong all this time.
Well, no, not sure. A test that I had done at one time seemed to work
then. I recently started testing this area again, and noticed
"unexpected behavior" which I tracked down to this, uh, feature. I may
have been doing the mmap() after the fork before, while this time I
forked after mmap(). Anyway, I'll add in your patch at test with that.
Thanks,
Lee
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] fix mapping_writably_mapped()
2008-12-10 13:27 ` Hugh Dickins
2008-12-10 15:48 ` Lee Schermerhorn
@ 2008-12-10 20:48 ` Hugh Dickins
2008-12-10 22:22 ` Lee Schermerhorn
1 sibling, 1 reply; 5+ messages in thread
From: Hugh Dickins @ 2008-12-10 20:48 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, Lee Schermerhorn, linux-mm, linux-kernel,
linux-arch, stable
Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
test in 2.6.7! Bad bad bug, good good find.
The i_mmap_writable count must be incremented for VM_SHARED (just as
i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
when dup_mmap() copies the vma for fork: it has its own more optimal
version of __vma_link_file(), and I missed this out. So the count
was later going down to 0 (dangerous) when one end unmapped, then
wrapping negative (inefficient) when the other end unmapped.
The only impact on x86 would have been that setting a mandatory lock on
a file which has at some time been opened O_RDWR and mapped MAP_SHARED
(but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN
when it should succeed, or succeed when it should fail.
But those architectures which rely on flush_dcache_page() to flush
userspace modifications back into the page before the kernel reads it,
may in some cases have skipped the flush after such a fork - though any
repetitive test will soon wrap the count negative, in which case it will
flush_dcache_page() unnecessarily.
Fix would be a two-liner, but mapping variable added, and comment moved.
Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
kernel/fork.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- 2.6.28-rc7/kernel/fork.c 2008-11-15 23:09:30.000000000 +0000
+++ linux/kernel/fork.c 2008-12-10 12:49:13.000000000 +0000
@@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm
file = tmp->vm_file;
if (file) {
struct inode *inode = file->f_path.dentry->d_inode;
+ struct address_space *mapping = file->f_mapping;
+
get_file(file);
if (tmp->vm_flags & VM_DENYWRITE)
atomic_dec(&inode->i_writecount);
-
- /* insert tmp into the share list, just after mpnt */
- spin_lock(&file->f_mapping->i_mmap_lock);
+ spin_lock(&mapping->i_mmap_lock);
+ if (tmp->vm_flags & VM_SHARED)
+ mapping->i_mmap_writable++;
tmp->vm_truncate_count = mpnt->vm_truncate_count;
- flush_dcache_mmap_lock(file->f_mapping);
+ flush_dcache_mmap_lock(mapping);
+ /* insert tmp into the share list, just after mpnt */
vma_prio_tree_add(tmp, mpnt);
- flush_dcache_mmap_unlock(file->f_mapping);
- spin_unlock(&file->f_mapping->i_mmap_lock);
+ flush_dcache_mmap_unlock(mapping);
+ spin_unlock(&mapping->i_mmap_lock);
}
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] fix mapping_writably_mapped()
2008-12-10 20:48 ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
@ 2008-12-10 22:22 ` Lee Schermerhorn
0 siblings, 0 replies; 5+ messages in thread
From: Lee Schermerhorn @ 2008-12-10 22:22 UTC (permalink / raw)
To: Hugh Dickins
Cc: Linus Torvalds, Andrew Morton, linux-mm, linux-kernel,
linux-arch, stable
On Wed, 2008-12-10 at 20:48 +0000, Hugh Dickins wrote:
> Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped
> test in 2.6.7! Bad bad bug, good good find.
>
> The i_mmap_writable count must be incremented for VM_SHARED (just as
> i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock)
> when dup_mmap() copies the vma for fork: it has its own more optimal
> version of __vma_link_file(), and I missed this out. So the count
> was later going down to 0 (dangerous) when one end unmapped, then
> wrapping negative (inefficient) when the other end unmapped.
>
> The only impact on x86 would have been that setting a mandatory lock on
> a file which has at some time been opened O_RDWR and mapped MAP_SHARED
> (but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN
> when it should succeed, or succeed when it should fail.
>
> But those architectures which rely on flush_dcache_page() to flush
> userspace modifications back into the page before the kernel reads it,
> may in some cases have skipped the flush after such a fork - though any
> repetitive test will soon wrap the count negative, in which case it will
> flush_dcache_page() unnecessarily.
>
> Fix would be a two-liner, but mapping variable added, and comment moved.
>
> Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
I tested the version you sent out earlier atop 28-rc7-mmotm-081208 and
it appears to be working as expected. Not very stressful testing, tho'.
Just a few ad hoc shared file mmap()ing, fork()ing, unmap()ing, ... with
some printk()s to verify the values of i_mmap_writable.
Lee
> ---
>
> kernel/fork.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> --- 2.6.28-rc7/kernel/fork.c 2008-11-15 23:09:30.000000000 +0000
> +++ linux/kernel/fork.c 2008-12-10 12:49:13.000000000 +0000
> @@ -315,17 +315,20 @@ static int dup_mmap(struct mm_struct *mm
> file = tmp->vm_file;
> if (file) {
> struct inode *inode = file->f_path.dentry->d_inode;
> + struct address_space *mapping = file->f_mapping;
> +
> get_file(file);
> if (tmp->vm_flags & VM_DENYWRITE)
> atomic_dec(&inode->i_writecount);
> -
> - /* insert tmp into the share list, just after mpnt */
> - spin_lock(&file->f_mapping->i_mmap_lock);
> + spin_lock(&mapping->i_mmap_lock);
> + if (tmp->vm_flags & VM_SHARED)
> + mapping->i_mmap_writable++;
> tmp->vm_truncate_count = mpnt->vm_truncate_count;
> - flush_dcache_mmap_lock(file->f_mapping);
> + flush_dcache_mmap_lock(mapping);
> + /* insert tmp into the share list, just after mpnt */
> vma_prio_tree_add(tmp, mpnt);
> - flush_dcache_mmap_unlock(file->f_mapping);
> - spin_unlock(&file->f_mapping->i_mmap_lock);
> + flush_dcache_mmap_unlock(mapping);
> + spin_unlock(&mapping->i_mmap_lock);
> }
>
> /*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-12-10 22:22 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-09 20:47 Question: mmotm-081207 - Should i_mmap_writable go negative? Lee Schermerhorn
2008-12-10 13:27 ` Hugh Dickins
2008-12-10 15:48 ` Lee Schermerhorn
2008-12-10 20:48 ` [PATCH] fix mapping_writably_mapped() Hugh Dickins
2008-12-10 22:22 ` Lee Schermerhorn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox