* [rfc][patch] mm: dirty page accounting hole
@ 2008-08-12 5:58 Nick Piggin
2008-08-12 6:50 ` Peter Zijlstra
2008-08-12 11:15 ` Hugh Dickins
0 siblings, 2 replies; 7+ messages in thread
From: Nick Piggin @ 2008-08-12 5:58 UTC (permalink / raw)
To: linux-mm; +Cc: Dickins, Hugh, Zijlstra, Peter
[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]
Hi,
I think I'm running into a hole in dirty page accounting...
What seems to be happening is that a page gets written to via a
VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
Next, mprotect changes the vma so it is no longer writeable so it
is no longer VM_SHARED. The pte is still dirty.
Then clear_page_dirty_for_io is called and leaves that pte dirty
and cleans the page. It never gets cleaned until munmap, so msync
and writeout accounting are broken.
I have a fix which just scans VM_SHARED to VM_MAYSHARE. The other
way I tried is to clear the dirty and write bits and set the page
dirty in mprotect. The problem with that for me is that I'm trying
to rework the vm/fs layer so we never have to allocate data to
write out dirty pages (using page_mkwrite and dirty accounting),
and so this still leaves me with a window where the vma flags are
changed but before the pte is marked clean, in which time the page
is still dirty but it may have its metadata freed because it
doesn't look dirty.
There are several other problems I've also run into, including a
fundamentally indadequate page_mkwrite locking scheme, which was
naturally ignored when I brought it up during reviewing those
patches. I digress...
Anyway, here's a patch to fix this first particular issue...
[-- Attachment #2: mm-dirty-account-fix.patch --]
[-- Type: text/x-diff, Size: 469 bytes --]
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -481,7 +481,7 @@ static int page_mkclean_file(struct addr
spin_lock(&mapping->i_mmap_lock);
vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
- if (vma->vm_flags & VM_SHARED)
+ if (vma->vm_flags & VM_MAYSHARE)
ret += page_mkclean_one(page, vma);
}
spin_unlock(&mapping->i_mmap_lock);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] mm: dirty page accounting hole
2008-08-12 5:58 [rfc][patch] mm: dirty page accounting hole Nick Piggin
@ 2008-08-12 6:50 ` Peter Zijlstra
2008-08-12 7:06 ` Nick Piggin
2008-08-12 11:15 ` Hugh Dickins
1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2008-08-12 6:50 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-mm, Dickins, Hugh
On Tue, 2008-08-12 at 15:58 +1000, Nick Piggin wrote:
>
> Hi,
>
> I think I'm running into a hole in dirty page accounting...
>
> What seems to be happening is that a page gets written to via a
> VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
> Next, mprotect changes the vma so it is no longer writeable so it
> is no longer VM_SHARED. The pte is still dirty.
>
> Then clear_page_dirty_for_io is called and leaves that pte dirty
> and cleans the page. It never gets cleaned until munmap, so msync
> and writeout accounting are broken.
>
> I have a fix which just scans VM_SHARED to VM_MAYSHARE. The other
> way I tried is to clear the dirty and write bits and set the page
> dirty in mprotect. The problem with that for me is that I'm trying
> to rework the vm/fs layer so we never have to allocate data to
> write out dirty pages (using page_mkwrite and dirty accounting),
Ooh, nice!
> and so this still leaves me with a window where the vma flags are
> changed but before the pte is marked clean, in which time the page
> is still dirty but it may have its metadata freed because it
> doesn't look dirty.
>
> There are several other problems I've also run into, including a
> fundamentally indadequate page_mkwrite locking scheme, which was
> naturally ignored when I brought it up during reviewing those
> patches. I digress...
Yes, I remember you bringing that up, and later too when you did those
fault patches. I assumed you were 'working' on it.
> Anyway, here's a patch to fix this first particular issue...
Looks good.
>
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c
> +++ linux-2.6/mm/rmap.c
> @@ -481,7 +481,7 @@ static int page_mkclean_file(struct addr
>
> spin_lock(&mapping->i_mmap_lock);
> vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> - if (vma->vm_flags & VM_SHARED)
> + if (vma->vm_flags & VM_MAYSHARE)
> ret += page_mkclean_one(page, vma);
> }
> spin_unlock(&mapping->i_mmap_lock);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] mm: dirty page accounting hole
2008-08-12 6:50 ` Peter Zijlstra
@ 2008-08-12 7:06 ` Nick Piggin
0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2008-08-12 7:06 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-mm, Dickins, Hugh
On Tuesday 12 August 2008 16:50, Peter Zijlstra wrote:
> On Tue, 2008-08-12 at 15:58 +1000, Nick Piggin wrote:
> > Hi,
> >
> > I think I'm running into a hole in dirty page accounting...
> >
> > What seems to be happening is that a page gets written to via a
> > VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
> > Next, mprotect changes the vma so it is no longer writeable so it
> > is no longer VM_SHARED. The pte is still dirty.
> >
> > Then clear_page_dirty_for_io is called and leaves that pte dirty
> > and cleans the page. It never gets cleaned until munmap, so msync
> > and writeout accounting are broken.
> >
> > I have a fix which just scans VM_SHARED to VM_MAYSHARE. The other
> > way I tried is to clear the dirty and write bits and set the page
> > dirty in mprotect. The problem with that for me is that I'm trying
> > to rework the vm/fs layer so we never have to allocate data to
> > write out dirty pages (using page_mkwrite and dirty accounting),
>
> Ooh, nice!
Thanks for confirming.
> > and so this still leaves me with a window where the vma flags are
> > changed but before the pte is marked clean, in which time the page
> > is still dirty but it may have its metadata freed because it
> > doesn't look dirty.
> >
> > There are several other problems I've also run into, including a
> > fundamentally indadequate page_mkwrite locking scheme, which was
> > naturally ignored when I brought it up during reviewing those
> > patches. I digress...
>
> Yes, I remember you bringing that up, and later too when you did those
> fault patches. I assumed you were 'working' on it.
Oh that wasn't aimed at you, sorry ;) It is our general... "process"
of development. I mean, there is apparently some impending apocolypse
due to our scarcity of review bandwidth, so ignoring any review is
by definition insignificant.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] mm: dirty page accounting hole
2008-08-12 5:58 [rfc][patch] mm: dirty page accounting hole Nick Piggin
2008-08-12 6:50 ` Peter Zijlstra
@ 2008-08-12 11:15 ` Hugh Dickins
2008-08-12 11:30 ` Peter Zijlstra
2008-08-12 11:53 ` Nick Piggin
1 sibling, 2 replies; 7+ messages in thread
From: Hugh Dickins @ 2008-08-12 11:15 UTC (permalink / raw)
To: Nick Piggin; +Cc: Peter Zijlstra, linux-mm
On Tue, 12 Aug 2008, Nick Piggin wrote:
>
> I think I'm running into a hole in dirty page accounting...
>
> What seems to be happening is that a page gets written to via a
> VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
> Next, mprotect changes the vma so it is no longer writeable so it
> is no longer VM_SHARED. The pte is still dirty.
I don't think you've got that right yet.
mprotect can of course change vma->vm_flags to take VM_WRITE off,
making vma no longer writeable; but it shouldn't be touching
VM_SHARED. And a quick check with debugger confirms that.
It's precisely because of mprotect that page_mkclean_one tests
VM_SHARED not VM_WRITE. Changing that to VM_MAYSHARE, as in your
patch below, should make no difference to correctness; but would
potentially make its loop less efficient (it would also go off to
check MAP_SHARED, PROT_READ, fd readonly mappings unnecessarily).
Perhaps there's somewhere else that clears VM_SHARED by mistake?
Or another path through mprotect which does so? I haven't checked
further, hoping this will jolt you into a different realization.
>
> Then clear_page_dirty_for_io is called and leaves that pte dirty
> and cleans the page. It never gets cleaned until munmap, so msync
> and writeout accounting are broken.
>
> I have a fix which just scans VM_SHARED to VM_MAYSHARE. The other
> way I tried is to clear the dirty and write bits and set the page
> dirty in mprotect. The problem with that for me is that I'm trying
> to rework the vm/fs layer so we never have to allocate data to
> write out dirty pages (using page_mkwrite and dirty accounting),
> and so this still leaves me with a window where the vma flags are
> changed but before the pte is marked clean, in which time the page
> is still dirty but it may have its metadata freed because it
> doesn't look dirty.
While I disagree with the patch itself, and don't understand the
details of what you're working on there, I certainly agree that it's
better for mprotect not to set the pages dirty: at present (on some
arches, in most cases? it changes from time to time) change_pte_range
is an operation on ptes which doesn't have to mess with struct pages.
>
> There are several other problems I've also run into, including a
> fundamentally indadequate page_mkwrite locking scheme, which was
> naturally ignored when I brought it up during reviewing those
> patches. I digress...
Unsatisfactory, yes, sorry about that;
but no, you're not "naturally ignored".
>
> Anyway, here's a patch to fix this first particular issue...
Could you please go back to inlining your patches?
Thanks,
Hugh
>
> Index: linux-2.6/mm/rmap.c
> ===================================================================
> --- linux-2.6.orig/mm/rmap.c
> +++ linux-2.6/mm/rmap.c
> @@ -481,7 +481,7 @@ static int page_mkclean_file(struct addr
>
> spin_lock(&mapping->i_mmap_lock);
> vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) {
> - if (vma->vm_flags & VM_SHARED)
> + if (vma->vm_flags & VM_MAYSHARE)
> ret += page_mkclean_one(page, vma);
> }
> spin_unlock(&mapping->i_mmap_lock);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] mm: dirty page accounting hole
2008-08-12 11:15 ` Hugh Dickins
@ 2008-08-12 11:30 ` Peter Zijlstra
2008-08-12 11:53 ` Nick Piggin
1 sibling, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-08-12 11:30 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nick Piggin, linux-mm
On Tue, 2008-08-12 at 12:15 +0100, Hugh Dickins wrote:
> On Tue, 12 Aug 2008, Nick Piggin wrote:
> >
> > I think I'm running into a hole in dirty page accounting...
> >
> > What seems to be happening is that a page gets written to via a
> > VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
> > Next, mprotect changes the vma so it is no longer writeable so it
> > is no longer VM_SHARED. The pte is still dirty.
>
> I don't think you've got that right yet.
>
> mprotect can of course change vma->vm_flags to take VM_WRITE off,
> making vma no longer writeable; but it shouldn't be touching
> VM_SHARED. And a quick check with debugger confirms that.
>
> It's precisely because of mprotect that page_mkclean_one tests
> VM_SHARED not VM_WRITE. Changing that to VM_MAYSHARE, as in your
> patch below, should make no difference to correctness; but would
> potentially make its loop less efficient (it would also go off to
> check MAP_SHARED, PROT_READ, fd readonly mappings unnecessarily).
>
> Perhaps there's somewhere else that clears VM_SHARED by mistake?
> Or another path through mprotect which does so? I haven't checked
> further, hoping this will jolt you into a different realization.
You are right, I cannot find a path through mprotect that unsets
VM_SHARED either.
Something which I failed to validate this morning.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] mm: dirty page accounting hole
2008-08-12 11:15 ` Hugh Dickins
2008-08-12 11:30 ` Peter Zijlstra
@ 2008-08-12 11:53 ` Nick Piggin
2008-08-12 13:17 ` Nick Piggin
1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2008-08-12 11:53 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Peter Zijlstra, linux-mm
On Tuesday 12 August 2008 21:15, Hugh Dickins wrote:
> On Tue, 12 Aug 2008, Nick Piggin wrote:
> > I think I'm running into a hole in dirty page accounting...
> >
> > What seems to be happening is that a page gets written to via a
> > VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
> > Next, mprotect changes the vma so it is no longer writeable so it
> > is no longer VM_SHARED. The pte is still dirty.
>
> I don't think you've got that right yet.
>
> mprotect can of course change vma->vm_flags to take VM_WRITE off,
> making vma no longer writeable; but it shouldn't be touching
> VM_SHARED. And a quick check with debugger confirms that.
Drat, yes, I must have been thinking of VM_WRITE vs VM_MAYWRITE.
> It's precisely because of mprotect that page_mkclean_one tests
> VM_SHARED not VM_WRITE. Changing that to VM_MAYSHARE, as in your
> patch below, should make no difference to correctness; but would
> potentially make its loop less efficient (it would also go off to
> check MAP_SHARED, PROT_READ, fd readonly mappings unnecessarily).
>
> Perhaps there's somewhere else that clears VM_SHARED by mistake?
> Or another path through mprotect which does so? I haven't checked
> further, hoping this will jolt you into a different realization.
Will have to dig further... Thanks for the tip.
> > Then clear_page_dirty_for_io is called and leaves that pte dirty
> > and cleans the page. It never gets cleaned until munmap, so msync
> > and writeout accounting are broken.
> >
> > I have a fix which just scans VM_SHARED to VM_MAYSHARE. The other
> > way I tried is to clear the dirty and write bits and set the page
> > dirty in mprotect. The problem with that for me is that I'm trying
> > to rework the vm/fs layer so we never have to allocate data to
> > write out dirty pages (using page_mkwrite and dirty accounting),
> > and so this still leaves me with a window where the vma flags are
> > changed but before the pte is marked clean, in which time the page
> > is still dirty but it may have its metadata freed because it
> > doesn't look dirty.
>
> While I disagree with the patch itself, and don't understand the
> details of what you're working on there, I certainly agree that it's
> better for mprotect not to set the pages dirty: at present (on some
> arches, in most cases? it changes from time to time) change_pte_range
> is an operation on ptes which doesn't have to mess with struct pages.
Right. For me, it gets a bit messy because we'd really like to hold the
page lock there (like clear_page_dirty_for_io) in order to be able to get
reasonable synchronization between page faults and pte cleaning...
> > There are several other problems I've also run into, including a
> > fundamentally indadequate page_mkwrite locking scheme, which was
> > naturally ignored when I brought it up during reviewing those
> > patches. I digress...
>
> Unsatisfactory, yes, sorry about that;
> but no, you're not "naturally ignored".
>
> > Anyway, here's a patch to fix this first particular issue...
>
> Could you please go back to inlining your patches?
OK I'll try.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [rfc][patch] mm: dirty page accounting hole
2008-08-12 11:53 ` Nick Piggin
@ 2008-08-12 13:17 ` Nick Piggin
0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2008-08-12 13:17 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Peter Zijlstra, linux-mm
On Tuesday 12 August 2008 21:53, Nick Piggin wrote:
> On Tuesday 12 August 2008 21:15, Hugh Dickins wrote:
> > On Tue, 12 Aug 2008, Nick Piggin wrote:
> > > I think I'm running into a hole in dirty page accounting...
> > >
> > > What seems to be happening is that a page gets written to via a
> > > VM_SHARED vma. We then set the pte dirty, then mark the page dirty.
> > > Next, mprotect changes the vma so it is no longer writeable so it
> > > is no longer VM_SHARED. The pte is still dirty.
> >
> > I don't think you've got that right yet.
> >
> > mprotect can of course change vma->vm_flags to take VM_WRITE off,
> > making vma no longer writeable; but it shouldn't be touching
> > VM_SHARED. And a quick check with debugger confirms that.
>
> Drat, yes, I must have been thinking of VM_WRITE vs VM_MAYWRITE.
And indeed I was able to reproduce the problem with my "fix" applied
too, after refining the test case to be more reproduceable...
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-08-12 13:17 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-12 5:58 [rfc][patch] mm: dirty page accounting hole Nick Piggin
2008-08-12 6:50 ` Peter Zijlstra
2008-08-12 7:06 ` Nick Piggin
2008-08-12 11:15 ` Hugh Dickins
2008-08-12 11:30 ` Peter Zijlstra
2008-08-12 11:53 ` Nick Piggin
2008-08-12 13:17 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox