* [PATCH] Make invalidate_inode_pages2() work again
@ 2006-09-25 23:15 Chuck Lever
2006-09-25 23:59 ` Nick Piggin
0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2006-09-25 23:15 UTC (permalink / raw)
To: apkm; +Cc: linux-mm, Trond.Myklebust, steved
A recent change to fix a problem with invalidate_inode_pages() has weakened
the behavior of invalidate_inode_pages2() inadvertently. Add a flag to
tell the helper routines when stronger invalidation semantics are desired.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
mm/truncate.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/truncate.c b/mm/truncate.c
index c6ab55e..b3097a2 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -59,7 +59,7 @@ truncate_complete_page(struct address_sp
* Returns non-zero if the page was successfully invalidated.
*/
static int
-invalidate_complete_page(struct address_space *mapping, struct page *page)
+invalidate_complete_page(struct address_space *mapping, struct page *page, int try_harder)
{
if (page->mapping != mapping)
return 0;
@@ -70,7 +70,7 @@ invalidate_complete_page(struct address_
write_lock_irq(&mapping->tree_lock);
if (PageDirty(page))
goto failed;
- if (page_count(page) != 2) /* caller's ref + pagecache ref */
+ if (!try_harder && page_count(page) != 2) /* caller's ref + pagecache ref */
goto failed;
BUG_ON(PagePrivate(page));
@@ -255,7 +255,7 @@ unsigned long invalidate_mapping_pages(s
goto unlock;
if (page_mapped(page))
goto unlock;
- ret += invalidate_complete_page(mapping, page);
+ ret += invalidate_complete_page(mapping, page, 0);
unlock:
unlock_page(page);
if (next > end)
@@ -339,7 +339,7 @@ int invalidate_inode_pages2_range(struct
}
}
was_dirty = test_clear_page_dirty(page);
- if (!invalidate_complete_page(mapping, page)) {
+ if (!invalidate_complete_page(mapping, page, 1)) {
if (was_dirty)
set_page_dirty(page);
ret = -EIO;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make invalidate_inode_pages2() work again
2006-09-25 23:15 [PATCH] Make invalidate_inode_pages2() work again Chuck Lever
@ 2006-09-25 23:59 ` Nick Piggin
2006-09-26 1:20 ` Trond Myklebust
0 siblings, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2006-09-25 23:59 UTC (permalink / raw)
To: Chuck Lever; +Cc: apkm, linux-mm, Trond.Myklebust, steved
Chuck Lever wrote:
>A recent change to fix a problem with invalidate_inode_pages() has weakened
>the behavior of invalidate_inode_pages2() inadvertently. Add a flag to
>tell the helper routines when stronger invalidation semantics are desired.
>
Question: if invalidate_inode_pages2 cares about not invalidating dirty
pages, how can one avoid the page_count check and it still be correct
(ie. not randomly lose dirty bits in some situations)?
>Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>---
>
> mm/truncate.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/mm/truncate.c b/mm/truncate.c
>index c6ab55e..b3097a2 100644
>--- a/mm/truncate.c
>+++ b/mm/truncate.c
>@@ -59,7 +59,7 @@ truncate_complete_page(struct address_sp
> * Returns non-zero if the page was successfully invalidated.
> */
> static int
>-invalidate_complete_page(struct address_space *mapping, struct page *page)
>+invalidate_complete_page(struct address_space *mapping, struct page *page, int try_harder)
> {
> if (page->mapping != mapping)
> return 0;
>@@ -70,7 +70,7 @@ invalidate_complete_page(struct address_
> write_lock_irq(&mapping->tree_lock);
> if (PageDirty(page))
> goto failed;
>- if (page_count(page) != 2) /* caller's ref + pagecache ref */
>+ if (!try_harder && page_count(page) != 2) /* caller's ref + pagecache ref */
> goto failed;
>
> BUG_ON(PagePrivate(page));
>@@ -255,7 +255,7 @@ unsigned long invalidate_mapping_pages(s
> goto unlock;
> if (page_mapped(page))
> goto unlock;
>- ret += invalidate_complete_page(mapping, page);
>+ ret += invalidate_complete_page(mapping, page, 0);
> unlock:
> unlock_page(page);
> if (next > end)
>@@ -339,7 +339,7 @@ int invalidate_inode_pages2_range(struct
> }
> }
> was_dirty = test_clear_page_dirty(page);
>- if (!invalidate_complete_page(mapping, page)) {
>+ if (!invalidate_complete_page(mapping, page, 1)) {
> if (was_dirty)
> set_page_dirty(page);
> ret = -EIO;
>
>--
>
>
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make invalidate_inode_pages2() work again
2006-09-25 23:59 ` Nick Piggin
@ 2006-09-26 1:20 ` Trond Myklebust
2006-09-26 1:39 ` Nick Piggin
2006-09-26 1:42 ` Andrew Morton
0 siblings, 2 replies; 7+ messages in thread
From: Trond Myklebust @ 2006-09-26 1:20 UTC (permalink / raw)
To: Nick Piggin; +Cc: Chuck Lever, apkm, linux-mm, steved
On Tue, 2006-09-26 at 09:59 +1000, Nick Piggin wrote:
> Chuck Lever wrote:
>
> >A recent change to fix a problem with invalidate_inode_pages() has weakened
> >the behavior of invalidate_inode_pages2() inadvertently. Add a flag to
> >tell the helper routines when stronger invalidation semantics are desired.
> >
>
> Question: if invalidate_inode_pages2 cares about not invalidating dirty
> pages, how can one avoid the page_count check and it still be correct
> (ie. not randomly lose dirty bits in some situations)?
Tests of page_count _suck_ 'cos they are 100% non-specific. Is there no
way to set a page flag or something to indicate that the page may have
been remapped while we were sleeping?
Cheers,
Trond
> >Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >---
> >
> > mm/truncate.c | 8 ++++----
> > 1 files changed, 4 insertions(+), 4 deletions(-)
> >
> >diff --git a/mm/truncate.c b/mm/truncate.c
> >index c6ab55e..b3097a2 100644
> >--- a/mm/truncate.c
> >+++ b/mm/truncate.c
> >@@ -59,7 +59,7 @@ truncate_complete_page(struct address_sp
> > * Returns non-zero if the page was successfully invalidated.
> > */
> > static int
> >-invalidate_complete_page(struct address_space *mapping, struct page *page)
> >+invalidate_complete_page(struct address_space *mapping, struct page *page, int try_harder)
> > {
> > if (page->mapping != mapping)
> > return 0;
> >@@ -70,7 +70,7 @@ invalidate_complete_page(struct address_
> > write_lock_irq(&mapping->tree_lock);
> > if (PageDirty(page))
> > goto failed;
> >- if (page_count(page) != 2) /* caller's ref + pagecache ref */
> >+ if (!try_harder && page_count(page) != 2) /* caller's ref + pagecache ref */
> > goto failed;
> >
> > BUG_ON(PagePrivate(page));
> >@@ -255,7 +255,7 @@ unsigned long invalidate_mapping_pages(s
> > goto unlock;
> > if (page_mapped(page))
> > goto unlock;
> >- ret += invalidate_complete_page(mapping, page);
> >+ ret += invalidate_complete_page(mapping, page, 0);
> > unlock:
> > unlock_page(page);
> > if (next > end)
> >@@ -339,7 +339,7 @@ int invalidate_inode_pages2_range(struct
> > }
> > }
> > was_dirty = test_clear_page_dirty(page);
> >- if (!invalidate_complete_page(mapping, page)) {
> >+ if (!invalidate_complete_page(mapping, page, 1)) {
> > if (was_dirty)
> > set_page_dirty(page);
> > ret = -EIO;
> >
> >--
> >
> >
>
> Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make invalidate_inode_pages2() work again
2006-09-26 1:20 ` Trond Myklebust
@ 2006-09-26 1:39 ` Nick Piggin
2006-09-26 6:24 ` Nick Piggin
2006-09-26 1:42 ` Andrew Morton
1 sibling, 1 reply; 7+ messages in thread
From: Nick Piggin @ 2006-09-26 1:39 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Chuck Lever, apkm, linux-mm, steved
Trond Myklebust wrote:
>On Tue, 2006-09-26 at 09:59 +1000, Nick Piggin wrote:
>
>>Chuck Lever wrote:
>>
>>
>>>A recent change to fix a problem with invalidate_inode_pages() has weakened
>>>the behavior of invalidate_inode_pages2() inadvertently. Add a flag to
>>>tell the helper routines when stronger invalidation semantics are desired.
>>>
>>>
>>Question: if invalidate_inode_pages2 cares about not invalidating dirty
>>pages, how can one avoid the page_count check and it still be correct
>>(ie. not randomly lose dirty bits in some situations)?
>>
>
>Tests of page_count _suck_ 'cos they are 100% non-specific. Is there no
>way to set a page flag or something to indicate that the page may have
>been remapped while we were sleeping?
>
We can exclude the page from being mapped again, if we put a lock_page in
the pagefault handler (which, we have decided, could be reasonable). But
that will only ensure it is not mapped.
If you want to ensure it never becomes *dirty*, then you need to test
page_count because it is the only way to know whether some page obtained
via get_user_pages will, without warning, get dirtied in some corner of
the kernel.
If it weren't for get_user_pages, once we are able to exclude all mappings,
it sounds sane for a filesystem to be able to then exclude anything else
that might dirty the page.
So I really dislike get_user_pages for reasons such as this. IMO it would
be cool if get_user_pages when the caller wants to write, would return with
the page dirty and a bit set to prevent writeout from cleaning it until it
has been finished with (via put_user_pages).
Actually, _ideally_, maybe keeping the mapping around (ie. holding at
least a read lock on mmap_sem) would do the trick. The presence of the
mapping will be seen by the invalidate routines[*], and in general things
might be simplified.
[*] although they'll still go ahead and invalidate the ptes because they
don't take mmap_sem. So something else might be needed. I haven't thought
this through.
--
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make invalidate_inode_pages2() work again
2006-09-26 1:20 ` Trond Myklebust
2006-09-26 1:39 ` Nick Piggin
@ 2006-09-26 1:42 ` Andrew Morton
2006-09-26 1:54 ` Nick Piggin
1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2006-09-26 1:42 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Nick Piggin, Chuck Lever, linux-mm, steved
On Mon, 25 Sep 2006 21:20:13 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> On Tue, 2006-09-26 at 09:59 +1000, Nick Piggin wrote:
> > Chuck Lever wrote:
> >
> > >A recent change to fix a problem with invalidate_inode_pages() has weakened
> > >the behavior of invalidate_inode_pages2() inadvertently. Add a flag to
> > >tell the helper routines when stronger invalidation semantics are desired.
> > >
> >
> > Question: if invalidate_inode_pages2 cares about not invalidating dirty
> > pages, how can one avoid the page_count check and it still be correct
> > (ie. not randomly lose dirty bits in some situations)?
>
> Tests of page_count _suck_ 'cos they are 100% non-specific. Is there no
> way to set a page flag or something to indicate that the page may have
> been remapped while we were sleeping?
Its a question of "what are these functions supposed to do"?
I'd suggest:
invalidate_inode_pages() -> best-effort, remove-it-if-it-isn't-busy
truncate_inode_pages() -> guaranteed, data-destroying takedown.
invalidate_inode_pages2() -> Somewhere in between. Any takers?
I'd suggest "guaranteed, non-data-destroying takedown". Maybe. So it
doesn't remove dirty pages, but it does remove otherwise-busy pages.
As definitions go, that really sucks.
I think testing page_count() makes sense for invalidate_inode_pages(),
because that page is clearly in use by someone for something and we
shouldn't go and whip it out of pagecache under that someone's feet. It
is, after all, "pinned".
For invalidate_inode_pages2(), proper behaviour would be to block until
whoever is busying that page stops being busy on it.
I perhaps we could do a wake_up(page_waitqueue(page)) in vmscan when it
drops the ref on a page. But that would mean that
invalidate_inode_pages2() would get permanently stuck on a
permanently-pinned page.
It's a bit of a pickle. Perhaps we just add the
invalidate_complete_page2(). That re-adds the direct-io race, which is
fixable by locking the page in the pagefault handler. argh.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make invalidate_inode_pages2() work again
2006-09-26 1:42 ` Andrew Morton
@ 2006-09-26 1:54 ` Nick Piggin
0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2006-09-26 1:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: Trond Myklebust, Chuck Lever, linux-mm, steved
Andrew Morton wrote:
>On Mon, 25 Sep 2006 21:20:13 -0400
>Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
>
>
>>On Tue, 2006-09-26 at 09:59 +1000, Nick Piggin wrote:
>>
>>>Chuck Lever wrote:
>>>
>>>
>>>>A recent change to fix a problem with invalidate_inode_pages() has weakened
>>>>the behavior of invalidate_inode_pages2() inadvertently. Add a flag to
>>>>tell the helper routines when stronger invalidation semantics are desired.
>>>>
>>>>
>>>Question: if invalidate_inode_pages2 cares about not invalidating dirty
>>>pages, how can one avoid the page_count check and it still be correct
>>>(ie. not randomly lose dirty bits in some situations)?
>>>
>>Tests of page_count _suck_ 'cos they are 100% non-specific. Is there no
>>way to set a page flag or something to indicate that the page may have
>>been remapped while we were sleeping?
>>
>
>Its a question of "what are these functions supposed to do"?
>
>I'd suggest:
>
>invalidate_inode_pages() -> best-effort, remove-it-if-it-isn't-busy
>
>truncate_inode_pages() -> guaranteed, data-destroying takedown.
>
>invalidate_inode_pages2() -> Somewhere in between. Any takers?
>
>I'd suggest "guaranteed, non-data-destroying takedown". Maybe. So it
>doesn't remove dirty pages, but it does remove otherwise-busy pages.
>
>As definitions go, that really sucks.
>
What I want to know is, can invalidate_inode_pages2 ever be allowed to
throw out a dirty page?
>I think testing page_count() makes sense for invalidate_inode_pages(),
>because that page is clearly in use by someone for something and we
>shouldn't go and whip it out of pagecache under that someone's feet. It
>is, after all, "pinned".
>
Definitely.
>For invalidate_inode_pages2(), proper behaviour would be to block until
>whoever is busying that page stops being busy on it.
>
>I perhaps we could do a wake_up(page_waitqueue(page)) in vmscan when it
>drops the ref on a page. But that would mean that
>invalidate_inode_pages2() would get permanently stuck on a
>permanently-pinned page.
>
>It's a bit of a pickle. Perhaps we just add the
>invalidate_complete_page2(). That re-adds the direct-io race, which is
>fixable by locking the page in the pagefault handler. argh.
>
And AFAIKS, it will re add the race where it can be possible to invalidate
a dirty page. Which is fixable by my previous suggestion (having
get_user_pages for write somehow marking the page until put_user_pages).
Not a small job either, especially going through the callers. It will
simplify things though, if we know the page is guaranteed not to be marked
clean until we're finished with it.
--
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Make invalidate_inode_pages2() work again
2006-09-26 1:39 ` Nick Piggin
@ 2006-09-26 6:24 ` Nick Piggin
0 siblings, 0 replies; 7+ messages in thread
From: Nick Piggin @ 2006-09-26 6:24 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Chuck Lever, apkm, linux-mm, steved
Nick Piggin wrote:
> So I really dislike get_user_pages for reasons such as this. IMO it would
> be cool if get_user_pages when the caller wants to write, would return
> with
> the page dirty and a bit set to prevent writeout from cleaning it
> until it
> has been finished with (via put_user_pages).
>
> Actually, _ideally_, maybe keeping the mapping around (ie. holding at
> least a read lock on mmap_sem) would do the trick. The presence of the
> mapping will be seen by the invalidate routines[*], and in general things
> might be simplified.
No, I guess that isn't going to work, because it'll mean one thread can
DoS the others WRT mmap and brk. I'll look into the per-page flag bit
idea.
--
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2006-09-26 6:24 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-09-25 23:15 [PATCH] Make invalidate_inode_pages2() work again Chuck Lever
2006-09-25 23:59 ` Nick Piggin
2006-09-26 1:20 ` Trond Myklebust
2006-09-26 1:39 ` Nick Piggin
2006-09-26 6:24 ` Nick Piggin
2006-09-26 1:42 ` Andrew Morton
2006-09-26 1:54 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox