* "orphaned pagecache memleak fix" question.
@ 2005-04-05 16:02 Nikita Danilov
2005-04-06 7:58 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2005-04-05 16:02 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-mm, Andrew Morton
Hello,
I have few question about recent "orphaned pagecache memleak fix"
change-set:
- how is it supposed to work with file systems that use page->private
(and PG_private) for something else than buffer head ring? Such file
systems may leak truncated pages for precisely the same reasons
reiserfs does, and try_to_free_buffers(page) will most likely oops;
- as I see it, nr_dirty shouldn't be updated after calling
ClearPageDirty() because page->mapping was NULL already at the time of
corresponding __set_page_dirty_nobuffers() call. Right?
- wouldn't it be simpler to unconditionally remove page from LRU in
->invalidatepage()?
Nikita.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-05 16:02 "orphaned pagecache memleak fix" question Nikita Danilov
@ 2005-04-06 7:58 ` Andrew Morton
2005-04-06 12:06 ` Nikita Danilov
2005-04-06 21:12 ` Chris Mason
0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2005-04-06 7:58 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Andrea, linux-mm, AKPM
Nikita Danilov <nikita@clusterfs.com> wrote:
>
> Hello,
>
> I have few question about recent "orphaned pagecache memleak fix"
> change-set:
Something was wrong with that patch. I accidentally left a printk in there
and nobody has ever reported the printk coming out.
IOW: we can probably revert it (or fix it!) but first we need to get down
and work out what's happening.
> - how is it supposed to work with file systems that use page->private
> (and PG_private) for something else than buffer head ring? Such file
> systems may leak truncated pages for precisely the same reasons
> reiserfs does, and try_to_free_buffers(page) will most likely oops;
You're screwed, sorry. If PagePrivate is set and ->mapping is null we just
assume that the thing at ->private is buffers. It's awful.
If the fs doesn't leave buffers at ->private then it simply cannot allow
->invalidatepage() to return 0. It must invalidate the page. We could do
that right now in ext3 (for example) by blocking in ->invalidatepage().
(Run a commit, retry the invalidation).
The assumption that the thing at ->private is buffers should be viewed
as a performance hack for buffer-backed address_spaces only.
Note that the patch to which you refer doesn't add this hack - it's already
been there for a long time, in a different place:
if (PagePrivate(page)) {
if (!try_to_release_page(page, sc->gfp_mask))
goto activate_locked;
if (!mapping && page_count(page) == 1)
goto free_it;
}
If we get here with a null ->mapping, we'll assume that ->private contains
a buffer ring.
To which reiserfs do you refer? reiser4, I assume?
> - as I see it, nr_dirty shouldn't be updated after calling
> ClearPageDirty() because page->mapping was NULL already at the time of
> corresponding __set_page_dirty_nobuffers() call. Right?
That seems to be correct, yes.
> - wouldn't it be simpler to unconditionally remove page from LRU in
> ->invalidatepage()?
I guess that's an option, yes. If the fs cannot successfully invalidate
the page then it can either block (as described above) or remove the page
from the LRU. The fs then wholly owns the page.
I think it would be better to make ->invalidatepage always succeed though.
The situation is probably rare.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 7:58 ` Andrew Morton
@ 2005-04-06 12:06 ` Nikita Danilov
2005-04-06 19:27 ` Andrew Morton
2005-04-06 21:12 ` Chris Mason
1 sibling, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2005-04-06 12:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andrea, linux-mm
Andrew Morton writes:
[...]
>
> The assumption that the thing at ->private is buffers should be viewed
> as a performance hack for buffer-backed address_spaces only.
Yes, I was mainly concerned that this is gross layering violation that
silently adds new constraint all file systems have to follow. Were
non-buffer-head based in-tree file systems checked to never fail
->invalidatepage()?
>
> Note that the patch to which you refer doesn't add this hack - it's already
> been there for a long time, in a different place:
>
> if (PagePrivate(page)) {
> if (!try_to_release_page(page, sc->gfp_mask))
> goto activate_locked;
> if (!mapping && page_count(page) == 1)
> goto free_it;
> }
Ugh.. right.
> To which reiserfs do you refer? reiser4, I assume?
No, reiserfs v3. From bk revtool:
----------------------------------------------------------------------
ChangeSet 1.2028 2005/03/13 16:15:58 andrea@suse.de
[PATCH] orphaned pagecache memleak fix
Chris found that with data journaling a reiserfs pagecache may be truncate
while still pinned. The truncation removes the page->mapping, but the page
is still listed in the VM queues because it still has buffers. Then during
the journaling process, a buffer is marked dirty and that sets the PG_dirty
bitflag as well (in mark_buffer_dirty). After that the page is leaked
because it's both dirty and without a mapping.
So we must allow pages without mapping and dirty to reach the PagePrivate
check. The page->mapping will be checked again right after the PagePrivate
check.
Signed-off-by: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mm/vmscan.c 1.247 2005/03/13 15:29:39 andrea@suse.de
orphaned pagecache memleak fix
----------------------------------------------------------------------
But as I just checked, reiser4 also may return error from
->invalidatepage(), with page pinned by transaction, so there is a
problem indeed.
[...]
>
> I think it would be better to make ->invalidatepage always succeed though.
> The situation is probably rare.
What about the following:
----------------------------------------------------------------------
diff -u bk-linux-2.5/Documentation/filesystems/Locking bk-linux/Documentation/filesystems/Locking
--- bk-linux-2.5/Documentation/filesystems/Locking 2005-04-04 19:40:53.000000000 +0400
+++ bk-linux/Documentation/filesystems/Locking 2005-04-06 15:57:46.000000000 +0400
@@ -266,10 +266,13 @@
instances do not actually need the BKL. Please, keep it that way and don't
breed new callers.
- ->invalidatepage() is called when the filesystem must attempt to drop
-some or all of the buffers from the page when it is being truncated. It
-returns zero on success. If ->invalidatepage is zero, the kernel uses
-block_invalidatepage() instead.
+ ->invalidatepage() is called when whole page or its portion is invalidated
+during truncate. PG_locked and PG_writeback bits of the page are acquired by
+the current thread before calling ->invalidatepage(), so it is guaranteed that
+no IO against this page is going on. Result of ->invalidatepage() is ignored
+and page is unconditionally removed from the mapping. File system has to
+either release all additional references to the page or to remove the page
+from ->lru list and to track its lifetime.
->releasepage() is called when the kernel is about to try to drop the
buffers from the page in preparation for freeing it. It returns zero to
----------------------------------------------------------------------
Nikita.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 12:06 ` Nikita Danilov
@ 2005-04-06 19:27 ` Andrew Morton
2005-04-06 21:07 ` Nikita Danilov
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2005-04-06 19:27 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Andrea, linux-mm
Nikita Danilov <nikita@clusterfs.com> wrote:
>
> > I think it would be better to make ->invalidatepage always succeed though.
> > The situation is probably rare.
>
> What about the following:
> ----------------------------------------------------------------------
> diff -u bk-linux-2.5/Documentation/filesystems/Locking bk-linux/Documentation/filesystems/Locking
> --- bk-linux-2.5/Documentation/filesystems/Locking 2005-04-04 19:40:53.000000000 +0400
> +++ bk-linux/Documentation/filesystems/Locking 2005-04-06 15:57:46.000000000 +0400
> @@ -266,10 +266,13 @@
> instances do not actually need the BKL. Please, keep it that way and don't
> breed new callers.
>
> - ->invalidatepage() is called when the filesystem must attempt to drop
> -some or all of the buffers from the page when it is being truncated. It
> -returns zero on success. If ->invalidatepage is zero, the kernel uses
> -block_invalidatepage() instead.
> + ->invalidatepage() is called when whole page or its portion is invalidated
> +during truncate. PG_locked and PG_writeback bits of the page are acquired by
> +the current thread before calling ->invalidatepage(), so it is guaranteed that
> +no IO against this page is going on. Result of ->invalidatepage() is ignored
> +and page is unconditionally removed from the mapping. File system has to
> +either release all additional references to the page or to remove the page
> +from ->lru list and to track its lifetime.
I'd prefer to say "the fs _must_ release the page's private metadata,
unless, as a special concession to block-backed filesystems, that happens
to be buffer_heads".
Not for any deep reason: it's just that thus-far we've avoided fiddling
witht he LRU queues in filesystems and it'd be nice to retain that.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 19:27 ` Andrew Morton
@ 2005-04-06 21:07 ` Nikita Danilov
2005-04-06 21:34 ` Andrew Morton
0 siblings, 1 reply; 9+ messages in thread
From: Nikita Danilov @ 2005-04-06 21:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andrea, linux-mm, Chris Mason
Andrew Morton writes:
[...]
>
> I'd prefer to say "the fs _must_ release the page's private metadata,
> unless, as a special concession to block-backed filesystems, that happens
> to be buffer_heads".
But this will legalize try_to_free_buffers() hack instead of outlawing
it. The right way is to fix reiserfs v3 (and ext3), unless Andrea or
Chris know the reason why this is impossible to do.
>
> Not for any deep reason: it's just that thus-far we've avoided fiddling
> witht he LRU queues in filesystems and it'd be nice to retain that.
What about do_invalidatepage() removing page from ->lru when
->invalidatepage() returns error?
Nikita.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 7:58 ` Andrew Morton
2005-04-06 12:06 ` Nikita Danilov
@ 2005-04-06 21:12 ` Chris Mason
2005-04-06 21:30 ` Andrew Morton
1 sibling, 1 reply; 9+ messages in thread
From: Chris Mason @ 2005-04-06 21:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: Nikita Danilov, Andrea, linux-mm
On Wednesday 06 April 2005 03:58, Andrew Morton wrote:
> > - wouldn't it be simpler to unconditionally remove page from LRU in
> > ->invalidatepage()?
>
> I guess that's an option, yes. If the fs cannot successfully invalidate
> the page then it can either block (as described above) or remove the page
> from the LRU. The fs then wholly owns the page.
>
> I think it would be better to make ->invalidatepage always succeed though.
> The situation is probably rare.
In data=journal it isn't rare at all. Dropping the page from the lru would be
the best solution I think.
-chris
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 21:12 ` Chris Mason
@ 2005-04-06 21:30 ` Andrew Morton
2005-04-06 21:50 ` Chris Mason
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2005-04-06 21:30 UTC (permalink / raw)
To: Chris Mason; +Cc: nikita, Andrea, linux-mm
Chris Mason <mason@suse.com> wrote:
>
> On Wednesday 06 April 2005 03:58, Andrew Morton wrote:
>
> > > - wouldn't it be simpler to unconditionally remove page from LRU in
> > > ->invalidatepage()?
> >
> > I guess that's an option, yes. If the fs cannot successfully invalidate
> > the page then it can either block (as described above) or remove the page
> > from the LRU. The fs then wholly owns the page.
> >
> > I think it would be better to make ->invalidatepage always succeed though.
> > The situation is probably rare.
>
> In data=journal it isn't rare at all. Dropping the page from the lru would be
> the best solution I think.
>
Does that mean that my printk comes out?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 21:07 ` Nikita Danilov
@ 2005-04-06 21:34 ` Andrew Morton
0 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2005-04-06 21:34 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Andrea, linux-mm, Mason
Nikita Danilov <nikita@clusterfs.com> wrote:
>
> >
> > Not for any deep reason: it's just that thus-far we've avoided fiddling
> > witht he LRU queues in filesystems and it'd be nice to retain that.
>
> What about do_invalidatepage() removing page from ->lru when
> ->invalidatepage() returns error?
That could be made to work, I guess.
It's a behavioural change though, which might result in really-hard-to-find
and slow memory leaks in filesystems which were previously working OK.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: "orphaned pagecache memleak fix" question.
2005-04-06 21:30 ` Andrew Morton
@ 2005-04-06 21:50 ` Chris Mason
0 siblings, 0 replies; 9+ messages in thread
From: Chris Mason @ 2005-04-06 21:50 UTC (permalink / raw)
To: Andrew Morton; +Cc: nikita, Andrea, linux-mm
On Wednesday 06 April 2005 17:30, Andrew Morton wrote:
> Chris Mason <mason@suse.com> wrote:
> > On Wednesday 06 April 2005 03:58, Andrew Morton wrote:
> > > > - wouldn't it be simpler to unconditionally remove page from LRU in
> > > > ->invalidatepage()?
> > >
> > > I guess that's an option, yes. If the fs cannot successfully
> > > invalidate the page then it can either block (as described above) or
> > > remove the page from the LRU. The fs then wholly owns the page.
> > >
> > > I think it would be better to make ->invalidatepage always succeed
> > > though. The situation is probably rare.
> >
> > In data=journal it isn't rare at all. Dropping the page from the lru
> > would be the best solution I think.
>
> Does that mean that my printk comes out?
To trigger the printk, you've got to stand on your head (at least for reiser3)
1) data=journal needs to log a file data buffer
2) The transaction starts to commit
3) Before the committing transaction calls mark_buffer_dirty on the data
buffer, the file is truncated
4) The transaction commit calls mark_buffer_dirty, but only if the blocks
corresponding to this buffer have not been freed on disk.
It's the filesystem truncate call that frees the blocks, so the commit
operation has to pop up in between the truncate_inodes_pages and
inode->i_op->truncate(). The only way I found to make this happen reliably
is to sprinkle schedules and busy loops into critical sections.
-chris
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2005-04-06 21:50 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-05 16:02 "orphaned pagecache memleak fix" question Nikita Danilov
2005-04-06 7:58 ` Andrew Morton
2005-04-06 12:06 ` Nikita Danilov
2005-04-06 19:27 ` Andrew Morton
2005-04-06 21:07 ` Nikita Danilov
2005-04-06 21:34 ` Andrew Morton
2005-04-06 21:12 ` Chris Mason
2005-04-06 21:30 ` Andrew Morton
2005-04-06 21:50 ` Chris Mason
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox