* Re: 2.6.19 file content corruption on ext3
[not found] ` <20061216184310.GA891@unjust.cyrius.com>
@ 2006-12-16 19:18 ` Hugh Dickins
2006-12-16 21:29 ` Peter Zijlstra
2006-12-17 13:52 ` Jan Kara
0 siblings, 2 replies; 4+ messages in thread
From: Hugh Dickins @ 2006-12-16 19:18 UTC (permalink / raw)
To: Martin Michlmayr
Cc: Marc Haber, Jan Kara, Peter Zijlstra, linux-mm, linux-kernel
On Sat, 16 Dec 2006, Martin Michlmayr wrote:
> * Marc Haber <mh+linux-kernel@zugschlus.de> [2006-12-09 10:26]:
> > Unfortunately, I am lacking the knowledge needed to do this in an
> > informed way. I am neither familiar enough with git nor do I possess
> > the necessary C powers.
>
> I wonder if what you're seein is related to
> http://lkml.org/lkml/2006/12/16/73
>
> You said that you don't see any corruption with 2.6.18. Can you try
> to apply the patch from
> http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
> to 2.6.18 to see if the corruption shows up?
I did wonder about the very first hunk of Peter's patch, where the
mapping->private_lock is unlocked earlier now in try_to_free_buffers,
before the clear_page_dirty. I'm not at all familiar with that area,
I wonder if Jan has looked at that change, and might be able to say
whether it's good or not (earlier he worried about his JBD changes,
but they wouldn't be implicated if just 2.6.18+Peter's gives trouble).
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.19 file content corruption on ext3
2006-12-16 19:18 ` 2.6.19 file content corruption on ext3 Hugh Dickins
@ 2006-12-16 21:29 ` Peter Zijlstra
2006-12-16 23:08 ` Hugh Dickins
2006-12-17 13:52 ` Jan Kara
1 sibling, 1 reply; 4+ messages in thread
From: Peter Zijlstra @ 2006-12-16 21:29 UTC (permalink / raw)
To: Hugh Dickins
Cc: Martin Michlmayr, Marc Haber, Jan Kara, linux-mm, linux-kernel
On Sat, 2006-12-16 at 19:18 +0000, Hugh Dickins wrote:
> On Sat, 16 Dec 2006, Martin Michlmayr wrote:
> > * Marc Haber <mh+linux-kernel@zugschlus.de> [2006-12-09 10:26]:
> > > Unfortunately, I am lacking the knowledge needed to do this in an
> > > informed way. I am neither familiar enough with git nor do I possess
> > > the necessary C powers.
> >
> > I wonder if what you're seein is related to
> > http://lkml.org/lkml/2006/12/16/73
> >
> > You said that you don't see any corruption with 2.6.18. Can you try
> > to apply the patch from
> > http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
> > to 2.6.18 to see if the corruption shows up?
>
> I did wonder about the very first hunk of Peter's patch, where the
> mapping->private_lock is unlocked earlier now in try_to_free_buffers,
> before the clear_page_dirty. I'm not at all familiar with that area,
> I wonder if Jan has looked at that change, and might be able to say
> whether it's good or not (earlier he worried about his JBD changes,
> but they wouldn't be implicated if just 2.6.18+Peter's gives trouble).
fs/buffers.c:2775
/*
* try_to_free_buffers() checks if all the buffers on this particular page
* are unused, and releases them if so.
*
* Exclusion against try_to_free_buffers may be obtained by either
* locking the page or by holding its mapping's private_lock.
*
* If the page is dirty but all the buffers are clean then we need to
* be sure to mark the page clean as well. This is because the page
* may be against a block device, and a later reattachment of buffers
* to a dirty page will set *all* buffers dirty. Which would corrupt
* filesystem data on the same device.
*
* The same applies to regular filesystem pages: if all the buffers are
* clean then we set the page clean and proceed. To do that, we require
* total exclusion from __set_page_dirty_buffers(). That is obtained with
* private_lock.
*
* try_to_free_buffers() is non-blocking.
*/
Note the 3th paragraph. Would I have opened up a race by moving that
unlock upwards, such that it is possible to re-attach buffers to the
page before having it marked clean; which according to this text will
mark those buffers dirty and cause data corruption?
Hmm, how to go about something like this:
---
Moving the cleaning of the page out from under the private_lock opened
up a window where newly attached buffer might still see the page dirty
status and were thus marked (incorrectly) dirty themselves; resulting in
filesystem data corruption.
Close this by moving the cleaning of the page inside of the private_lock
scope again. However it is not possible to call page_mkclean() from
within the private_lock (this violates locking order); thus introduce a
variant of test_clear_page_dirty() that does not call page_mkclean() and
call it ourselves when we did do clean the page and call it outside of
the private_lock.
This is still safe because the page is still locked by means of
PG_locked.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
fs/buffer.c | 11 +++++++++--
include/linux/page-flags.h | 1 +
mm/page-writeback.c | 10 ++++++++--
3 files changed, 18 insertions(+), 4 deletions(-)
Index: linux-2.6-git/fs/buffer.c
===================================================================
--- linux-2.6-git.orig/fs/buffer.c 2006-12-16 22:18:24.000000000 +0100
+++ linux-2.6-git/fs/buffer.c 2006-12-16 22:22:17.000000000 +0100
@@ -42,6 +42,7 @@
#include <linux/bitops.h>
#include <linux/mpage.h>
#include <linux/bit_spinlock.h>
+#include <linux/rmap.h>
static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
static void invalidate_bh_lrus(void);
@@ -2832,6 +2833,7 @@ int try_to_free_buffers(struct page *pag
struct address_space * const mapping = page->mapping;
struct buffer_head *buffers_to_free = NULL;
int ret = 0;
+ int must_clean = 0;
BUG_ON(!PageLocked(page));
if (PageWriteback(page))
@@ -2844,7 +2846,6 @@ int try_to_free_buffers(struct page *pag
spin_lock(&mapping->private_lock);
ret = drop_buffers(page, &buffers_to_free);
- spin_unlock(&mapping->private_lock);
if (ret) {
/*
* If the filesystem writes its buffers by hand (eg ext3)
@@ -2858,9 +2859,15 @@ int try_to_free_buffers(struct page *pag
* the page's buffers clean. We discover that here and clean
* the page also.
*/
- if (test_clear_page_dirty(page))
+ if (__test_clear_page_dirty(page, 0)) {
task_io_account_cancelled_write(PAGE_CACHE_SIZE);
+ if (mapping_cap_account_dirty(mapping))
+ must_clean = 1;
+ }
}
+ spin_unlock(&mapping->private_lock);
+ if (must_clean)
+ page_mkclean(page);
out:
if (buffers_to_free) {
struct buffer_head *bh = buffers_to_free;
Index: linux-2.6-git/include/linux/page-flags.h
===================================================================
--- linux-2.6-git.orig/include/linux/page-flags.h 2006-12-16 22:19:56.000000000 +0100
+++ linux-2.6-git/include/linux/page-flags.h 2006-12-16 22:20:07.000000000 +0100
@@ -253,6 +253,7 @@ static inline void SetPageUptodate(struc
struct page; /* forward declaration */
+int __test_clear_page_dirty(struct page *page, int do_clean);
int test_clear_page_dirty(struct page *page);
int test_clear_page_writeback(struct page *page);
int test_set_page_writeback(struct page *page);
Index: linux-2.6-git/mm/page-writeback.c
===================================================================
--- linux-2.6-git.orig/mm/page-writeback.c 2006-12-16 22:18:18.000000000 +0100
+++ linux-2.6-git/mm/page-writeback.c 2006-12-16 22:19:42.000000000 +0100
@@ -854,7 +854,7 @@ EXPORT_SYMBOL(set_page_dirty_lock);
* Clear a page's dirty flag, while caring for dirty memory accounting.
* Returns true if the page was previously dirty.
*/
-int test_clear_page_dirty(struct page *page)
+int __test_clear_page_dirty(struct page *page, int do_clean)
{
struct address_space *mapping = page_mapping(page);
unsigned long flags;
@@ -872,7 +872,8 @@ int test_clear_page_dirty(struct page *p
* page is locked, which pins the address_space
*/
if (mapping_cap_account_dirty(mapping)) {
- page_mkclean(page);
+ if (do_clean)
+ page_mkclean(page);
dec_zone_page_state(page, NR_FILE_DIRTY);
}
return 1;
@@ -880,6 +881,11 @@ int test_clear_page_dirty(struct page *p
write_unlock_irqrestore(&mapping->tree_lock, flags);
return 0;
}
+
+int test_clear_page_dirty(struct page *page)
+{
+ return __test_clear_page_dirty(page, 1);
+}
EXPORT_SYMBOL(test_clear_page_dirty);
/*
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.19 file content corruption on ext3
2006-12-16 21:29 ` Peter Zijlstra
@ 2006-12-16 23:08 ` Hugh Dickins
0 siblings, 0 replies; 4+ messages in thread
From: Hugh Dickins @ 2006-12-16 23:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Martin Michlmayr, Marc Haber, Jan Kara, linux-mm, linux-kernel
On Sat, 16 Dec 2006, Peter Zijlstra wrote:
> Moving the cleaning of the page out from under the private_lock opened
> up a window where newly attached buffer might still see the page dirty
> status and were thus marked (incorrectly) dirty themselves; resulting in
> filesystem data corruption.
I'm not going to pretend to understand the buffers issues here:
people thought that change was safe originally, and I can't say
it's not - it just stood out as a potentially weakening change.
The patch you propose certainly looks like a good way out, if
that moved unlock really is a problem: your patch is very well
worth trying by those people seeing their corruption problems,
let's wait to hear their feedback.
Thanks!
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: 2.6.19 file content corruption on ext3
2006-12-16 19:18 ` 2.6.19 file content corruption on ext3 Hugh Dickins
2006-12-16 21:29 ` Peter Zijlstra
@ 2006-12-17 13:52 ` Jan Kara
1 sibling, 0 replies; 4+ messages in thread
From: Jan Kara @ 2006-12-17 13:52 UTC (permalink / raw)
To: Hugh Dickins
Cc: Martin Michlmayr, Marc Haber, Jan Kara, Peter Zijlstra, linux-mm,
linux-kernel, Mikael Magnusson
> On Sat, 16 Dec 2006, Martin Michlmayr wrote:
> > * Marc Haber <mh+linux-kernel@zugschlus.de> [2006-12-09 10:26]:
> > > Unfortunately, I am lacking the knowledge needed to do this in an
> > > informed way. I am neither familiar enough with git nor do I possess
> > > the necessary C powers.
> >
> > I wonder if what you're seein is related to
> > http://lkml.org/lkml/2006/12/16/73
> >
> > You said that you don't see any corruption with 2.6.18. Can you try
> > to apply the patch from
> > http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d08b3851da41d0ee60851f2c75b118e1f7a5fc89
> > to 2.6.18 to see if the corruption shows up?
>
> I did wonder about the very first hunk of Peter's patch, where the
> mapping->private_lock is unlocked earlier now in try_to_free_buffers,
> before the clear_page_dirty. I'm not at all familiar with that area,
> I wonder if Jan has looked at that change, and might be able to say
> whether it's good or not (earlier he worried about his JBD changes,
> but they wouldn't be implicated if just 2.6.18+Peter's gives trouble).
Thanks for pointer. I was not aware of this change, I'll have a look
at it on Monday. Actually Mickael has checked that he sees corruption
even if all the JBD changes are backed out so I was going to look for
other changes in VFS that could cause that.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2006-12-17 13:52 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20061207155740.GC1434@torres.l21.ma.zugschlus.de>
[not found] ` <4578465D.7030104@cfl.rr.com>
[not found] ` <20061209092639.GA15443@torres.l21.ma.zugschlus.de>
[not found] ` <20061216184310.GA891@unjust.cyrius.com>
2006-12-16 19:18 ` 2.6.19 file content corruption on ext3 Hugh Dickins
2006-12-16 21:29 ` Peter Zijlstra
2006-12-16 23:08 ` Hugh Dickins
2006-12-17 13:52 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox