* [patch] fix periodic superblock dirty inode flushing
@ 2007-07-12 4:21 Ken Chen
2007-07-12 19:05 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Ken Chen @ 2007-07-12 4:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
[-- Attachment #1: Type: text/plain, Size: 4898 bytes --]
Current -mm tree has bucketful of bug fixes in periodic writeback path.
However, we still hit a glitch where dirty pages on a given inode aren't
completely flushed to the disk, and system will accumulate large amount
of dirty pages pass beyond what dirty_expire_interval is designed for.
The problem is __sync_single_inode() will move inode to sb->s_dirty list
even when there are more pending dirty pages on that inode. If there is
another inode with small amount of dirty pages, we hit a case where loop
iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0.
Thus leaving the inode that has large amount of dirty pages behind and it has
to wait for another dirty_writeback_interval before we flush it again. It
effectively only writeout MAX_WRITEBACK_PAGES every dirty_writeback_interval.
If the rate of dirtying is sufficiently high, system will start accumulate
large amount of dirty pages.
So fix it by having another sb->s_more_io list to park the inode while we
iterate through sb->s_io and allow each dirty inode resides on that sb has
an equal chance of flushing some amount of dirty pages.
Signed-off-by: Ken Chen <kenchen@google.com>
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6d961d1..a0cf041 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -140,25 +140,11 @@ static int write_inode(struct inode *ino
}
/*
- * Redirty an inode, but mark it as the very next-to-be-written inode on its
- * superblock's dirty-inode list.
- * We need to preserve s_dirty's reverse-time-orderedness, so we cheat by
- * setting this inode's dirtied_when to the same value as that of the inode
- * which is presently head-of-list, if present head-of-list is newer than this
- * inode. (head-of-list is the least-recently-dirtied inode: the oldest one).
+ * requeue inode for re-scanning after sb->s_io list is exhausted.
*/
-static void redirty_head(struct inode *inode)
+static void requeue_io(struct inode *inode)
{
- struct super_block *sb = inode->i_sb;
-
- if (!list_empty(&sb->s_dirty)) {
- struct inode *head_inode;
-
- head_inode = list_entry(sb->s_dirty.prev, struct inode, i_list);
- if (time_after(inode->dirtied_when, head_inode->dirtied_when))
- inode->dirtied_when = head_inode->dirtied_when;
- }
- list_move_tail(&inode->i_list, &sb->s_dirty);
+ list_move(&inode->i_list, &inode->i_sb->s_more_io);
}
/*
@@ -254,7 +240,7 @@ __sync_single_inode(struct inode *inode,
* uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
- redirty_head(inode);
+ requeue_io(inode);
} else {
/*
* Otherwise fully redirty the inode so that
@@ -314,7 +300,7 @@ __writeback_single_inode(struct inode *i
* on s_io. We'll have another go at writing back this inode
* when the s_dirty iodes get moved back onto s_io.
*/
- redirty_head(inode);
+ requeue_io(inode);
/*
* Even if we don't actually write the inode itself here,
@@ -409,14 +395,14 @@ sync_sb_inodes(struct super_block *sb, s
wbc->encountered_congestion = 1;
if (!sb_is_blkdev_sb(sb))
break; /* Skip a congested fs */
- redirty_head(inode);
+ requeue_io(inode);
continue; /* Skip a congested blockdev */
}
if (wbc->bdi && bdi != wbc->bdi) {
if (!sb_is_blkdev_sb(sb))
break; /* fs has the wrong queue */
- redirty_head(inode);
+ requeue_io(inode);
continue; /* blockdev has wrong queue */
}
@@ -426,8 +412,10 @@ sync_sb_inodes(struct super_block *sb, s
/* Was this inode dirtied too recently? */
if (wbc->older_than_this && time_after(inode->dirtied_when,
- *wbc->older_than_this))
+ *wbc->older_than_this)) {
+ list_splice_init(&sb->s_io, sb->s_dirty.prev);
break;
+ }
/* Is another pdflush already flushing this queue? */
if (current_is_pdflush() && !writeback_acquire(bdi))
@@ -457,6 +445,10 @@ sync_sb_inodes(struct super_block *sb, s
if (wbc->nr_to_write <= 0)
break;
}
+
+ if (list_empty(&sb->s_io))
+ list_splice_init(&sb->s_more_io, &sb->s_io);
+
return; /* Leave any unwritten inodes on s_io */
}
diff --git a/fs/super.c b/fs/super.c
index 5260d62..8c6fa35 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -67,6 +67,7 @@ static struct super_block *alloc_super(s
}
INIT_LIST_HEAD(&s->s_dirty);
INIT_LIST_HEAD(&s->s_io);
+ INIT_LIST_HEAD(&s->s_more_io);
INIT_LIST_HEAD(&s->s_files);
INIT_LIST_HEAD(&s->s_instances);
INIT_HLIST_HEAD(&s->s_anon);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b3ae77c..e135913 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -934,6 +934,7 @@ #endif
struct list_head s_inodes; /* all inodes */
struct list_head s_dirty; /* dirty inodes */
struct list_head s_io; /* parked for writeback */
+ struct list_head s_more_io; /* parked for more writeback */
struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
struct list_head s_files;
[-- Attachment #2: wb-s_more_io.patch --]
[-- Type: text/x-patch, Size: 3733 bytes --]
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6d961d1..a0cf041 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -140,25 +140,11 @@ static int write_inode(struct inode *ino
}
/*
- * Redirty an inode, but mark it as the very next-to-be-written inode on its
- * superblock's dirty-inode list.
- * We need to preserve s_dirty's reverse-time-orderedness, so we cheat by
- * setting this inode's dirtied_when to the same value as that of the inode
- * which is presently head-of-list, if present head-of-list is newer than this
- * inode. (head-of-list is the least-recently-dirtied inode: the oldest one).
+ * requeue inode for re-scanning after sb->s_io list is exhausted.
*/
-static void redirty_head(struct inode *inode)
+static void requeue_io(struct inode *inode)
{
- struct super_block *sb = inode->i_sb;
-
- if (!list_empty(&sb->s_dirty)) {
- struct inode *head_inode;
-
- head_inode = list_entry(sb->s_dirty.prev, struct inode, i_list);
- if (time_after(inode->dirtied_when, head_inode->dirtied_when))
- inode->dirtied_when = head_inode->dirtied_when;
- }
- list_move_tail(&inode->i_list, &sb->s_dirty);
+ list_move(&inode->i_list, &inode->i_sb->s_more_io);
}
/*
@@ -254,7 +240,7 @@ __sync_single_inode(struct inode *inode,
* uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
- redirty_head(inode);
+ requeue_io(inode);
} else {
/*
* Otherwise fully redirty the inode so that
@@ -314,7 +300,7 @@ __writeback_single_inode(struct inode *i
* on s_io. We'll have another go at writing back this inode
* when the s_dirty iodes get moved back onto s_io.
*/
- redirty_head(inode);
+ requeue_io(inode);
/*
* Even if we don't actually write the inode itself here,
@@ -409,14 +395,14 @@ sync_sb_inodes(struct super_block *sb, s
wbc->encountered_congestion = 1;
if (!sb_is_blkdev_sb(sb))
break; /* Skip a congested fs */
- redirty_head(inode);
+ requeue_io(inode);
continue; /* Skip a congested blockdev */
}
if (wbc->bdi && bdi != wbc->bdi) {
if (!sb_is_blkdev_sb(sb))
break; /* fs has the wrong queue */
- redirty_head(inode);
+ requeue_io(inode);
continue; /* blockdev has wrong queue */
}
@@ -426,8 +412,10 @@ sync_sb_inodes(struct super_block *sb, s
/* Was this inode dirtied too recently? */
if (wbc->older_than_this && time_after(inode->dirtied_when,
- *wbc->older_than_this))
+ *wbc->older_than_this)) {
+ list_splice_init(&sb->s_io, sb->s_dirty.prev);
break;
+ }
/* Is another pdflush already flushing this queue? */
if (current_is_pdflush() && !writeback_acquire(bdi))
@@ -457,6 +445,10 @@ sync_sb_inodes(struct super_block *sb, s
if (wbc->nr_to_write <= 0)
break;
}
+
+ if (list_empty(&sb->s_io))
+ list_splice_init(&sb->s_more_io, &sb->s_io);
+
return; /* Leave any unwritten inodes on s_io */
}
diff --git a/fs/super.c b/fs/super.c
index 5260d62..8c6fa35 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -67,6 +67,7 @@ static struct super_block *alloc_super(s
}
INIT_LIST_HEAD(&s->s_dirty);
INIT_LIST_HEAD(&s->s_io);
+ INIT_LIST_HEAD(&s->s_more_io);
INIT_LIST_HEAD(&s->s_files);
INIT_LIST_HEAD(&s->s_instances);
INIT_HLIST_HEAD(&s->s_anon);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b3ae77c..e135913 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -934,6 +934,7 @@ #endif
struct list_head s_inodes; /* all inodes */
struct list_head s_dirty; /* dirty inodes */
struct list_head s_io; /* parked for writeback */
+ struct list_head s_more_io; /* parked for more writeback */
struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
struct list_head s_files;
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-12 4:21 [patch] fix periodic superblock dirty inode flushing Ken Chen
@ 2007-07-12 19:05 ` Andrew Morton
2007-07-13 22:17 ` Ken Chen
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-07-12 19:05 UTC (permalink / raw)
To: Ken Chen; +Cc: linux-mm
On Wed, 11 Jul 2007 21:21:19 -0700
"Ken Chen" <kenchen@google.com> wrote:
> Current -mm tree has bucketful of bug fixes in periodic writeback path.
> However, we still hit a glitch where dirty pages on a given inode aren't
> completely flushed to the disk, and system will accumulate large amount
> of dirty pages pass beyond what dirty_expire_interval is designed for.
>
> The problem is __sync_single_inode() will move inode to sb->s_dirty list
> even when there are more pending dirty pages on that inode. If there is
> another inode with small amount of dirty pages, we hit a case where loop
> iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0.
> Thus leaving the inode that has large amount of dirty pages behind and it has
> to wait for another dirty_writeback_interval before we flush it again. It
> effectively only writeout MAX_WRITEBACK_PAGES every dirty_writeback_interval.
> If the rate of dirtying is sufficiently high, system will start accumulate
> large amount of dirty pages.
>
> So fix it by having another sb->s_more_io list to park the inode while we
> iterate through sb->s_io and allow each dirty inode resides on that sb has
> an equal chance of flushing some amount of dirty pages.
>
Thanks. Was this tested in combination with check_dirty_inode_list.patch,
to make sure that the time-orderedness is being retained?
From: Andrew Morton <akpm@linux-foundation.org>
The per-superblock dirty-inode list super_block.s_dirty is supposed to be
sorted in reverse order of each inode's time-of-first-dirtying. This is so
that the kupdate function can avoid having to walk all the dirty inodes on the
list: it terminates the search as soon as it finds an inode which was dirtied
less than 30 seconds ago (dirty_expire_centisecs).
We have a bunch of several-year-old bugs which cause that list to not be in
the correct reverse-time-order. The result of this is that under certain
obscure circumstances, inodes get stuck and basically never get written back.
It has been reported a couple of times, but nobody really cared much because
most people use ordered-mode journalling filesystems, which take care of the
writeback independently. Plus we will _eventually_ get onto these inodes even
when the list is out of order, and a /bin/sync will still work OK.
However this is a pretty important data-integrity issue for filesystems such
as ext2.
As preparation for fixing these bugs, this patch adds a pile of fantastically
expensive debugging code which checks the sanity of the s_dirty list all over
the place, so we find out as soon as it goes bad.
The debugging code is controlled by /proc/sys/fs/inode_debug, which defaults
to off. The debugging will disable itself whenever it detects a misordering,
to avoid log spew.
We can remove all this code later.
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/fs-writeback.c | 62 +++++++++++++++++++++++++++++++++++-
include/linux/writeback.h | 1
kernel/sysctl.c | 8 ++++
3 files changed, 70 insertions(+), 1 deletion(-)
diff -puN fs/fs-writeback.c~check_dirty_inode_list fs/fs-writeback.c
--- a/fs/fs-writeback.c~check_dirty_inode_list
+++ a/fs/fs-writeback.c
@@ -24,6 +24,57 @@
#include <linux/buffer_head.h>
#include "internal.h"
+int sysctl_inode_debug __read_mostly;
+
+static int __check(struct super_block *sb, int print_stuff)
+{
+ struct list_head *cursor = &sb->s_dirty;
+ unsigned long dirtied_when = 0;
+
+ while ((cursor = cursor->prev) != &sb->s_dirty) {
+ struct inode *inode = list_entry(cursor, struct inode, i_list);
+ if (print_stuff) {
+ printk("%p:%lu\n", inode, inode->dirtied_when);
+ } else {
+ if (dirtied_when &&
+ time_before(inode->dirtied_when, dirtied_when))
+ return 1;
+ dirtied_when = inode->dirtied_when;
+ }
+ }
+ return 0;
+}
+
+static void __check_dirty_inode_list(struct super_block *sb,
+ struct inode *inode, const char *file, int line)
+{
+ if (!sysctl_inode_debug)
+ return;
+
+ if (__check(sb, 0)) {
+ sysctl_inode_debug = 0;
+ if (inode)
+ printk("%s:%d: s_dirty got screwed up. inode=%p:%lu\n",
+ file, line, inode, inode->dirtied_when);
+ else
+ printk("%s:%d: s_dirty got screwed up\n", file, line);
+ __check(sb, 1);
+ }
+}
+
+#define check_dirty_inode_list(sb) \
+ do { \
+ if (unlikely(sysctl_inode_debug)) \
+ __check_dirty_inode_list(sb, NULL, __FILE__, __LINE__); \
+ } while (0)
+
+#define check_dirty_inode(inode) \
+ do { \
+ if (unlikely(sysctl_inode_debug)) \
+ __check_dirty_inode_list(inode->i_sb, inode, \
+ __FILE__, __LINE__); \
+ } while (0)
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -122,8 +173,10 @@ void __mark_inode_dirty(struct inode *in
* reposition it (that would break s_dirty time-ordering).
*/
if (!was_dirty) {
+ check_dirty_inode(inode);
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
}
out:
@@ -152,6 +205,7 @@ static void redirty_tail(struct inode *i
{
struct super_block *sb = inode->i_sb;
+ check_dirty_inode(inode);
if (!list_empty(&sb->s_dirty)) {
struct inode *tail_inode;
@@ -161,6 +215,7 @@ static void redirty_tail(struct inode *i
inode->dirtied_when = jiffies;
}
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
/*
@@ -374,8 +429,11 @@ int generic_sync_sb_inodes(struct super_
spin_lock(&inode_lock);
- if (!wbc->for_kupdate || list_empty(&sb->s_io))
+ if (!wbc->for_kupdate || list_empty(&sb->s_io)) {
+ check_dirty_inode_list(sb);
list_splice_init(&sb->s_dirty, &sb->s_io);
+ check_dirty_inode_list(sb);
+ }
while (!list_empty(&sb->s_io)) {
int err;
@@ -440,8 +498,10 @@ int generic_sync_sb_inodes(struct super_
if (!ret)
ret = err;
if (wbc->sync_mode == WB_SYNC_HOLD) {
+ check_dirty_inode(inode);
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &sb->s_dirty);
+ check_dirty_inode(inode);
}
if (current_is_pdflush())
writeback_release(bdi);
diff -puN include/linux/writeback.h~check_dirty_inode_list include/linux/writeback.h
--- a/include/linux/writeback.h~check_dirty_inode_list
+++ a/include/linux/writeback.h
@@ -140,5 +140,6 @@ void writeback_set_ratelimit(void);
extern int nr_pdflush_threads; /* Global so it can be exported to sysctl
read-only. */
+extern int sysctl_inode_debug;
#endif /* WRITEBACK_H */
diff -puN kernel/sysctl.c~check_dirty_inode_list kernel/sysctl.c
--- a/kernel/sysctl.c~check_dirty_inode_list
+++ a/kernel/sysctl.c
@@ -1231,6 +1231,14 @@ static ctl_table fs_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "inode_debug",
+ .data = &sysctl_inode_debug,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
#if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
{
.ctl_name = CTL_UNNUMBERED,
_
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-12 19:05 ` Andrew Morton
@ 2007-07-13 22:17 ` Ken Chen
2007-07-17 0:01 ` Ken Chen
0 siblings, 1 reply; 10+ messages in thread
From: Ken Chen @ 2007-07-13 22:17 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
On 7/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> Was this tested in combination with check_dirty_inode_list.patch,
> to make sure that the time-orderedness is being retained?
I think I tested with the debug patch. And just to be sure, I ran the
test again with the time-order check in place. It passed the test.
- Ken
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-13 22:17 ` Ken Chen
@ 2007-07-17 0:01 ` Ken Chen
2007-07-17 0:15 ` Andrew Morton
[not found] ` <20070719025927.GA11874@mail.ustc.edu.cn>
0 siblings, 2 replies; 10+ messages in thread
From: Ken Chen @ 2007-07-17 0:01 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm
On 7/13/07, Ken Chen <kenchen@google.com> wrote:
> On 7/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Was this tested in combination with check_dirty_inode_list.patch,
> > to make sure that the time-orderedness is being retained?
>
> I think I tested with the debug patch. And just to be sure, I ran the
> test again with the time-order check in place. It passed the test.
I ran some more tests over the weekend with the debug turned on. There
are a few fall out that the order-ness of sb-s_dirty is corrupted. We
probably should drop this patch until I figure out a real solution to
this.
One idea is to use rb-tree for sorting and use a in-tree dummy node as
a tree iterator. Do you think that will work better? I will hack on
that.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-17 0:01 ` Ken Chen
@ 2007-07-17 0:15 ` Andrew Morton
[not found] ` <20070719025927.GA11874@mail.ustc.edu.cn>
1 sibling, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2007-07-17 0:15 UTC (permalink / raw)
To: Ken Chen; +Cc: linux-mm
On Mon, 16 Jul 2007 17:01:31 -0700
"Ken Chen" <kenchen@google.com> wrote:
> On 7/13/07, Ken Chen <kenchen@google.com> wrote:
> > On 7/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > > Was this tested in combination with check_dirty_inode_list.patch,
> > > to make sure that the time-orderedness is being retained?
> >
> > I think I tested with the debug patch. And just to be sure, I ran the
> > test again with the time-order check in place. It passed the test.
>
> I ran some more tests over the weekend with the debug turned on. There
> are a few fall out that the order-ness of sb-s_dirty is corrupted. We
> probably should drop this patch until I figure out a real solution to
> this.
drat.
> One idea is to use rb-tree for sorting and use a in-tree dummy node as
> a tree iterator. Do you think that will work better? I will hack on
> that.
Yeah, handling those list_heads is like juggling ten bars of soap.
I've long had vague thoughts that a new data structure is needed to fix all
this up. But I was thinking radix-tree because radix-trees have the very
important characteristic that you can remember where you were up to when
you drop the lock, so you can trivially restart the search at the correct
place. Although I never quiet worked out what an appropriate index would
be for that radix-tree.
I suppose we can do the same search-restarting with rb-trees, once we work
out what the index is.
It will all be a pretty big project - the *requirements* for that code are
long and complex, let alone the implementation, and the testing is tough.
Probably we'd be better off finding some nasty hack to (yet again) paper up
the existing code while we have time to think about a reimplementation.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
[not found] ` <20070719025927.GA11874@mail.ustc.edu.cn>
@ 2007-07-19 2:59 ` Fengguang Wu
2007-07-19 3:10 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Fengguang Wu @ 2007-07-19 2:59 UTC (permalink / raw)
To: Ken Chen; +Cc: Andrew Morton, linux-mm
On Mon, Jul 16, 2007 at 05:01:31PM -0700, Ken Chen wrote:
> On 7/13/07, Ken Chen <kenchen@google.com> wrote:
> >On 7/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> Was this tested in combination with check_dirty_inode_list.patch,
> >> to make sure that the time-orderedness is being retained?
> >
> >I think I tested with the debug patch. And just to be sure, I ran the
> >test again with the time-order check in place. It passed the test.
>
> I ran some more tests over the weekend with the debug turned on. There
> are a few fall out that the order-ness of sb-s_dirty is corrupted. We
> probably should drop this patch until I figure out a real solution to
> this.
>
> One idea is to use rb-tree for sorting and use a in-tree dummy node as
> a tree iterator. Do you think that will work better? I will hack on
> that.
Sorry if I'm not backgrounded.
But what's the problem of a list? If we always do the two actions
*together*:
1) update inode->dirtied_when
2) requeue inode in the correct place
the list will be in order.
linux-2.6.22-rc6-mm1/fs/fs-writeback.c obviously obeys this rule.
I don't see how can a new data structure make life easier.
1) and 2) should still be safeguarded, isn't it?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-19 2:59 ` Fengguang Wu
@ 2007-07-19 3:10 ` Andrew Morton
[not found] ` <20070719080910.GA7459@mail.ustc.edu.cn>
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-07-19 3:10 UTC (permalink / raw)
To: Fengguang Wu; +Cc: Ken Chen, linux-mm
On Thu, 19 Jul 2007 10:59:27 +0800 Fengguang Wu <fengguang.wu@gmail.com> wrote:
> On Mon, Jul 16, 2007 at 05:01:31PM -0700, Ken Chen wrote:
> > On 7/13/07, Ken Chen <kenchen@google.com> wrote:
> > >On 7/12/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > >> Was this tested in combination with check_dirty_inode_list.patch,
> > >> to make sure that the time-orderedness is being retained?
> > >
> > >I think I tested with the debug patch. And just to be sure, I ran the
> > >test again with the time-order check in place. It passed the test.
> >
> > I ran some more tests over the weekend with the debug turned on. There
> > are a few fall out that the order-ness of sb-s_dirty is corrupted. We
> > probably should drop this patch until I figure out a real solution to
> > this.
> >
> > One idea is to use rb-tree for sorting and use a in-tree dummy node as
> > a tree iterator. Do you think that will work better? I will hack on
> > that.
>
> Sorry if I'm not backgrounded.
>
> But what's the problem of a list? If we always do the two actions
> *together*:
> 1) update inode->dirtied_when
> 2) requeue inode in the correct place
> the list will be in order.
> linux-2.6.22-rc6-mm1/fs/fs-writeback.c obviously obeys this rule.
>
> I don't see how can a new data structure make life easier.
> 1) and 2) should still be safeguarded, isn't it?
Well yes, the existing implementation does its best to work, and almost
does work correctly but it was really hard to do and it is hard to maintain.
Whereas if we had a better data structure it would be cleaner and easier to
implement and to maintain, I expect.
With an indexed data structure (ie: radix-tree or rbtree) the writeback
code can remember where it was up to in the ordered list of inodes so it
can drop locks, do writeback, remember where it was up to for the next
pass, etc.
Basically, the walk of the per-superblock inodes would follow the same
model as the walk of the per-inode pages. And the latter has worked out
*really* well. It would be great if the per-sb inode traversal was as
flexible and as powerful as the page walks.
Probably it never will be, because I suspect we'd need to order the inodes
by multiple indices. I hn't thought it through, really.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
[not found] ` <20070719080910.GA7459@mail.ustc.edu.cn>
@ 2007-07-19 8:09 ` Fengguang Wu
2007-07-19 8:18 ` Andrew Morton
0 siblings, 1 reply; 10+ messages in thread
From: Fengguang Wu @ 2007-07-19 8:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ken Chen, linux-mm
On Wed, Jul 18, 2007 at 08:10:18PM -0700, Andrew Morton wrote:
> On Thu, 19 Jul 2007 10:59:27 +0800 Fengguang Wu <fengguang.wu@gmail.com> wrote:
> > On Mon, Jul 16, 2007 at 05:01:31PM -0700, Ken Chen wrote:
> > > I ran some more tests over the weekend with the debug turned on. There
> > > are a few fall out that the order-ness of sb-s_dirty is corrupted. We
> > > probably should drop this patch until I figure out a real solution to
> > > this.
> > >
> > > One idea is to use rb-tree for sorting and use a in-tree dummy node as
> > > a tree iterator. Do you think that will work better? I will hack on
> > > that.
> >
> > Sorry if I'm not backgrounded.
> >
> > But what's the problem of a list? If we always do the two actions
> > *together*:
> > 1) update inode->dirtied_when
> > 2) requeue inode in the correct place
> > the list will be in order.
> > linux-2.6.22-rc6-mm1/fs/fs-writeback.c obviously obeys this rule.
> >
> > I don't see how can a new data structure make life easier.
> > 1) and 2) should still be safeguarded, isn't it?
>
> Well yes, the existing implementation does its best to work, and almost
> does work correctly but it was really hard to do and it is hard to maintain.
>
> Whereas if we had a better data structure it would be cleaner and easier to
> implement and to maintain, I expect.
>
> With an indexed data structure (ie: radix-tree or rbtree) the writeback
> code can remember where it was up to in the ordered list of inodes so it
> can drop locks, do writeback, remember where it was up to for the next
> pass, etc.
>
> Basically, the walk of the per-superblock inodes would follow the same
> model as the walk of the per-inode pages. And the latter has worked out
> *really* well. It would be great if the per-sb inode traversal was as
> flexible and as powerful as the page walks.
>
> Probably it never will be, because I suspect we'd need to order the inodes
> by multiple indices. I hn't thought it through, really.
Just one more possibility... an array of lists?
The array is cyclic and time-addressable, and
the lists can be ordered by other criterion(s).
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-19 8:09 ` Fengguang Wu
@ 2007-07-19 8:18 ` Andrew Morton
2007-07-19 22:18 ` David Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2007-07-19 8:18 UTC (permalink / raw)
To: Fengguang Wu; +Cc: Ken Chen, linux-mm
On Thu, 19 Jul 2007 16:09:10 +0800 Fengguang Wu <fengguang.wu@gmail.com> wrote:
> On Wed, Jul 18, 2007 at 08:10:18PM -0700, Andrew Morton wrote:
> > On Thu, 19 Jul 2007 10:59:27 +0800 Fengguang Wu <fengguang.wu@gmail.com> wrote:
> > > On Mon, Jul 16, 2007 at 05:01:31PM -0700, Ken Chen wrote:
> > > > I ran some more tests over the weekend with the debug turned on. There
> > > > are a few fall out that the order-ness of sb-s_dirty is corrupted. We
> > > > probably should drop this patch until I figure out a real solution to
> > > > this.
> > > >
> > > > One idea is to use rb-tree for sorting and use a in-tree dummy node as
> > > > a tree iterator. Do you think that will work better? I will hack on
> > > > that.
> > >
> > > Sorry if I'm not backgrounded.
> > >
> > > But what's the problem of a list? If we always do the two actions
> > > *together*:
> > > 1) update inode->dirtied_when
> > > 2) requeue inode in the correct place
> > > the list will be in order.
> > > linux-2.6.22-rc6-mm1/fs/fs-writeback.c obviously obeys this rule.
> > >
> > > I don't see how can a new data structure make life easier.
> > > 1) and 2) should still be safeguarded, isn't it?
> >
> > Well yes, the existing implementation does its best to work, and almost
> > does work correctly but it was really hard to do and it is hard to maintain.
> >
> > Whereas if we had a better data structure it would be cleaner and easier to
> > implement and to maintain, I expect.
> >
> > With an indexed data structure (ie: radix-tree or rbtree) the writeback
> > code can remember where it was up to in the ordered list of inodes so it
> > can drop locks, do writeback, remember where it was up to for the next
> > pass, etc.
> >
> > Basically, the walk of the per-superblock inodes would follow the same
> > model as the walk of the per-inode pages. And the latter has worked out
> > *really* well. It would be great if the per-sb inode traversal was as
> > flexible and as powerful as the page walks.
> >
> > Probably it never will be, because I suspect we'd need to order the inodes
> > by multiple indices. I hn't thought it through, really.
>
> Just one more possibility... an array of lists?
>
> The array is cyclic and time-addressable, and
> the lists can be ordered by other criterion(s).
Yeah, something like that.
The array would need to be dynamically sizeable and capable of
efficiently supporting large holes. ie: a radix-tree or rbtree ;)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] fix periodic superblock dirty inode flushing
2007-07-19 8:18 ` Andrew Morton
@ 2007-07-19 22:18 ` David Chinner
0 siblings, 0 replies; 10+ messages in thread
From: David Chinner @ 2007-07-19 22:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Fengguang Wu, Ken Chen, linux-mm
On Thu, Jul 19, 2007 at 01:18:45AM -0700, Andrew Morton wrote:
> On Thu, 19 Jul 2007 16:09:10 +0800 Fengguang Wu <fengguang.wu@gmail.com> wrote:
> > On Wed, Jul 18, 2007 at 08:10:18PM -0700, Andrew Morton wrote:
> > > With an indexed data structure (ie: radix-tree or rbtree) the writeback
> > > code can remember where it was up to in the ordered list of inodes so it
> > > can drop locks, do writeback, remember where it was up to for the next
> > > pass, etc.
> > >
> > > Basically, the walk of the per-superblock inodes would follow the same
> > > model as the walk of the per-inode pages. And the latter has worked out
> > > *really* well. It would be great if the per-sb inode traversal was as
> > > flexible and as powerful as the page walks.
> > >
> > > Probably it never will be, because I suspect we'd need to order the inodes
> > > by multiple indices. I hn't thought it through, really.
> >
> > Just one more possibility... an array of lists?
> >
> > The array is cyclic and time-addressable, and
> > the lists can be ordered by other criterion(s).
>
> Yeah, something like that.
>
> The array would need to be dynamically sizeable and capable of
> efficiently supporting large holes. ie: a radix-tree or rbtree ;)
You mean sorta like fs/xfs/xfs_mru_cache.[ch]?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-07-19 22:18 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-12 4:21 [patch] fix periodic superblock dirty inode flushing Ken Chen
2007-07-12 19:05 ` Andrew Morton
2007-07-13 22:17 ` Ken Chen
2007-07-17 0:01 ` Ken Chen
2007-07-17 0:15 ` Andrew Morton
[not found] ` <20070719025927.GA11874@mail.ustc.edu.cn>
2007-07-19 2:59 ` Fengguang Wu
2007-07-19 3:10 ` Andrew Morton
[not found] ` <20070719080910.GA7459@mail.ustc.edu.cn>
2007-07-19 8:09 ` Fengguang Wu
2007-07-19 8:18 ` Andrew Morton
2007-07-19 22:18 ` David Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox