* Re: [ATCH v3 0/4] Improve visibility of writeback
2024-04-22 16:05 [ATCH v3 0/4] Improve visibility of writeback Kemeng Shi
@ 2024-04-22 7:52 ` Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 1/4] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Kemeng Shi @ 2024-04-22 7:52 UTC (permalink / raw)
To: akpm, willy, jack, bfoster, tj
Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel
on 4/23/2024 12:05 AM, Kemeng Shi wrote:
Forget to fix a build warning, please ignore this series. Sorry for the
noise.
> v2->v3:
> -Drop patches to protect non-exist race and to define GDTC_INIT_NO_WB to
> null.
> -Add wb_tryget to wb from which we collect stats to bdi stats.
> -Create wb_stats when CONFIG_CGROUP_WRITEBACK is no enabled.
> -Add a blank line between two wb stats in wb_stats.
>
> v1->v2:
> -Send cleanup to wq_monitor.py separately.
> -Add patch to avoid use after free of bdi.
> -Rename wb_calc_cg_thresh to cgwb_calc_thresh as Tejun suggested.
> -Use rcu walk to avoid use after free.
> -Add debug output to each related patches.
>
> This series tries to improve visilibity of writeback. Patch 1 make
> /sys/kernel/debug/bdi/xxx/stats show writeback info of whole bdi
> instead of only writeback info in root cgroup. Patch 2 add a new
> debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per wb writeback
> info. Patch 3 add wb_monitor.py to monitor basic writeback info
> of running system, more info could be added on demand. Patch 4
> is a random cleanup. More details can be found in respective
> patches. Thanks!
>
> Following domain hierarchy is tested:
> global domain (320G)
> / \
> cgroup domain1(10G) cgroup domain2(10G)
> | |
> bdi wb1 wb2
>
> /* all writeback info of bdi is successfully collected */
> cat stats
> BdiWriteback: 4704 kB
> BdiReclaimable: 1294496 kB
> BdiDirtyThresh: 204208088 kB
> DirtyThresh: 195259944 kB
> BackgroundThresh: 32503588 kB
> BdiDirtied: 48519296 kB
> BdiWritten: 47225696 kB
> BdiWriteBandwidth: 1173892 kBps
> b_dirty: 1
> b_io: 0
> b_more_io: 1
> b_dirty_time: 0
> bdi_list: 1
> state: 1
>
> /* per wb writeback info of bdi is collected */
> cat /sys/kernel/debug/bdi/252:16/wb_stats
> WbCgIno: 1
> WbWriteback: 0 kB
> WbReclaimable: 0 kB
> WbDirtyThresh: 0 kB
> WbDirtied: 0 kB
> WbWritten: 0 kB
> WbWriteBandwidth: 102400 kBps
> b_dirty: 0
> b_io: 0
> b_more_io: 0
> b_dirty_time: 0
> state: 1
>
> WbCgIno: 4208
> WbWriteback: 59808 kB
> WbReclaimable: 676480 kB
> WbDirtyThresh: 6004624 kB
> WbDirtied: 23348192 kB
> WbWritten: 22614592 kB
> WbWriteBandwidth: 593204 kBps
> b_dirty: 1
> b_io: 1
> b_more_io: 0
> b_dirty_time: 0
> state: 7
>
> WbCgIno: 4249
> WbWriteback: 144256 kB
> WbReclaimable: 432096 kB
> WbDirtyThresh: 6004344 kB
> WbDirtied: 25727744 kB
> WbWritten: 25154752 kB
> WbWriteBandwidth: 577904 kBps
> b_dirty: 0
> b_io: 1
> b_more_io: 0
> b_dirty_time: 0
> state: 7
>
> The wb_monitor.py script output is as following:
> ./wb_monitor.py 252:16 -c
> writeback reclaimable dirtied written avg_bw
> 252:16_1 0 0 0 0 102400
> 252:16_4284 672 820064 9230368 8410304 685612
> 252:16_4325 896 819840 10491264 9671648 652348
> 252:16 1568 1639904 19721632 18081952 1440360
>
> writeback reclaimable dirtied written avg_bw
> 252:16_1 0 0 0 0 102400
> 252:16_4284 672 820064 9230368 8410304 685612
> 252:16_4325 896 819840 10491264 9671648 652348
> 252:16 1568 1639904 19721632 18081952 1440360
> ...
>
> Kemeng Shi (4):
> writeback: collect stats of all wb of bdi in bdi_debug_stats_show
> writeback: support retrieving per group debug writeback stats of bdi
> writeback: add wb_monitor.py script to monitor writeback info on bdi
> writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages
>
> include/linux/writeback.h | 1 +
> mm/backing-dev.c | 174 +++++++++++++++++++++++++++++-----
> mm/page-writeback.c | 27 +++++-
> tools/writeback/wb_monitor.py | 172 +++++++++++++++++++++++++++++++++
> 4 files changed, 345 insertions(+), 29 deletions(-)
> create mode 100644 tools/writeback/wb_monitor.py
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [ATCH v3 0/4] Improve visibility of writeback
@ 2024-04-22 16:05 Kemeng Shi
2024-04-22 7:52 ` Kemeng Shi
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Kemeng Shi @ 2024-04-22 16:05 UTC (permalink / raw)
To: akpm, willy, jack, bfoster, tj
Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel
v2->v3:
-Drop patches to protect non-exist race and to define GDTC_INIT_NO_WB to
null.
-Add wb_tryget to wb from which we collect stats to bdi stats.
-Create wb_stats when CONFIG_CGROUP_WRITEBACK is no enabled.
-Add a blank line between two wb stats in wb_stats.
v1->v2:
-Send cleanup to wq_monitor.py separately.
-Add patch to avoid use after free of bdi.
-Rename wb_calc_cg_thresh to cgwb_calc_thresh as Tejun suggested.
-Use rcu walk to avoid use after free.
-Add debug output to each related patches.
This series tries to improve visilibity of writeback. Patch 1 make
/sys/kernel/debug/bdi/xxx/stats show writeback info of whole bdi
instead of only writeback info in root cgroup. Patch 2 add a new
debug file /sys/kernel/debug/bdi/xxx/wb_stats to show per wb writeback
info. Patch 3 add wb_monitor.py to monitor basic writeback info
of running system, more info could be added on demand. Patch 4
is a random cleanup. More details can be found in respective
patches. Thanks!
Following domain hierarchy is tested:
global domain (320G)
/ \
cgroup domain1(10G) cgroup domain2(10G)
| |
bdi wb1 wb2
/* all writeback info of bdi is successfully collected */
cat stats
BdiWriteback: 4704 kB
BdiReclaimable: 1294496 kB
BdiDirtyThresh: 204208088 kB
DirtyThresh: 195259944 kB
BackgroundThresh: 32503588 kB
BdiDirtied: 48519296 kB
BdiWritten: 47225696 kB
BdiWriteBandwidth: 1173892 kBps
b_dirty: 1
b_io: 0
b_more_io: 1
b_dirty_time: 0
bdi_list: 1
state: 1
/* per wb writeback info of bdi is collected */
cat /sys/kernel/debug/bdi/252:16/wb_stats
WbCgIno: 1
WbWriteback: 0 kB
WbReclaimable: 0 kB
WbDirtyThresh: 0 kB
WbDirtied: 0 kB
WbWritten: 0 kB
WbWriteBandwidth: 102400 kBps
b_dirty: 0
b_io: 0
b_more_io: 0
b_dirty_time: 0
state: 1
WbCgIno: 4208
WbWriteback: 59808 kB
WbReclaimable: 676480 kB
WbDirtyThresh: 6004624 kB
WbDirtied: 23348192 kB
WbWritten: 22614592 kB
WbWriteBandwidth: 593204 kBps
b_dirty: 1
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7
WbCgIno: 4249
WbWriteback: 144256 kB
WbReclaimable: 432096 kB
WbDirtyThresh: 6004344 kB
WbDirtied: 25727744 kB
WbWritten: 25154752 kB
WbWriteBandwidth: 577904 kBps
b_dirty: 0
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7
The wb_monitor.py script output is as following:
./wb_monitor.py 252:16 -c
writeback reclaimable dirtied written avg_bw
252:16_1 0 0 0 0 102400
252:16_4284 672 820064 9230368 8410304 685612
252:16_4325 896 819840 10491264 9671648 652348
252:16 1568 1639904 19721632 18081952 1440360
writeback reclaimable dirtied written avg_bw
252:16_1 0 0 0 0 102400
252:16_4284 672 820064 9230368 8410304 685612
252:16_4325 896 819840 10491264 9671648 652348
252:16 1568 1639904 19721632 18081952 1440360
...
Kemeng Shi (4):
writeback: collect stats of all wb of bdi in bdi_debug_stats_show
writeback: support retrieving per group debug writeback stats of bdi
writeback: add wb_monitor.py script to monitor writeback info on bdi
writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages
include/linux/writeback.h | 1 +
mm/backing-dev.c | 174 +++++++++++++++++++++++++++++-----
mm/page-writeback.c | 27 +++++-
tools/writeback/wb_monitor.py | 172 +++++++++++++++++++++++++++++++++
4 files changed, 345 insertions(+), 29 deletions(-)
create mode 100644 tools/writeback/wb_monitor.py
--
2.30.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [ATCH v3 1/4] writeback: collect stats of all wb of bdi in bdi_debug_stats_show
2024-04-22 16:05 [ATCH v3 0/4] Improve visibility of writeback Kemeng Shi
2024-04-22 7:52 ` Kemeng Shi
@ 2024-04-22 16:05 ` Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 2/4] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Kemeng Shi @ 2024-04-22 16:05 UTC (permalink / raw)
To: akpm, willy, jack, bfoster, tj
Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel
/sys/kernel/debug/bdi/xxx/stats is supposed to show writeback information
of whole bdi, but only writeback information of bdi in root cgroup is
collected. So writeback information in non-root cgroup are missing now.
To be more specific, considering following case:
/* create writeback cgroup */
cd /sys/fs/cgroup
echo "+memory +io" > cgroup.subtree_control
mkdir group1
cd group1
echo $$ > cgroup.procs
/* do writeback in cgroup */
fio -name test -filename=/dev/vdb ...
/* get writeback info of bdi */
cat /sys/kernel/debug/bdi/xxx/stats
The cat result unexpectedly implies that there is no writeback on target
bdi.
Fix this by collecting stats of all wb in bdi instead of only wb in
root cgroup.
Following domain hierarchy is tested:
global domain (320G)
/ \
cgroup domain1(10G) cgroup domain2(10G)
| |
bdi wb1 wb2
/* all writeback info of bdi is successfully collected */
cat stats
BdiWriteback: 4704 kB
BdiReclaimable: 1294496 kB
BdiDirtyThresh: 204208088 kB
DirtyThresh: 195259944 kB
BackgroundThresh: 32503588 kB
BdiDirtied: 48519296 kB
BdiWritten: 47225696 kB
BdiWriteBandwidth: 1173892 kBps
b_dirty: 1
b_io: 0
b_more_io: 1
b_dirty_time: 0
bdi_list: 1
state: 1
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/backing-dev.c | 96 ++++++++++++++++++++++++++++++++++++------------
1 file changed, 73 insertions(+), 23 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 5fa3666356f9..089146feb830 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -39,6 +39,19 @@ struct workqueue_struct *bdi_wq;
#include <linux/debugfs.h>
#include <linux/seq_file.h>
+struct wb_stats {
+ unsigned long nr_dirty;
+ unsigned long nr_io;
+ unsigned long nr_more_io;
+ unsigned long nr_dirty_time;
+ unsigned long nr_writeback;
+ unsigned long nr_reclaimable;
+ unsigned long nr_dirtied;
+ unsigned long nr_written;
+ unsigned long dirty_thresh;
+ unsigned long wb_thresh;
+};
+
static struct dentry *bdi_debug_root;
static void bdi_debug_init(void)
@@ -46,31 +59,68 @@ static void bdi_debug_init(void)
bdi_debug_root = debugfs_create_dir("bdi", NULL);
}
-static int bdi_debug_stats_show(struct seq_file *m, void *v)
+static void collect_wb_stats(struct wb_stats *stats,
+ struct bdi_writeback *wb)
{
- struct backing_dev_info *bdi = m->private;
- struct bdi_writeback *wb = &bdi->wb;
- unsigned long background_thresh;
- unsigned long dirty_thresh;
- unsigned long wb_thresh;
- unsigned long nr_dirty, nr_io, nr_more_io, nr_dirty_time;
struct inode *inode;
- nr_dirty = nr_io = nr_more_io = nr_dirty_time = 0;
spin_lock(&wb->list_lock);
list_for_each_entry(inode, &wb->b_dirty, i_io_list)
- nr_dirty++;
+ stats->nr_dirty++;
list_for_each_entry(inode, &wb->b_io, i_io_list)
- nr_io++;
+ stats->nr_io++;
list_for_each_entry(inode, &wb->b_more_io, i_io_list)
- nr_more_io++;
+ stats->nr_more_io++;
list_for_each_entry(inode, &wb->b_dirty_time, i_io_list)
if (inode->i_state & I_DIRTY_TIME)
- nr_dirty_time++;
+ stats->nr_dirty_time++;
spin_unlock(&wb->list_lock);
+ stats->nr_writeback += wb_stat(wb, WB_WRITEBACK);
+ stats->nr_reclaimable += wb_stat(wb, WB_RECLAIMABLE);
+ stats->nr_dirtied += wb_stat(wb, WB_DIRTIED);
+ stats->nr_written += wb_stat(wb, WB_WRITTEN);
+ stats->wb_thresh += wb_calc_thresh(wb, stats->dirty_thresh);
+}
+
+#ifdef CONFIG_CGROUP_WRITEBACK
+static void bdi_collect_stats(struct backing_dev_info *bdi,
+ struct wb_stats *stats)
+{
+ struct bdi_writeback *wb;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
+ if (!wb_tryget(wb))
+ continue;
+
+ collect_wb_stats(stats, wb);
+ wb_put(wb);
+ }
+ rcu_read_unlock();
+}
+#else
+static void bdi_collect_stats(struct backing_dev_info *bdi,
+ struct wb_stats *stats)
+{
+ collect_wb_stats(stats, &bdi->wb);
+}
+#endif
+
+static int bdi_debug_stats_show(struct seq_file *m, void *v)
+{
+ struct backing_dev_info *bdi = m->private;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
+ struct wb_stats stats;
+ unsigned long tot_bw;
+
global_dirty_limits(&background_thresh, &dirty_thresh);
- wb_thresh = wb_calc_thresh(wb, dirty_thresh);
+
+ memset(&stats, 0, sizeof(stats));
+ stats.dirty_thresh = dirty_thresh;
+ bdi_collect_stats(bdi, &stats);
+ tot_bw = atomic_long_read(&bdi->tot_write_bandwidth);
seq_printf(m,
"BdiWriteback: %10lu kB\n"
@@ -87,18 +137,18 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
"b_dirty_time: %10lu\n"
"bdi_list: %10u\n"
"state: %10lx\n",
- (unsigned long) K(wb_stat(wb, WB_WRITEBACK)),
- (unsigned long) K(wb_stat(wb, WB_RECLAIMABLE)),
- K(wb_thresh),
+ K(stats.nr_writeback),
+ K(stats.nr_reclaimable),
+ K(stats.wb_thresh),
K(dirty_thresh),
K(background_thresh),
- (unsigned long) K(wb_stat(wb, WB_DIRTIED)),
- (unsigned long) K(wb_stat(wb, WB_WRITTEN)),
- (unsigned long) K(wb->write_bandwidth),
- nr_dirty,
- nr_io,
- nr_more_io,
- nr_dirty_time,
+ K(stats.nr_dirtied),
+ K(stats.nr_written),
+ K(tot_bw),
+ stats.nr_dirty,
+ stats.nr_io,
+ stats.nr_more_io,
+ stats.nr_dirty_time,
!list_empty(&bdi->bdi_list), bdi->wb.state);
return 0;
--
2.30.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [ATCH v3 2/4] writeback: support retrieving per group debug writeback stats of bdi
2024-04-22 16:05 [ATCH v3 0/4] Improve visibility of writeback Kemeng Shi
2024-04-22 7:52 ` Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 1/4] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
@ 2024-04-22 16:05 ` Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 3/4] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 4/4] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
4 siblings, 0 replies; 6+ messages in thread
From: Kemeng Shi @ 2024-04-22 16:05 UTC (permalink / raw)
To: akpm, willy, jack, bfoster, tj
Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel
Add /sys/kernel/debug/bdi/xxx/wb_stats to show per group writeback stats
of bdi.
Following domain hierarchy is tested:
global domain (320G)
/ \
cgroup domain1(10G) cgroup domain2(10G)
| |
bdi wb1 wb2
/* per wb writeback info of bdi is collected */
cat /sys/kernel/debug/bdi/252:16/wb_stats
WbCgIno: 1
WbWriteback: 0 kB
WbReclaimable: 0 kB
WbDirtyThresh: 0 kB
WbDirtied: 0 kB
WbWritten: 0 kB
WbWriteBandwidth: 102400 kBps
b_dirty: 0
b_io: 0
b_more_io: 0
b_dirty_time: 0
state: 1
WbCgIno: 4208
WbWriteback: 59808 kB
WbReclaimable: 676480 kB
WbDirtyThresh: 6004624 kB
WbDirtied: 23348192 kB
WbWritten: 22614592 kB
WbWriteBandwidth: 593204 kBps
b_dirty: 1
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7
WbCgIno: 4249
WbWriteback: 144256 kB
WbReclaimable: 432096 kB
WbDirtyThresh: 6004344 kB
WbDirtied: 25727744 kB
WbWritten: 25154752 kB
WbWriteBandwidth: 577904 kBps
b_dirty: 0
b_io: 1
b_more_io: 0
b_dirty_time: 0
state: 7
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
include/linux/writeback.h | 1 +
mm/backing-dev.c | 78 ++++++++++++++++++++++++++++++++++++++-
mm/page-writeback.c | 19 ++++++++++
3 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 9845cb62e40b..112d806ddbe4 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -355,6 +355,7 @@ int dirtytime_interval_handler(struct ctl_table *table, int write,
void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty);
unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh);
+unsigned long cgwb_calc_thresh(struct bdi_writeback *wb);
void wb_update_bandwidth(struct bdi_writeback *wb);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 089146feb830..6ecd11bdce6e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -155,19 +155,93 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
}
DEFINE_SHOW_ATTRIBUTE(bdi_debug_stats);
+static void wb_stats_show(struct seq_file *m, struct bdi_writeback *wb,
+ struct wb_stats *stats)
+{
+
+ seq_printf(m,
+ "WbCgIno: %10lu\n"
+ "WbWriteback: %10lu kB\n"
+ "WbReclaimable: %10lu kB\n"
+ "WbDirtyThresh: %10lu kB\n"
+ "WbDirtied: %10lu kB\n"
+ "WbWritten: %10lu kB\n"
+ "WbWriteBandwidth: %10lu kBps\n"
+ "b_dirty: %10lu\n"
+ "b_io: %10lu\n"
+ "b_more_io: %10lu\n"
+ "b_dirty_time: %10lu\n"
+ "state: %10lx\n\n",
+ cgroup_ino(wb->memcg_css->cgroup),
+ K(stats->nr_writeback),
+ K(stats->nr_reclaimable),
+ K(stats->wb_thresh),
+ K(stats->nr_dirtied),
+ K(stats->nr_written),
+ K(wb->avg_write_bandwidth),
+ stats->nr_dirty,
+ stats->nr_io,
+ stats->nr_more_io,
+ stats->nr_dirty_time,
+ wb->state);
+}
+
+static int cgwb_debug_stats_show(struct seq_file *m, void *v)
+{
+ struct backing_dev_info *bdi = m->private;
+ unsigned long background_thresh;
+ unsigned long dirty_thresh;
+ struct bdi_writeback *wb;
+ struct wb_stats stats;
+
+ global_dirty_limits(&background_thresh, &dirty_thresh);
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(wb, &bdi->wb_list, bdi_node) {
+ struct wb_stats stats = { .dirty_thresh = dirty_thresh };
+
+ if (!wb_tryget(wb))
+ continue;
+
+ collect_wb_stats(&stats, wb);
+
+ /*
+ * Calculate thresh of wb in writeback cgroup which is min of
+ * thresh in global domain and thresh in cgroup domain. Drop
+ * rcu lock because cgwb_calc_thresh may sleep in
+ * cgroup_rstat_flush. We can do so here because we have a ref.
+ */
+ if (mem_cgroup_wb_domain(wb)) {
+ rcu_read_unlock();
+ stats.wb_thresh = min(stats.wb_thresh, cgwb_calc_thresh(wb));
+ rcu_read_lock();
+ }
+
+ wb_stats_show(m, wb, &stats);
+
+ wb_put(wb);
+ }
+ rcu_read_unlock();
+
+ return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(cgwb_debug_stats);
+
static void bdi_debug_register(struct backing_dev_info *bdi, const char *name)
{
bdi->debug_dir = debugfs_create_dir(name, bdi_debug_root);
debugfs_create_file("stats", 0444, bdi->debug_dir, bdi,
&bdi_debug_stats_fops);
+ debugfs_create_file("wb_stats", 0444, bdi->debug_dir, bdi,
+ &cgwb_debug_stats_fops);
}
static void bdi_debug_unregister(struct backing_dev_info *bdi)
{
debugfs_remove_recursive(bdi->debug_dir);
}
-#else
+#else /* CONFIG_DEBUG_FS */
static inline void bdi_debug_init(void)
{
}
@@ -178,7 +252,7 @@ static inline void bdi_debug_register(struct backing_dev_info *bdi,
static inline void bdi_debug_unregister(struct backing_dev_info *bdi)
{
}
-#endif
+#endif /* CONFIG_DEBUG_FS */
static ssize_t read_ahead_kb_store(struct device *dev,
struct device_attribute *attr,
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3e19b87049db..38f143195bcb 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -892,6 +892,25 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
return __wb_calc_thresh(&gdtc);
}
+unsigned long cgwb_calc_thresh(struct bdi_writeback *wb)
+{
+ struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+ struct dirty_throttle_control mdtc = { MDTC_INIT(wb, &gdtc) };
+ unsigned long filepages, headroom, writeback;
+
+ gdtc.avail = global_dirtyable_memory();
+ gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+ global_node_page_state(NR_WRITEBACK);
+
+ mem_cgroup_wb_stats(wb, &filepages, &headroom,
+ &mdtc.dirty, &writeback);
+ mdtc.dirty += writeback;
+ mdtc_calc_avail(&mdtc, filepages, headroom);
+ domain_dirty_limits(&mdtc);
+
+ return __wb_calc_thresh(&mdtc);
+}
+
/*
* setpoint - dirty 3
* f(dirty) := 1.0 + (----------------)
--
2.30.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [ATCH v3 3/4] writeback: add wb_monitor.py script to monitor writeback info on bdi
2024-04-22 16:05 [ATCH v3 0/4] Improve visibility of writeback Kemeng Shi
` (2 preceding siblings ...)
2024-04-22 16:05 ` [ATCH v3 2/4] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
@ 2024-04-22 16:05 ` Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 4/4] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
4 siblings, 0 replies; 6+ messages in thread
From: Kemeng Shi @ 2024-04-22 16:05 UTC (permalink / raw)
To: akpm, willy, jack, bfoster, tj
Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel
Add wb_monitor.py script to monitor writeback information on backing dev
which makes it easier and more convenient to observe writeback behaviors
of running system.
The wb_monitor.py script is written based on wq_monitor.py.
Following domain hierarchy is tested:
global domain (320G)
/ \
cgroup domain1(10G) cgroup domain2(10G)
| |
bdi wb1 wb2
The wb_monitor.py script output is as following:
./wb_monitor.py 252:16 -c
writeback reclaimable dirtied written avg_bw
252:16_1 0 0 0 0 102400
252:16_4284 672 820064 9230368 8410304 685612
252:16_4325 896 819840 10491264 9671648 652348
252:16 1568 1639904 19721632 18081952 1440360
writeback reclaimable dirtied written avg_bw
252:16_1 0 0 0 0 102400
252:16_4284 672 820064 9230368 8410304 685612
252:16_4325 896 819840 10491264 9671648 652348
252:16 1568 1639904 19721632 18081952 1440360
...
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Suggested-by: Tejun Heo <tj@kernel.org>
---
tools/writeback/wb_monitor.py | 172 ++++++++++++++++++++++++++++++++++
1 file changed, 172 insertions(+)
create mode 100644 tools/writeback/wb_monitor.py
diff --git a/tools/writeback/wb_monitor.py b/tools/writeback/wb_monitor.py
new file mode 100644
index 000000000000..5e3591f1f9a9
--- /dev/null
+++ b/tools/writeback/wb_monitor.py
@@ -0,0 +1,172 @@
+#!/usr/bin/env drgn
+#
+# Copyright (C) 2024 Kemeng Shi <shikemeng@huaweicloud.com>
+# Copyright (C) 2024 Huawei Inc
+
+desc = """
+This is a drgn script based on wq_monitor.py to monitor writeback info on
+backing dev. For more info on drgn, visit https://github.com/osandov/drgn.
+
+ writeback(kB) Amount of dirty pages are currently being written back to
+ disk.
+
+ reclaimable(kB) Amount of pages are currently reclaimable.
+
+ dirtied(kB) Amount of pages have been dirtied.
+
+ wrttien(kB) Amount of dirty pages have been written back to disk.
+
+ avg_wb(kBps) Smoothly estimated write bandwidth of writing dirty pages
+ back to disk.
+"""
+
+import signal
+import re
+import time
+import json
+
+import drgn
+from drgn.helpers.linux.list import list_for_each_entry
+
+import argparse
+parser = argparse.ArgumentParser(description=desc,
+ formatter_class=argparse.RawTextHelpFormatter)
+parser.add_argument('bdi', metavar='REGEX', nargs='*',
+ help='Target backing device name patterns (all if empty)')
+parser.add_argument('-i', '--interval', metavar='SECS', type=float, default=1,
+ help='Monitoring interval (0 to print once and exit)')
+parser.add_argument('-j', '--json', action='store_true',
+ help='Output in json')
+parser.add_argument('-c', '--cgroup', action='store_true',
+ help='show writeback of bdi in cgroup')
+args = parser.parse_args()
+
+bdi_list = prog['bdi_list']
+
+WB_RECLAIMABLE = prog['WB_RECLAIMABLE']
+WB_WRITEBACK = prog['WB_WRITEBACK']
+WB_DIRTIED = prog['WB_DIRTIED']
+WB_WRITTEN = prog['WB_WRITTEN']
+NR_WB_STAT_ITEMS = prog['NR_WB_STAT_ITEMS']
+
+PAGE_SHIFT = prog['PAGE_SHIFT']
+
+def K(x):
+ return x << (PAGE_SHIFT - 10)
+
+class Stats:
+ def dict(self, now):
+ return { 'timestamp' : now,
+ 'name' : self.name,
+ 'writeback' : self.stats[WB_WRITEBACK],
+ 'reclaimable' : self.stats[WB_RECLAIMABLE],
+ 'dirtied' : self.stats[WB_DIRTIED],
+ 'written' : self.stats[WB_WRITTEN],
+ 'avg_wb' : self.avg_bw, }
+
+ def table_header_str():
+ return f'{"":>16} {"writeback":>10} {"reclaimable":>12} ' \
+ f'{"dirtied":>9} {"written":>9} {"avg_bw":>9}'
+
+ def table_row_str(self):
+ out = f'{self.name[-16:]:16} ' \
+ f'{self.stats[WB_WRITEBACK]:10} ' \
+ f'{self.stats[WB_RECLAIMABLE]:12} ' \
+ f'{self.stats[WB_DIRTIED]:9} ' \
+ f'{self.stats[WB_WRITTEN]:9} ' \
+ f'{self.avg_bw:9} '
+ return out
+
+ def show_header():
+ if Stats.table_fmt:
+ print()
+ print(Stats.table_header_str())
+
+ def show_stats(self):
+ if Stats.table_fmt:
+ print(self.table_row_str())
+ else:
+ print(self.dict(Stats.now))
+
+class WbStats(Stats):
+ def __init__(self, wb):
+ bdi_name = wb.bdi.dev_name.string_().decode()
+ # avoid to use bdi.wb.memcg_css which is only defined when
+ # CONFIG_CGROUP_WRITEBACK is enabled
+ if wb == wb.bdi.wb.address_of_():
+ ino = "1"
+ else:
+ ino = str(wb.memcg_css.cgroup.kn.id.value_())
+ self.name = bdi_name + '_' + ino
+
+ self.stats = [0] * NR_WB_STAT_ITEMS
+ for i in range(NR_WB_STAT_ITEMS):
+ if wb.stat[i].count >= 0:
+ self.stats[i] = int(K(wb.stat[i].count))
+ else:
+ self.stats[i] = 0
+
+ self.avg_bw = int(K(wb.avg_write_bandwidth))
+
+class BdiStats(Stats):
+ def __init__(self, bdi):
+ self.name = bdi.dev_name.string_().decode()
+ self.stats = [0] * NR_WB_STAT_ITEMS
+ self.avg_bw = 0
+
+ def collectStats(self, wb_stats):
+ for i in range(NR_WB_STAT_ITEMS):
+ self.stats[i] += wb_stats.stats[i]
+
+ self.avg_bw += wb_stats.avg_bw
+
+exit_req = False
+
+def sigint_handler(signr, frame):
+ global exit_req
+ exit_req = True
+
+def main():
+ # handle args
+ Stats.table_fmt = not args.json
+ interval = args.interval
+ cgroup = args.cgroup
+
+ re_str = None
+ if args.bdi:
+ for r in args.bdi:
+ if re_str is None:
+ re_str = r
+ else:
+ re_str += '|' + r
+
+ filter_re = re.compile(re_str) if re_str else None
+
+ # monitoring loop
+ signal.signal(signal.SIGINT, sigint_handler)
+
+ while not exit_req:
+ Stats.now = time.time()
+
+ Stats.show_header()
+ for bdi in list_for_each_entry('struct backing_dev_info', bdi_list.address_of_(), 'bdi_list'):
+ bdi_stats = BdiStats(bdi)
+ if filter_re and not filter_re.search(bdi_stats.name):
+ continue
+
+ for wb in list_for_each_entry('struct bdi_writeback', bdi.wb_list.address_of_(), 'bdi_node'):
+ wb_stats = WbStats(wb)
+ bdi_stats.collectStats(wb_stats)
+ if cgroup:
+ wb_stats.show_stats()
+
+ bdi_stats.show_stats()
+ if cgroup and Stats.table_fmt:
+ print()
+
+ if interval == 0:
+ break
+ time.sleep(interval)
+
+if __name__ == "__main__":
+ main()
--
2.30.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [ATCH v3 4/4] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages
2024-04-22 16:05 [ATCH v3 0/4] Improve visibility of writeback Kemeng Shi
` (3 preceding siblings ...)
2024-04-22 16:05 ` [ATCH v3 3/4] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
@ 2024-04-22 16:05 ` Kemeng Shi
4 siblings, 0 replies; 6+ messages in thread
From: Kemeng Shi @ 2024-04-22 16:05 UTC (permalink / raw)
To: akpm, willy, jack, bfoster, tj
Cc: dsterba, mjguzik, dhowells, linux-kernel, linux-mm, linux-fsdevel
Commit 8d92890bd6b85 ("mm/writeback: discard NR_UNSTABLE_NFS, use
NR_WRITEBACK instead") removed NR_UNSTABLE_NFS and nr_reclaimable
only contains dirty page now.
Rename nr_reclaimable to nr_dirty properly.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
mm/page-writeback.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 38f143195bcb..f2c0237be602 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1694,7 +1694,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
struct dirty_throttle_control * const mdtc = mdtc_valid(&mdtc_stor) ?
&mdtc_stor : NULL;
struct dirty_throttle_control *sdtc;
- unsigned long nr_reclaimable; /* = file_dirty */
+ unsigned long nr_dirty;
long period;
long pause;
long max_pause;
@@ -1715,9 +1715,9 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
unsigned long m_thresh = 0;
unsigned long m_bg_thresh = 0;
- nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
+ nr_dirty = global_node_page_state(NR_FILE_DIRTY);
gdtc->avail = global_dirtyable_memory();
- gdtc->dirty = nr_reclaimable + global_node_page_state(NR_WRITEBACK);
+ gdtc->dirty = nr_dirty + global_node_page_state(NR_WRITEBACK);
domain_dirty_limits(gdtc);
@@ -1768,7 +1768,7 @@ static int balance_dirty_pages(struct bdi_writeback *wb,
* In normal mode, we start background writeout at the lower
* background_thresh, to keep the amount of dirty memory low.
*/
- if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
+ if (!laptop_mode && nr_dirty > gdtc->bg_thresh &&
!writeback_in_progress(wb))
wb_start_background_writeback(wb);
--
2.30.0
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-04-22 7:53 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-22 16:05 [ATCH v3 0/4] Improve visibility of writeback Kemeng Shi
2024-04-22 7:52 ` Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 1/4] writeback: collect stats of all wb of bdi in bdi_debug_stats_show Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 2/4] writeback: support retrieving per group debug writeback stats of bdi Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 3/4] writeback: add wb_monitor.py script to monitor writeback info on bdi Kemeng Shi
2024-04-22 16:05 ` [ATCH v3 4/4] writeback: rename nr_reclaimable to nr_dirty in balance_dirty_pages Kemeng Shi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox