From: Hillf Danton <hdanton@sina.com>
To: linux-mm <linux-mm@kvack.org>
Cc: fsdev <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Fengguang Wu <fengguang.wu@intel.com>, Tejun Heo <tj@kernel.org>,
Jan Kara <jack@suse.com>, Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeelb@google.com>,
Minchan Kim <minchan@kernel.org>, Mel Gorman <mgorman@suse.de>,
Hillf Danton <hdanton@sina.com>
Subject: [RFC v2] writeback: add elastic bdi in cgwb bdp
Date: Sat, 26 Oct 2019 18:46:56 +0800
Message-ID: <20191026104656.15176-1-hdanton@sina.com>
The elastic bdi (ebdi) mirrors the bdi of spinning disks, SSDs, USB
sticks and other storage devices on the market. The performance of an
ebdi goes up and down as the pattern of dispatched IO changes, which
can be roughly estimated as below.
P = j(..., IO pattern);
From the ebdi's point of view, the bandwidth currently measured while
balancing dirty pages is closely related to its performance, because
the former is a part of the latter.
B = y(P);
The functions above suggest there may be a layering violation, in that
the bandwidth could be better measured somewhere below the fs layer.
It is nevertheless measured to an extent that satisfies everyone
judging it, and it plays a role in dispatching IO while entirely
ignoring the IO pattern, which is volatile by nature.
It also helps to throttle the dirtying speed, while ignoring the fact
that DRAM is in general 10x faster than an ebdi. If B is half of P,
for instance, then it is about 5% of the dirtying speed, just 2 points
away from the figure in the snippet below.
/*
* If ratelimit_pages is too high then we can get into dirty-data overload
* if a large number of processes all perform writes at the same time.
* If it is too low then SMP machines will call the (expensive)
* get_writeback_state too often.
*
* Here we set ratelimit_pages to a level which ensures that when all CPUs are
* dirtying in parallel, we cannot go more than 3% (1/32) over the dirty memory
* thresholds.
*/
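The arithmetic behind the 5% figure can be checked with a small
userspace sketch. It takes the message's own assumptions (DRAM dirties
about 10x faster than the ebdi performs, and B is half of P); the
helper names are hypothetical and are not kernel functions.

```c
/*
 * Hypothetical helper, not kernel code: the share of the dirtying speed
 * that the measured writeback bandwidth B covers, in percent.
 *
 * dram_factor: how many times faster DRAM dirties pages than the
 *              device performs (the message assumes 10).
 * b_over_p:    B as a fraction of the device performance P
 *              (the message's example uses 0.5).
 */
static double bw_share_percent(double dram_factor, double b_over_p)
{
	/* dirty speed = dram_factor * P and B = b_over_p * P; P cancels. */
	return 100.0 * b_over_p / dram_factor;
}

/* The 1/32 overshoot bound quoted in the ratelimit_pages comment. */
static double ratelimit_overshoot_percent(void)
{
	return 100.0 / 32.0;
}
```

With the message's numbers, bw_share_percent(10.0, 0.5) is 5% while
ratelimit_overshoot_percent() is 3.125% (the "3%" in the comment), which
is the gap of roughly 2 points mentioned above.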
To keep the dirtying speed from running away from the laundering
speed, ebdi proposes a walk-the-dog method: a leash is put in bdp,
which seems to cause less churn in the IO pattern.
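As a rough illustration of the leash, the throttling decision made by
cgwb_bdp_should_throttle() in the patch below can be sketched as a pure
userspace predicate. The struct and function names here are
hypothetical; the sketch leaves out the fatal-signal check and the
kick of background writeback that the kernel function also performs.

```c
#include <stdbool.h>

/* Hypothetical userspace snapshot of the counters the patch consults. */
struct bdp_snapshot {
	unsigned long dirty;      /* dirty + unstable NFS + writeback pages */
	unsigned long bg_thresh;  /* background writeback threshold */
	unsigned long thresh;     /* hard dirty threshold */
	unsigned long dirtied;    /* pages dirtied against this wb */
	unsigned long written;    /* pages written back by this wb */
	unsigned long stat_error; /* error margin of the per-cpu counters */
};

/*
 * Sketch of the leash: hold the dirtier only while the system is over
 * the dirty threshold and this wb has dirtied more than it has written
 * back, beyond the counter error margin.
 */
static bool bdp_should_throttle(const struct bdp_snapshot *s)
{
	/* Below the background threshold nobody needs to wait. */
	if (s->dirty < s->bg_thresh)
		return false;

	return s->dirty > s->thresh &&
	       s->dirtied > s->written + s->stat_error;
}
```

In the patch, a dirtier sleeps on wb->bdp_waitq for up to HZ while the
corresponding kernel predicate holds, and wb_workfn() wakes the
sleepers once a writeback pass finishes.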
V2 is based on next-20191025.
Changes since v1
- drop CGWB_BDP_WITH_EBDI
Changes since v0
- add CGWB_BDP_WITH_EBDI in mm/Kconfig
- drop wakeup in wbc_detach_inode()
- add wakeup in wb_workfn()
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hillf Danton <hdanton@sina.com>
---
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -170,6 +170,8 @@ struct bdi_writeback {
struct list_head bdi_node; /* anchored at bdi->wb_list */
+ struct wait_queue_head bdp_waitq;
+
#ifdef CONFIG_CGROUP_WRITEBACK
struct percpu_ref refcnt; /* used only for !root wb's */
struct fprop_local_percpu memcg_completions;
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -324,6 +324,8 @@ static int wb_init(struct bdi_writeback
goto out_destroy_stat;
}
+ init_waitqueue_head(&wb->bdp_waitq);
+
return 0;
out_destroy_stat:
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1551,6 +1551,39 @@ static inline void wb_dirty_limits(struc
}
}
+static bool cgwb_bdp_should_throttle(struct bdi_writeback *wb)
+{
+ struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+
+ if (fatal_signal_pending(current))
+ return false;
+
+ gdtc.avail = global_dirtyable_memory();
+
+ domain_dirty_limits(&gdtc);
+
+ gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+ global_node_page_state(NR_UNSTABLE_NFS) +
+ global_node_page_state(NR_WRITEBACK);
+
+ if (gdtc.dirty < gdtc.bg_thresh)
+ return false;
+
+ if (!writeback_in_progress(wb))
+ wb_start_background_writeback(wb);
+
+ return gdtc.dirty > gdtc.thresh &&
+ wb_stat(wb, WB_DIRTIED) >
+ wb_stat(wb, WB_WRITTEN) +
+ wb_stat_error();
+}
+
+static inline void cgwb_bdp(struct bdi_writeback *wb)
+{
+ wait_event_interruptible_timeout(wb->bdp_waitq,
+ !cgwb_bdp_should_throttle(wb), HZ);
+}
+
/*
* balance_dirty_pages() must be called by processes which are generating dirty
* data. It looks at the number of dirty pages in the machine and will force
@@ -1910,7 +1943,7 @@ void balance_dirty_pages_ratelimited(str
preempt_enable();
if (unlikely(current->nr_dirtied >= ratelimit))
- balance_dirty_pages(wb, current->nr_dirtied);
+ cgwb_bdp(wb);
wb_put(wb);
}
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -811,6 +811,8 @@ static long wb_split_bdi_pages(struct bd
if (nr_pages == LONG_MAX)
return LONG_MAX;
+ return nr_pages;
+
/*
* This may be called on clean wb's and proportional distribution
* may not make sense, just use the original @nr_pages in those
@@ -1604,6 +1606,7 @@ static long writeback_chunk_size(struct
pages = min(pages, work->nr_pages);
pages = round_down(pages + MIN_WRITEBACK_PAGES,
MIN_WRITEBACK_PAGES);
+ pages = work->nr_pages;
}
return pages;
@@ -2092,6 +2095,9 @@ void wb_workfn(struct work_struct *work)
wb_wakeup_delayed(wb);
current->flags &= ~PF_SWAPWRITE;
+
+ if (waitqueue_active(&wb->bdp_waitq))
+ wake_up_all(&wb->bdp_waitq);
}
/*
--
Thread overview: 7+ messages
2019-10-26 10:46 Hillf Danton [this message]
2019-11-07 8:38 ` [writeback] 2b871886bb: fio.write_bw_MBps 40.6% improvement kernel test robot
2019-11-08 21:00 ` [RFC v2] writeback: add elastic bdi in cgwb bdp Andrew Morton
2019-11-09 10:31 ` Hillf Danton
2019-11-14 12:17 ` Jan Kara
2019-11-15 3:32 ` Hillf Danton
2019-11-15 9:53 ` Jan Kara