From: Jim Zhao <jimzhao.ai@gmail.com>
To: jack@suse.cz, shikemeng@huaweicloud.com
Cc: akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
willy@infradead.org, jimzhao.ai@gmail.com
Subject: Re: [PATCH v2] mm/page-writeback: Raise wb_thresh to prevent write blocking with strictlimit
Date: Tue, 19 Nov 2024 20:29:22 +0800 [thread overview]
Message-ID: <20241119122922.3939538-1-jimzhao.ai@gmail.com> (raw)
In-Reply-To: <20241119114444.3925495-1-jimzhao.ai@gmail.com>
Thanks, Jan, I just sent patch v2, could you please review it ?
And I found the debug info in the bdi stats.
The BdiDirtyThresh value may be greater than DirtyThresh, and after applying this patch, the value of BdiDirtyThresh could become even larger.
without patch:
---
root@ubuntu:/sys/kernel/debug/bdi/8:0# cat stats
BdiWriteback: 0 kB
BdiReclaimable: 96 kB
BdiDirtyThresh: 1346824 kB
DirtyThresh: 673412 kB
BackgroundThresh: 336292 kB
BdiDirtied: 19872 kB
BdiWritten: 19776 kB
BdiWriteBandwidth: 0 kBps
b_dirty: 0
b_io: 0
b_more_io: 0
b_dirty_time: 0
bdi_list: 1
state: 1
with patch:
---
root@ubuntu:/sys/kernel/debug/bdi/8:0# cat stats
BdiWriteback: 96 kB
BdiReclaimable: 192 kB
BdiDirtyThresh: 3090736 kB
DirtyThresh: 650716 kB
BackgroundThresh: 324960 kB
BdiDirtied: 472512 kB
BdiWritten: 470592 kB
BdiWriteBandwidth: 106268 kBps
b_dirty: 2
b_io: 0
b_more_io: 0
b_dirty_time: 0
bdi_list: 1
state: 1
@kemeng, is this a normal behavior or an issue ?
Thanks,
Jim Zhao
> With the strictlimit flag, wb_thresh acts as a hard limit in
> balance_dirty_pages() and wb_position_ratio(). When device write
> operations are inactive, wb_thresh can drop to 0, causing writes to be
> blocked. The issue occasionally occurs in fuse fs, particularly with
> network backends, the write thread is blocked frequently during a period.
> To address it, this patch raises the minimum wb_thresh to a controllable
> level, similar to the non-strictlimit case.
>
> Signed-off-by: Jim Zhao <jimzhao.ai@gmail.com>
> ---
> Changes in v2:
> 1. Consolidate all wb_thresh bumping logic in __wb_calc_thresh for consistency;
> 2. Replace the limit variable with thresh for calculating the bump value,
> as __wb_calc_thresh is also used to calculate the background threshold;
> 3. Add domain_dirty_avail in wb_calc_thresh to get dtc->dirty.
> ---
> mm/page-writeback.c | 48 ++++++++++++++++++++++-----------------------
> 1 file changed, 23 insertions(+), 25 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index e5a9eb795f99..8b13bcb42de3 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -917,7 +917,9 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
> unsigned long thresh)
> {
> struct wb_domain *dom = dtc_dom(dtc);
> + struct bdi_writeback *wb = dtc->wb;
> u64 wb_thresh;
> + u64 wb_max_thresh;
> unsigned long numerator, denominator;
> unsigned long wb_min_ratio, wb_max_ratio;
>
> @@ -931,11 +933,27 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
> wb_thresh *= numerator;
> wb_thresh = div64_ul(wb_thresh, denominator);
>
> - wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
> + wb_min_max_ratio(wb, &wb_min_ratio, &wb_max_ratio);
>
> wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
> - if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
> - wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
> +
> + /*
> + * It's very possible that wb_thresh is close to 0 not because the
> + * device is slow, but that it has remained inactive for long time.
> + * Honour such devices a reasonable good (hopefully IO efficient)
> + * threshold, so that the occasional writes won't be blocked and active
> + * writes can rampup the threshold quickly.
> + */
> + if (thresh > dtc->dirty) {
> + if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT))
> + wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 100);
> + else
> + wb_thresh = max(wb_thresh, (thresh - dtc->dirty) / 8);
> + }
> +
> + wb_max_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
> + if (wb_thresh > wb_max_thresh)
> + wb_thresh = wb_max_thresh;
>
> return wb_thresh;
> }
> @@ -944,6 +962,7 @@ unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh)
> {
> struct dirty_throttle_control gdtc = { GDTC_INIT(wb) };
>
> + domain_dirty_avail(&gdtc, true);
> return __wb_calc_thresh(&gdtc, thresh);
> }
>
> @@ -1120,12 +1139,6 @@ static void wb_position_ratio(struct dirty_throttle_control *dtc)
> if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> long long wb_pos_ratio;
>
> - if (dtc->wb_dirty < 8) {
> - dtc->pos_ratio = min_t(long long, pos_ratio * 2,
> - 2 << RATELIMIT_CALC_SHIFT);
> - return;
> - }
> -
> if (dtc->wb_dirty >= wb_thresh)
> return;
>
> @@ -1196,14 +1209,6 @@ static void wb_position_ratio(struct dirty_throttle_control *dtc)
> */
> if (unlikely(wb_thresh > dtc->thresh))
> wb_thresh = dtc->thresh;
> - /*
> - * It's very possible that wb_thresh is close to 0 not because the
> - * device is slow, but that it has remained inactive for long time.
> - * Honour such devices a reasonable good (hopefully IO efficient)
> - * threshold, so that the occasional writes won't be blocked and active
> - * writes can rampup the threshold quickly.
> - */
> - wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
> /*
> * scale global setpoint to wb's:
> * wb_setpoint = setpoint * wb_thresh / thresh
> @@ -1459,17 +1464,10 @@ static void wb_update_dirty_ratelimit(struct dirty_throttle_control *dtc,
> * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
> * Hence, to calculate "step" properly, we have to use wb_dirty as
> * "dirty" and wb_setpoint as "setpoint".
> - *
> - * We rampup dirty_ratelimit forcibly if wb_dirty is low because
> - * it's possible that wb_thresh is close to zero due to inactivity
> - * of backing device.
> */
> if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> dirty = dtc->wb_dirty;
> - if (dtc->wb_dirty < 8)
> - setpoint = dtc->wb_dirty + 1;
> - else
> - setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
> + setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
> }
>
> if (dirty < setpoint) {
> --
> 2.20.1
next prev parent reply other threads:[~2024-12-05 15:33 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-23 10:00 [PATCH] " Jim Zhao
2024-10-23 23:24 ` Andrew Morton
2024-10-24 6:09 ` Jim Zhao
2024-10-24 6:20 ` Andrew Morton
2024-10-24 6:52 ` jim zhao
2024-10-24 7:29 ` Jim Zhao
2024-10-26 0:02 ` Andrew Morton
2024-11-01 7:17 ` Jim Zhao
2024-11-07 15:32 ` Jan Kara
2024-11-08 3:19 ` Jim Zhao
2024-11-08 22:02 ` Jan Kara
2024-11-12 8:45 ` Jim Zhao
2024-11-13 10:07 ` Jan Kara
2024-11-19 11:44 ` [PATCH v2] mm/page-writeback: raise " Jim Zhao
2024-11-19 12:29 ` Jim Zhao [this message]
2024-11-20 8:03 ` [PATCH v2] mm/page-writeback: Raise " Kemeng Shi
2024-11-21 8:05 ` Jim Zhao
2024-12-12 12:32 ` Kemeng Shi
2024-11-20 11:57 ` Jan Kara
2024-11-21 10:20 ` Jim Zhao
2024-11-21 11:49 ` [PATCH v2] mm/page-writeback: raise " Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241119122922.3939538-1-jimzhao.ai@gmail.com \
--to=jimzhao.ai@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=shikemeng@huaweicloud.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox