linux-mm.kvack.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Jim Zhao <jimzhao.ai@gmail.com>
Cc: jack@suse.cz, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, willy@infradead.org
Subject: Re: [PATCH] mm/page-writeback: Raise wb_thresh to prevent write blocking with strictlimit
Date: Fri, 8 Nov 2024 23:02:15 +0100
Message-ID: <20241108220215.s27rziym6mn5nzv4@quack3>
In-Reply-To: <20241108031949.2984319-1-jimzhao.ai@gmail.com>

On Fri 08-11-24 11:19:49, Jim Zhao wrote:
> > On Wed 23-10-24 18:00:32, Jim Zhao wrote:
> > > With the strictlimit flag, wb_thresh acts as a hard limit in
> > > balance_dirty_pages() and wb_position_ratio(). When device write
> > > operations are inactive, wb_thresh can drop to 0, causing writes to
> > > be blocked. The issue occasionally occurs in fuse fs, particularly
> > > with network backends, the write thread is blocked frequently during
> > > a period. To address it, this patch raises the minimum wb_thresh to a
> > > controllable level, similar to the non-strictlimit case.
> > > 
> > > Signed-off-by: Jim Zhao <jimzhao.ai@gmail.com>
> > 
> > ...
> > 
> > > +	/*
> > > +	 * With strictlimit flag, the wb_thresh is treated as
> > > +	 * a hard limit in balance_dirty_pages() and wb_position_ratio().
> > > +	 * It's possible that wb_thresh is close to zero, not because
> > > +	 * the device is slow, but because it has been inactive.
> > > +	 * To prevent occasional writes from being blocked, we raise wb_thresh.
> > > +	 */
> > > +	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
> > > +		unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
> > > +		u64 wb_scale_thresh = 0;
> > > +
> > > +		if (limit > dtc->dirty)
> > > +			wb_scale_thresh = (limit - dtc->dirty) / 100;
> > > +		wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
> > > +	}
> > 
> > What you propose makes sense in principle although I'd say this is mostly a
> > userspace setup issue - with strictlimit enabled, you're kind of expected
> > to set min_ratio exactly if you want to avoid these startup issues. But I
> > tend to agree that we can provide a bit of a slack for a bdi without
> > min_ratio configured to ramp up.
> > 
> > But I'd rather pick the logic like:
> > 
> > 	/*
> > 	 * If bdi does not have min_ratio configured and it was inactive,
> > 	 * bump its min_ratio to 0.1% to provide it some room to ramp up.
> > 	 */
> > 	if (!wb_min_ratio && !numerator)
> > 		wb_min_ratio = min(BDI_RATIO_SCALE / 10, wb_max_ratio / 2);
> > 
> > That would seem like a bit more systematic way than the formula you propose
> > above...
> 
> Thanks for the advice.
> Here's the explanation of the formula:
> 1. when writes are small and intermittent,wb_thresh can approach 0, not
> just 0, making the numerator value difficult to verify.

I see, ok.

> 2. The ramp-up margin, whether 0.1% or another value, needs
> consideration.
> I based this on the logic of wb_position_ratio in the non-strictlimit
> scenario: wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8); It seems
> provides more room and ensures ramping up within a controllable range.

I see, thanks for the explanation. I was thinking about how to make the code
more consistent instead of adding another special constant and workaround.
What I'd suggest is:

1) There's already code that's supposed to handle ramping up with
strictlimit in wb_update_dirty_ratelimit():

        /*
         * For strictlimit case, calculations above were based on wb counters
         * and limits (starting from pos_ratio = wb_position_ratio() and up to
         * balanced_dirty_ratelimit = task_ratelimit * write_bw / dirty_rate).
         * Hence, to calculate "step" properly, we have to use wb_dirty as
         * "dirty" and wb_setpoint as "setpoint".
         *
         * We rampup dirty_ratelimit forcibly if wb_dirty is low because
         * it's possible that wb_thresh is close to zero due to inactivity
         * of backing device.
         */
        if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
                dirty = dtc->wb_dirty;
                if (dtc->wb_dirty < 8)
                        setpoint = dtc->wb_dirty + 1;
                else
                        setpoint = (dtc->wb_thresh + dtc->wb_bg_thresh) / 2;
        }

Now I agree that increasing wb_thresh directly is more understandable and
transparent so I'd just drop this special case.

2) I'd just handle all the bumping of wb_thresh in a single place instead
of having it spread over multiple places. So __wb_calc_thresh() could have
code like:

        wb_thresh = (thresh * (100 * BDI_RATIO_SCALE - bdi_min_ratio)) / (100 * BDI_RATIO_SCALE);
        wb_thresh *= numerator;
        wb_thresh = div64_ul(wb_thresh, denominator);

        wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);

        wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
        limit = hard_dirty_limit(dtc_dom(dtc), dtc->thresh);
        /*
         * It's very possible that wb_thresh is close to 0 not because the
         * device is slow, but that it has remained inactive for long time.
         * Honour such devices a reasonable good (hopefully IO efficient)
         * threshold, so that the occasional writes won't be blocked and active
         * writes can rampup the threshold quickly.
         */
        if (limit > dtc->dirty)
                wb_thresh = max(wb_thresh, (limit - dtc->dirty) / 8);
        if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
                wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);

and we can drop the bumping from wb_position_ratio(). This way we have the
wb_thresh bumping in a single logical place. Since we still limit wb_thresh
with max_ratio, untrusted bdis for which max_ratio should be configured
(otherwise they can grow the amount of dirty pages up to the global
threshold anyway) are still under control.
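
To put rough numbers on this, here is a userspace sketch of the calculation
above. It is only a model, not kernel code: struct wb_model and
model_wb_thresh() are names made up for illustration, plain 64-bit division
stands in for div64_ul(), and BDI_RATIO_SCALE is assumed to be 10000 as in
include/linux/backing-dev.h. All ratios are in units of 1/BDI_RATIO_SCALE of
a percent, so 100% is 100 * BDI_RATIO_SCALE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's ratio scale (1/10000 of a percent). */
#define BDI_RATIO_SCALE 10000ULL

/* Hypothetical container for the inputs the snippet above reads. */
struct wb_model {
	uint64_t thresh;        /* domain dirty threshold, in pages */
	uint64_t limit;         /* hard dirty limit, in pages */
	uint64_t dirty;         /* pages currently dirty in the domain */
	uint64_t numerator;     /* this wb's share of recent writeout... */
	uint64_t denominator;   /* ...out of this total, must be non-zero */
	uint64_t bdi_min_ratio; /* sum of min_ratio over all bdis */
	uint64_t wb_min_ratio;  /* min_ratio of this wb */
	uint64_t wb_max_ratio;  /* max_ratio of this wb */
};

/* Model of the proposed single-place wb_thresh computation. */
static uint64_t model_wb_thresh(const struct wb_model *m)
{
	const uint64_t full = 100 * BDI_RATIO_SCALE;
	uint64_t wb_thresh, wb_max_thresh;

	assert(m->denominator != 0);

	/* Proportional share of what is left after reserved min ratios. */
	wb_thresh = m->thresh * (full - m->bdi_min_ratio) / full;
	wb_thresh = wb_thresh * m->numerator / m->denominator;

	/* Add this wb's own reserved minimum. */
	wb_thresh += m->thresh * m->wb_min_ratio / full;

	/* Bump an idle wb to 1/8 of the remaining headroom. */
	if (m->limit > m->dirty) {
		uint64_t bump = (m->limit - m->dirty) / 8;

		if (wb_thresh < bump)
			wb_thresh = bump;
	}

	/* max_ratio still caps the result, bump included. */
	wb_max_thresh = m->thresh * m->wb_max_ratio / full;
	if (wb_thresh > wb_max_thresh)
		wb_thresh = wb_max_thresh;

	return wb_thresh;
}
```

For an idle wb (numerator 0) with thresh = limit = 100000 pages and 20000
pages dirty, this model gives 10000 pages with max_ratio at 100% and 1000
pages once max_ratio is set to 1%, which is the "still under control" case.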

If we really wanted, we could introduce a different bump for the
strictlimit case, but at this point I don't think it is warranted, so I'd
leave that as an option in case someone comes up with a situation where this
bumping proves to be too aggressive.
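
For a feel of the difference in aggressiveness, the two floors discussed in
this thread can be compared side by side. This is a standalone userspace
sketch; the helper names are invented for illustration, and it only restates
the (limit - dirty) / 8 bump and the formula from the original patch,
min((limit - dirty) / 100, wb_max_thresh / 4).

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Floor used by the unified bump: 1/8 of the remaining headroom. */
static uint64_t floor_unified(uint64_t limit, uint64_t dirty)
{
	return limit > dirty ? (limit - dirty) / 8 : 0;
}

/*
 * Floor from the original strictlimit patch: 1% of the remaining
 * headroom, capped at a quarter of wb_max_thresh.
 */
static uint64_t floor_strictlimit_v1(uint64_t limit, uint64_t dirty,
				     uint64_t wb_max_thresh)
{
	uint64_t scaled = limit > dirty ? (limit - dirty) / 100 : 0;

	return min_u64(scaled, wb_max_thresh / 4);
}
```

With limit = 100000 pages and 20000 pages dirty, the unified floor is 10000
pages (before the max_ratio clamp), while the original formula gives 800
pages for wb_max_thresh = 100000 and 250 pages for wb_max_thresh = 1000.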

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


