From: Joanne Koong <joannelkoong@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org,
"Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Improving large folio writeback performance
Date: Tue, 21 Jan 2025 16:29:57 -0800 [thread overview]
Message-ID: <CAJnrk1Z21NU0GCjj+GzsudyT1LAKx3TNqHt2oO22u1MZAZ4Lug@mail.gmail.com> (raw)
In-Reply-To: <xuf742w2v2rir6tfumuu5ll2ow3kgzzbhjgvu47vquc3vgrdxf@blrmpfwvre4y>
On Mon, Jan 20, 2025 at 2:42 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 17-01-25 14:45:01, Joanne Koong wrote:
> > On Fri, Jan 17, 2025 at 3:53 AM Jan Kara <jack@suse.cz> wrote:
> > > On Thu 16-01-25 15:38:54, Joanne Koong wrote:
> > > I think tweaking min_pause is a wrong way to do this. I think that is just a
> > > symptom. Can you run something like:
> > >
> > > while true; do
> > >     cat /sys/kernel/debug/bdi/<fuse-bdi>/stats
> > >     echo "---------"
> > >     sleep 1
> > > done > bdi-debug.txt
> > >
> > > while you are writing to the FUSE filesystem and share the output file?
> > > That should tell us a bit more about what's happening inside the writeback
> > > throttling. Also do you somehow configure min/max_ratio for the FUSE bdi?
> > > You can check in /sys/block/<fuse-bdi>/bdi/{min,max}_ratio . I suspect the
> > > problem is that the BDI dirty limit does not ramp up properly when we
> > > increase dirtied pages in large chunks.
> >
> > This is the debug info I see for FUSE large folio writes where bs=1M
> > and size=1G:
> >
> >
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 896 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1071104 kB
> > BdiWritten: 4096 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1290240 kB
> > BdiWritten: 4992 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1517568 kB
> > BdiWritten: 5824 kB
> > BdiWriteBandwidth: 25692 kBps
> > b_dirty: 0
> > b_io: 1
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 7
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1747968 kB
> > BdiWritten: 6720 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 896 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1949696 kB
> > BdiWritten: 7552 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3612 kB
> > DirtyThresh: 361300 kB
> > BackgroundThresh: 180428 kB
> > BdiDirtied: 2097152 kB
> > BdiWritten: 8128 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> >
> >
> > I didn't do anything to configure/change the FUSE bdi min/max_ratio.
> > This is what I see on my system:
> >
> > cat /sys/class/bdi/0:52/min_ratio
> > 0
> > cat /sys/class/bdi/0:52/max_ratio
> > 1
>
> OK, we can see that BdiDirtyThresh stabilized more or less at 3.6MB.
> Checking the code, this shows we are hitting __wb_calc_thresh() logic:
>
>         if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>                 unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
>                 u64 wb_scale_thresh = 0;
>
>                 if (limit > dtc->dirty)
>                         wb_scale_thresh = (limit - dtc->dirty) / 100;
>                 wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
>         }
>
> so BdiDirtyThresh is set to DirtyThresh/100. This also shows the bdi never
> generates enough throughput to ramp up its share from this initial value.
>
> > > Actually, there's a patch queued in mm tree that improves the ramping up of
> > > bdi dirty limit for strictlimit bdis [1]. It would be nice if you could
> > > test whether it changes something in the behavior you observe. Thanks!
> > >
> > > Honza
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page-writeback-consolidate-wb_thresh-bumping-logic-into-__wb_calc_thresh.patch
> >
> > I still see the same results (~230 MiB/s throughput using fio) with
> > this patch applied, unfortunately. Here's the debug info I see with
> > this patch (same test scenario as above on FUSE large folio writes
> > where bs=1M and size=1G):
> >
> > BdiWriteback: 0 kB
> > BdiReclaimable: 2048 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359132 kB
> > BackgroundThresh: 179348 kB
> > BdiDirtied: 51200 kB
> > BdiWritten: 128 kB
> > BdiWriteBandwidth: 102400 kBps
> > b_dirty: 1
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 5
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 331776 kB
> > BdiWritten: 1216 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 562176 kB
> > BdiWritten: 2176 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 792576 kB
> > BdiWritten: 3072 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 64 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 1026048 kB
> > BdiWritten: 3904 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
>
> Yeah, here the situation is really the same. As an experiment, can you try
> setting min_ratio for the FUSE bdi to 1, 2, 3, ..., 10 (I don't expect you
> should need to go past 10) and figure out when there's enough slack space
> for the writeback bandwidth to ramp up to full speed?
> Thanks!
>
> Honza
When locally testing this, I'm seeing that max_ratio affects the
bandwidth much more than min_ratio does (e.g. the different min_ratios
give roughly the same bandwidth for a given max_ratio). I'm also seeing
somewhat high variance across runs, which makes it hard to gauge what's
accurate, but on average this is what I'm seeing:
max_ratio=1 --- bandwidth= ~230 MiB/s
max_ratio=2 --- bandwidth= ~420 MiB/s
max_ratio=3 --- bandwidth= ~550 MiB/s
max_ratio=4 --- bandwidth= ~653 MiB/s
max_ratio=5 --- bandwidth= ~700 MiB/s
max_ratio=6 --- bandwidth= ~810 MiB/s
max_ratio=7 --- bandwidth= ~1040 MiB/s (and then a lot of times, 561 MiB/s on subsequent runs)
Thanks,
Joanne
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR