From: Joanne Koong <joannelkoong@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	 linux-mm@kvack.org,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Improving large folio writeback performance
Date: Tue, 21 Jan 2025 16:29:57 -0800
Message-ID: <CAJnrk1Z21NU0GCjj+GzsudyT1LAKx3TNqHt2oO22u1MZAZ4Lug@mail.gmail.com>
In-Reply-To: <xuf742w2v2rir6tfumuu5ll2ow3kgzzbhjgvu47vquc3vgrdxf@blrmpfwvre4y>

On Mon, Jan 20, 2025 at 2:42 PM Jan Kara <jack@suse.cz> wrote:
>
> On Fri 17-01-25 14:45:01, Joanne Koong wrote:
> > On Fri, Jan 17, 2025 at 3:53 AM Jan Kara <jack@suse.cz> wrote:
> > > On Thu 16-01-25 15:38:54, Joanne Koong wrote:
> > > I think tweaking min_pause is the wrong way to do this; that is just a
> > > symptom. Can you run something like:
> > >
> > > while true; do
> > >         cat /sys/kernel/debug/bdi/<fuse-bdi>/stats
> > >         echo "---------"
> > >         sleep 1
> > > done >bdi-debug.txt
> > >
> > > while you are writing to the FUSE filesystem and share the output file?
> > > That should tell us a bit more about what's happening inside the writeback
> > > throttling. Also, do you somehow configure min/max_ratio for the FUSE bdi?
> > > You can check in /sys/block/<fuse-bdi>/bdi/{min,max}_ratio. I suspect the
> > > problem is that the BDI dirty limit does not ramp up properly when we
> > > increase dirtied pages in large chunks.
> >
> > This is the debug info I see for FUSE large folio writes where bs=1M
> > and size=1G:
> >
> >
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:            896 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1071104 kB
> > BdiWritten:               4096 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3596 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1290240 kB
> > BdiWritten:               4992 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3596 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1517568 kB
> > BdiWritten:               5824 kB
> > BdiWriteBandwidth:       25692 kBps
> > b_dirty:                     0
> > b_io:                        1
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       7
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3596 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1747968 kB
> > BdiWritten:               6720 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:            896 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1949696 kB
> > BdiWritten:               7552 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3612 kB
> > DirtyThresh:            361300 kB
> > BackgroundThresh:       180428 kB
> > BdiDirtied:            2097152 kB
> > BdiWritten:               8128 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> >
> >
> > I didn't do anything to configure/change the FUSE bdi min/max_ratio.
> > This is what I see on my system:
> >
> > cat /sys/class/bdi/0:52/min_ratio
> > 0
> > cat /sys/class/bdi/0:52/max_ratio
> > 1
>
> OK, we can see that BdiDirtyThresh stabilized more or less at 3.6MB.
> Checking the code, this shows we are hitting __wb_calc_thresh() logic:
>
>         if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>                 unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
>                 u64 wb_scale_thresh = 0;
>
>                 if (limit > dtc->dirty)
>                         wb_scale_thresh = (limit - dtc->dirty) / 100;
>                 wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
>         }
>
> so BdiDirtyThresh is set to DirtyThresh/100. This also shows the bdi never
> generates enough throughput to ramp up its share from this initial value.
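> (As a quick sanity check against the trace above: 359824 kB / 100 ≈ 3598 kB,
> which is right about the ~3596 kB BdiDirtyThresh it keeps reporting.)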
>
> > > Actually, there's a patch queued in mm tree that improves the ramping up of
> > > bdi dirty limit for strictlimit bdis [1]. It would be nice if you could
> > > test whether it changes something in the behavior you observe. Thanks!
> > >
> > >                                                                 Honza
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page-writeback-consolidate-wb_thresh-bumping-logic-into-__wb_calc_thresh.patch
> >
> > I still see the same results (~230 MiB/s throughput using fio) with
> > this patch applied, unfortunately. Here's the debug info I see with
> > this patch (same test scenario as above on FUSE large folio writes
> > where bs=1M and size=1G):
> >
> > BdiWriteback:                0 kB
> > BdiReclaimable:           2048 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359132 kB
> > BackgroundThresh:       179348 kB
> > BdiDirtied:              51200 kB
> > BdiWritten:                128 kB
> > BdiWriteBandwidth:      102400 kBps
> > b_dirty:                     1
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       5
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:             331776 kB
> > BdiWritten:               1216 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:             562176 kB
> > BdiWritten:               2176 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:             792576 kB
> > BdiWritten:               3072 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:               64 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:            1026048 kB
> > BdiWritten:               3904 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
>
> Yeah, here the situation is really the same. As an experiment, can you try
> setting min_ratio for the FUSE bdi to 1, 2, 3, ..., 10 (I don't expect you
> should need to go past 10) and figure out when there's enough slack space
> for the writeback bandwidth to ramp up to full speed?
> Thanks!
>
>                                                                 Honza

When testing this locally, I'm seeing that max_ratio affects the bandwidth
more than min_ratio does (e.g. the different min_ratios give roughly the
same bandwidth for a given max_ratio). I'm also seeing fairly high variance
across runs, which makes it hard to gauge what's accurate, but on average
this is what I'm seeing:

max_ratio=1 --- bandwidth= ~230 MiB/s
max_ratio=2 --- bandwidth= ~420 MiB/s
max_ratio=3 --- bandwidth= ~550 MiB/s
max_ratio=4 --- bandwidth= ~653 MiB/s
max_ratio=5 --- bandwidth= ~700 MiB/s
max_ratio=6 --- bandwidth= ~810 MiB/s
max_ratio=7 --- bandwidth= ~1040 MiB/s (though a lot of the time, ~561 MiB/s
on subsequent runs)
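
(In case it's useful context, this is roughly how I was sweeping max_ratio
between fio runs; the 0:52 id is just what the FUSE bdi happens to be on my
system, and the fio line is a stand-in for the same bs=1M/size=1G write
workload as above, so treat it as a sketch rather than the exact script.)

for r in 1 2 3 4 5 6 7; do
        echo $r > /sys/class/bdi/0:52/max_ratio
        fio --name=fuse-write --directory=/mnt/fuse --rw=write --bs=1M --size=1G
done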


Thanks,
Joanne

> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

