linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Chris Mason <clm@meta.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	 lsf-pc@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org,  linux-mm <linux-mm@kvack.org>,
	Daniel Gomez <da.gomez@samsung.com>,
	 Pankaj Raghav <p.raghav@samsung.com>,
	Jens Axboe <axboe@kernel.dk>,  Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@lst.de>, Chris Mason <clm@fb.com>,
	 Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Date: Sat, 24 Feb 2024 15:40:48 -0800
Message-ID: <CAHk-=wgeaMU_zY95QM+KUO1RmiuykbKgKVyBi9G1pH_kPgO9kQ@mail.gmail.com>
In-Reply-To: <bb2e87d7-a706-4dc8-9c09-9257b69ebd5c@meta.com>

On Sat, 24 Feb 2024 at 14:58, Chris Mason <clm@meta.com> wrote:
>
> For teams that really need more control over dirty pages with existing APIs,
> I've suggested using sync_file_range periodically.  It seems to work
> pretty well, and they can adjust the sizes and frequency as needed.

Yes. I've written code like that myself.

That said, that is also fairly close to what the write-behind patches
I pointed at did.
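
For illustration, the periodic sync_file_range() approach discussed above could look roughly like the sketch below. It is a minimal example under stated assumptions, not the code from the write-behind patches: the helper name write_with_writebehind() and the window parameter are hypothetical, the file is written from offset 0, and errors from sync_file_range() are deliberately ignored because the call is used here only as a pacing hint, not for durability.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` to `fd` starting at
 * offset 0. After every full `window` bytes, start asynchronous writeback
 * of that window and wait for the window before it, so that at most about
 * two windows of dirty data are outstanding for this file.
 * Returns the number of bytes written, or -1 if pwrite() fails.
 */
static ssize_t write_with_writebehind(int fd, const char *buf,
                                      size_t len, size_t window)
{
    size_t done = 0;     /* bytes written so far */
    size_t flushed = 0;  /* start of the first window not yet submitted */

    assert(window > 0);

    while (done < len) {
        size_t want = len - done;
        ssize_t n;

        if (want > window)
            want = window;

        n = pwrite(fd, buf + done, want, (off_t)done);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        done += (size_t)n;

        while (done - flushed >= window) {
            /* Kick off writeback of the window that just filled up. */
            (void)sync_file_range(fd, (off_t)flushed, (off_t)window,
                                  SYNC_FILE_RANGE_WRITE);

            /* Wait for the previous window to reach the device. */
            if (flushed >= window)
                (void)sync_file_range(fd, (off_t)(flushed - window),
                                      (off_t)window,
                                      SYNC_FILE_RANGE_WAIT_BEFORE |
                                      SYNC_FILE_RANGE_WRITE |
                                      SYNC_FILE_RANGE_WAIT_AFTER);
            flushed += window;
        }
    }
    /* A trailing partial window is left to normal background writeback. */
    return (ssize_t)done;
}
```

The window size is the knob that corresponds to "adjust the sizes and frequency as needed": a smaller window bounds dirty data more tightly at the cost of more frequent waits. Note that sync_file_range() gives no data-integrity guarantee; a caller that needs durability still has to call fsync() or fdatasync().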

One issue (and maybe that was what killed that write-behind patch) is
that there are *other* benchmarks that are actually slightly more
realistic that do things like "untar a tar-file, do something with it,
and then 'rm -rf' it all again".

And *those* benchmarks behave best when the IO is never ever actually
done at all. And unlike the "write a terabyte with random IO", those
benchmarks actually approximate a few somewhat real loads (I'm not
claiming they are good, but the "create files, do something, then
remove them" pattern at least _exists_ in real life).

For things like block device write for a 'mkfs' run, the whole "this
file may be deleted soon, so let's not even start the write in the
first place" behavior doesn't exist, of course. Starting writeback
much more aggressively for those is probably not a bad idea.

> From time to time, our random crud that maintains the system will need a
> lot of memory and kswapd will saturate a core, but this tends to resolve
> itself after 10-20 seconds.  Our ultra sensitive workloads would
> complain, but they manage the page cache more explicitly to avoid these
> situations.

You can see these things with slow USB devices with much more obvious
results. Including long spikes of total inactivity if some system
piece ends up doing a "sync" for some reason. It happens. It's very
annoying.

My gut feel is that it happens a lot less these days than it used to,
but I suspect that's at least partly because I don't see the slow USB
devices very much any more.
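
One way to observe this kind of stall is to sample the Dirty and Writeback counters that /proc/meminfo exposes while copying to the slow device. The following is a minimal sketch of such a reader; the helper name meminfo_kb() is hypothetical, and the path is passed in as a parameter only so the parser can also be pointed at a saved copy of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value (in kB) of a "Key:   12345 kB"
 * line from a /proc/meminfo-style file, or -1 if the file cannot be
 * opened or the key is not present.
 */
static long meminfo_kb(const char *path, const char *key)
{
    char line[256];
    long value = -1;
    size_t keylen;
    FILE *f;

    assert(path != NULL && key != NULL);
    keylen = strlen(key);

    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof line, f)) {
        /* Require the ':' so "Writeback" does not match "WritebackTmp". */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

Calling meminfo_kb("/proc/meminfo", "Dirty") and meminfo_kb("/proc/meminfo", "Writeback") once a second during a large copy shows how much data is queued up behind the device when something issues a "sync".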

> Ignoring wildly slow devices, the dirty limits seem to work well enough
> on both big and small systems that I haven't needed to investigate
> issues there as often.

One particular problem point used to be backing devices with wildly
different IO throughput, because I think the speed heuristics don't
necessarily always work all that well at least initially.

And things like that may partly explain your "filesystems work better
than block devices".  It doesn't necessarily have to be about
filesystems vs block devices per se, and may instead be about things like
"on a filesystem, the bdi throughput numbers have had time to
stabilize".
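
To make the "time to stabilize" point concrete, here is a toy model of a smoothed throughput estimate. It is explicitly not the kernel's bdi bandwidth estimator: the 1/8 smoothing factor, the function names, and the numbers in the usage note are all assumptions chosen for illustration. It only shows that an estimate which starts far from the device's real speed needs many updates before it becomes useful.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy smoothing step (assumed factor of 1/8): move the estimate one
 * eighth of the way towards the newly measured sample.
 */
static long ewma_step(long estimate, long sample)
{
    return estimate + (sample - estimate) / 8;
}

/*
 * Count how many updates it takes for `estimate` to get within
 * `tol_pct` percent of `actual`, given that every sample equals
 * `actual`. Units are arbitrary (for example KB/s).
 */
static int updates_until_close(long estimate, long actual, long tol_pct)
{
    int n = 0;

    assert(actual > 0 && tol_pct > 0);

    while (labs(estimate - actual) * 100 > actual * tol_pct) {
        estimate = ewma_step(estimate, actual);
        n++;
        if (n >= 10000)  /* guard against integer-rounding stalls */
            break;
    }
    return n;
}
```

With an initial guess of 100000 KB/s and a device that really does 5000 KB/s, updates_until_close(100000, 5000, 10) needs noticeably more updates than the same call starting from 10000 KB/s. A freshly attached device or a short benchmark run is in the first situation, while a filesystem that has been in use for a while is already past it.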

In contrast, with a benchmark that uses some other random device that
doesn't look like a regular disk (whether it's really slow like a bad
USB device, or really fast like pmem), you might see more issues. And
I wouldn't be in the least surprised if that is part of the situation
Luis sees.

              Linus

