From: Bernd Schubert <bernd.schubert@fastmail.fm>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org
Subject: Re: sequential 1MB mmap read ends in 1 page sync read-ahead
Date: Thu, 13 Apr 2023 23:33:09 +0200 [thread overview]
Message-ID: <df5c4698-46e1-cbfe-b1f6-cc054b12f6fe@fastmail.fm> (raw)
In-Reply-To: <aae918da-833f-7ec5-ac8a-115d66d80d0e@fastmail.fm>
Sorry, forgot to add Andrew and linux-mm into CC.
On 4/13/23 23:27, Bernd Schubert wrote:
> Hello,
>
> I found a weird mmap read behavior while benchmarking the fuse-uring
> patches.
> I did not verify yet, but it does not look fuse specific.
> Basically, I started to check because fio results were much lower
> than expected (better with the new code, though)
>
> fio cmd line:
> fio --size=1G --numjobs=1 --ioengine=mmap --output-format=normal,terse
> --directory=/scratch/dest/ --rw=read multi-file.fio
>
>
> bernd@squeeze1 test2>cat multi-file.fio
> [global]
> group_reporting
> bs=1M
> runtime=300
>
> [test]
>
> This sequential fio sets POSIX_MADV_SEQUENTIAL and then does memcpy
> beginning at offset 0 in 1MB steps (verified with additional
> logging in fios engines/mmap.c).
>
> And additional log in fuse_readahead() gives
>
> [ 1396.215084] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64
> index=0
> [ 1396.237466] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64
> index=255
> [ 1396.263175] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1
> index=254
> [ 1396.282055] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1
> index=253
> ... <count is always 1 page>
> [ 1496.353745] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1
> index=64
> [ 1496.381105] fuse: 000000003fdec504 inode=00000000be0f29d3 count=64
> index=511
> [ 1496.397487] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1
> index=510
> [ 1496.416385] fuse: 000000003fdec504 inode=00000000be0f29d3 count=1
> index=509
> ... <count is always 1 page>
>
> Logging in do_sync_mmap_readahead()
>
> [ 1493.130764] do_sync_mmap_readahead:3015 ino=132 index=0 count=0
> ras_start=0 ras_size=0 ras_async=0 ras_ra_pages=64 ras_mmap_miss=0
> ras_prev_pos=-1
> [ 1493.147173] do_sync_mmap_readahead:3015 ino=132 index=255 count=0
> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0
> ras_prev_pos=-1
> [ 1493.165952] do_sync_mmap_readahead:3015 ino=132 index=254 count=0
> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0
> ras_prev_pos=-1
> [ 1493.185566] do_sync_mmap_readahead:3015 ino=132 index=253 count=0
> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0
> ras_prev_pos=-1
> ...
> [ 1496.341890] do_sync_mmap_readahead:3015 ino=132 index=64 count=0
> ras_start=0 ras_size=64 ras_async=32 ras_ra_pages=64 ras_mmap_miss=0
> ras_prev_pos=-1
> [ 1496.361385] do_sync_mmap_readahead:3015 ino=132 index=511 count=0
> ras_start=96 ras_size=64 ras_async=64 ras_ra_pages=64 ras_mmap_miss=0
> ras_prev_pos=-1
>
>
> So we can see from fuse that it starts to read at page index 0, wants
> 64 pages (which is actually the double of bdi read_ahead_kb), then
> skips index 64...254) and immediately goes to index 255. For the mmaped
> memcpy pages are missing and then it goes back in 1 page steps to get
> these.
>
> A workaround here is to set read_ahead_kb in the bdi to a larger
> value, another workaround might be (untested) to increase the read-ahead
> window. Either of these two seem to be workarounds for the index order
> above.
>
> I understand that read-ahead gets limited by the bdi value (although
> exceeded above), but why does it go back in 1 page steps? My expectation
> would have been
>
> index=0 count=32 (128kb read-head)
> index=32 count=32
> index=64 count=32
> ...
>
>
> This is with plain 6.2 + fuse-uring patches.
>
> Thanks,
> Bernd
next parent reply other threads:[~2023-04-13 21:33 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <aae918da-833f-7ec5-ac8a-115d66d80d0e@fastmail.fm>
2023-04-13 21:33 ` Bernd Schubert [this message]
2023-04-14 4:07 ` Matthew Wilcox
2023-04-14 15:05 ` Bernd Schubert
2023-04-14 19:52 ` Bernd Schubert
2023-04-14 20:47 ` Bernd Schubert
2023-04-14 22:02 ` Matthew Wilcox
2023-04-17 15:10 ` Bernd Schubert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=df5c4698-46e1-cbfe-b1f6-cc054b12f6fe@fastmail.fm \
--to=bernd.schubert@fastmail.fm \
--cc=akpm@linux-foundation.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=miklos@szeredi.hu \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox