From: Pedro Falcato <pfalcato@suse.de>
To: Anatoly Stepanov <stepanov.anatoly@huawei.com>
Cc: willy@infradead.org, akpm@linux-foundation.org, david@kernel.org,
ljs@kernel.org, Liam.Howlett@oracle.com, vbabka@kernel.org,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
wangkefeng.wang@huawei.com, yanquanmin1@huawei.com,
zuoze1@huawei.com, artem.kuzin@huawei.com,
gutierrez.asier@huawei-partners.com,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 2/2] filemap: use high-order folios in filemap sync RA
Date: Wed, 15 Apr 2026 13:06:09 +0100
Message-ID: <3cr6ppe6bic47tan2iapuh67s67hiroangvdiap4jbn7ypru2o@rbvqv3sxifwp>
In-Reply-To: <20260415192853.3470423-3-stepanov.anatoly@huawei.com>
On Thu, Apr 16, 2026 at 03:28:53AM +0800, Anatoly Stepanov wrote:
> [Idea]
>
> If an mmap'ed file is accessed such that async RA never
> kicks in, we might end up with only order-0 folios in the page cache.
>
> If fault_around_bytes is larger than a single page, then
> it's beneficial to use high-order folios, which brings a significant
> filemap_map_pages() speedup.
> So, let's just use fault_around_bytes as a starting point here.
Well, this heuristic looks arbitrary. I don't like mixing different concepts.
With this, in practice most file folios will be 64K (the default
fault_around_bytes). Why? Why is it related to faultaround when faultaround
is a separate mechanism that isn't particularly relevant here?
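A minimal sketch of the arithmetic behind the proposed heuristic, assuming the sync-RA folio order is derived as floor(log2(fault_around_bytes / PAGE_SIZE)). The helper names are hypothetical and are not the code from the patch. With the default fault_around_bytes of 64K and 4K base pages it yields order 4, i.e. 64K folios, regardless of how the file is actually accessed.

```c
#include <assert.h>

/*
 * Hypothetical helper, not from the patch: derive a folio order from a
 * byte count such as fault_around_bytes, i.e. floor(log2(bytes / page_size)).
 * Ranges smaller than two pages map to order 0.
 */
static unsigned int fault_around_to_order(unsigned long fault_around_bytes,
					  unsigned long page_size)
{
	unsigned long pages;
	unsigned int order = 0;

	/* Base page sizes are non-zero powers of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	pages = fault_around_bytes / page_size;
	while (pages > 1) {
		pages >>= 1;
		order++;
	}
	return order;
}

/* Size in bytes of a folio of the given order. */
static unsigned long folio_bytes(unsigned int order, unsigned long page_size)
{
	return page_size << order;
}
```

For example, fault_around_to_order(65536, 4096) returns 4 and folio_bytes(4, 4096) is 65536, so every sync-RA fault under this heuristic would allocate a 64K folio.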
>
> If an arch supports PTE coalescing, we get more of that for free
> (see the arm64 example below).
>
> We don't save the new order to "ra->order", so if async RA does happen,
> it would normally start from order-0.
>
> [Things to be discussed]
>
> But at the same time, I can see a drawback for 16K and 64K pages: in that case fault_around will still be 64K by default.
> So it seems to make sense to express fault_around_bytes as order-N of PAGE_SIZE, not as a fixed number of bytes.
>
> Another issue is when fault_around=0 but we'd like to use high-order folios for sync RA, for cont-PTE for example.
> For this we could use something like "max(fault_around_order, cont_pte_order)".
>
> Or introduce some dedicated tunable like "sync_mmap_order".
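The two alternatives above can be sketched as follows, assuming fault-around is expressed as an order rather than a byte count and that the sync-RA order is max(fault_around_order, cont_pte_order). The contiguous-PTE spans are assumed arm64 values (16 entries with 4K pages, 128 with 16K, 32 with 64K), and the function names are hypothetical. With 16K and 64K base pages the contiguous range is 2M, so this rule would produce much larger folios than in the 4K case.

```c
#include <assert.h>

/* Assumed arm64 contiguous-PTE span as log2(number of PTEs) per page size. */
static unsigned int cont_pte_order(unsigned long page_size)
{
	switch (page_size) {
	case 4096:
		return 4;	/* 16 x 4K = 64K */
	case 16384:
		return 7;	/* 128 x 16K = 2M */
	case 65536:
		return 5;	/* 32 x 64K = 2M */
	default:
		return 0;	/* no contiguous-PTE hint */
	}
}

/*
 * Hypothetical sync-RA folio order: an order-based fault-around setting
 * (independent of the base page size), raised to the cont-PTE order so a
 * minimal fault-around (order 0) still gets cont-PTE-sized folios.
 */
static unsigned int sync_ra_order(unsigned int fault_around_order,
				  unsigned long page_size)
{
	unsigned int cp = cont_pte_order(page_size);

	return fault_around_order > cp ? fault_around_order : cp;
}

/* Folio size in bytes; the shift must stay within unsigned long. */
static unsigned long folio_bytes(unsigned int order, unsigned long page_size)
{
	assert(order < 8 * sizeof(unsigned long));
	return page_size << order;
}
```

With 4K pages, sync_ra_order(0, 4096) gives order 4 (64K), while on a 64K-page kernel sync_ra_order(0, 65536) gives order 5, i.e. 2M folios.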
>
> [Benchmark]
>
> The simple benchmark below reads a 100M file in 4M (RA size) chunks
> such that async RA doesn't kick in and the page cache ends up being
> filled with order-0 folios.
Well, the problem is that you are _never_ getting RA to kick in. Folio
size is the least of your concern, you are effectively not doing much
readahead since the kernel thinks you're doing random accesses.
>
> The patched kernel gives a ~3x increase in throughput,
> considering the page cache is filled up at the moment.
>
> The main speedup comes from filemap_map_pages() due to high-order
> folios usage.
>
> As a bonus, we get better cont_pte bit coverage for Arm64.
>
> Example:
> // Open 100M file and read every 4M chunk, given max_ra=4M
> // Perform 10 runs, measure the throughput.
> ...
> char *map = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);
> if (map == MAP_FAILED) {
> perror("Error mapping file");
> close(fd);
> return 1;
> }
>
> struct timespec start, end;
> clock_gettime(CLOCK_MONOTONIC, &start);
>
> unsigned int size_4M = 4*1024*1024;
> unsigned int num_reads = filesize / size_4M;
> volatile char val;
> for (int i = 0; i < num_reads; i++) {
> off_t offset = (off_t)i * size_4M;
> val = map[offset];
> }
This doesn't seem like a real issue. And if it is, you can always issue
readahead manually. But the whole pattern of "every perfectly-sized RA
window, access a single byte and advance" is completely bizarre. And _if_ this
is your workload, then having order-0 folios at the read site is much better
than filling your page cache with data you are not accessing.
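A minimal sketch of the benchmark's access pattern combined with the manual readahead suggested above, assuming a POSIX system: posix_madvise(POSIX_MADV_WILLNEED) over the mapping asks the kernel to read the range in before the strided faults. strided_touch() is a hypothetical helper, not the benchmark from the patch (whose setup is elided); posix_fadvise(POSIX_FADV_WILLNEED) on the descriptor or Linux's readahead(2) are alternative ways to issue the hint.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `filesize` bytes of `fd` and read one
 * byte at the start of every `stride`-sized chunk, as in the quoted
 * benchmark. If `willneed` is set, hint the whole mapping first so the
 * range is read ahead explicitly instead of relying on fault-time
 * readahead. Returns the sum of the bytes read, or -1 on error.
 */
static long strided_touch(int fd, size_t filesize, size_t stride, int willneed)
{
	const volatile unsigned char *p;
	char *map;
	long sum = 0;

	assert(stride > 0 && filesize > 0);
	map = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	/* posix_madvise() returns an error number rather than setting errno. */
	if (willneed && posix_madvise(map, filesize, POSIX_MADV_WILLNEED) != 0) {
		munmap(map, filesize);
		return -1;
	}

	/* volatile keeps the compiler from eliding the page-touching reads. */
	p = (const volatile unsigned char *)map;
	for (size_t off = 0; off < filesize; off += stride)
		sum += p[off];

	munmap(map, filesize);
	return sum;
}
```

Calling strided_touch(fd, filesize, 4 << 20, 1) mirrors the quoted loop with the hint applied; passing willneed = 0 leaves readahead to the fault path.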
Do you have an actual use case for this? Where have you observed these
problems?
--
Pedro