From: Jan Kara <jack@suse.cz>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
David Hildenbrand <david@redhat.com>,
Dave Chinner <david@fromorbit.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Will Deacon <will@kernel.org>,
Kalesh Singh <kaleshsingh@google.com>, Zi Yan <ziy@nvidia.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [RFC PATCH v4 2/5] mm/readahead: Terminate async readahead on natural boundary
Date: Mon, 5 May 2025 11:37:38 +0200 [thread overview]
Message-ID: <67wws7qs5v3poq6sefrrt4dgdn4ejh52mg5x7ycbxqvrfdvow3@zraqczowrvrl> (raw)
In-Reply-To: <3myknukhnrtdb4y5i6ewcgpubg2fopxc35ii6a4oy5ffgn7xdf@uileryotgd7z>
On Mon 05-05-25 11:13:26, Jan Kara wrote:
> On Wed 30-04-25 15:59:15, Ryan Roberts wrote:
> > Previously asynchonous readahead would read ra_pages (usually 128K)
> > directly after the end of the synchonous readahead and given the
> > synchronous readahead portion had no alignment guarantees (beyond page
> > boundaries) it is possible (and likely) that the end of the initial 128K
> > region would not fall on a natural boundary for the folio size being
> > used. Therefore smaller folios were used to align down to the required
> > boundary, both at the end of the previous readahead block and at the
> > start of the new one.
> >
> > In the worst cases, this can result in never properly ramping up the
> > folio size, and instead getting stuck oscillating between order-0, -1
> > and -2 folios. The next readahead will try to use folios whose order is
> > +2 bigger than the folio that had the readahead marker. But because of
> > the alignment requirements, that folio (the first one in the readahead
> > block) can end up being order-0 in some cases.
> >
> > There will be 2 modifications to solve this issue:
> >
> > 1) Calculate the readahead size so the end is aligned to a folio
> > boundary. This prevents needing to allocate small folios to align
> > down at the end of the window and fixes the oscillation problem.
> >
> > 2) Remember the "preferred folio order" in the ra state instead of
> > inferring it from the folio with the readahead marker. This solves
> > the slow ramp up problem (discussed in a subsequent patch).
> >
> > This patch addresses (1) only. A subsequent patch will address (2).
> >
> > Worked example:
> >
> > The following shows the previous pathalogical behaviour when the initial
> > synchronous readahead is unaligned. We start reading at page 17 in the
> > file and read sequentially from there. I'm showing a dump of the pages
> > in the page cache just after we read the first page of the folio with
> > the readahead marker.
<snip>
> Looks good. When I was reading this code some time ago, I also felt we
> should rather do some rounding instead of creating small folios so thanks
> for working on this. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
But now I've also remembered why what you do here isn't an obvious win.
There are storage devices (mostly RAID arrays) where optimum read size
isn't a power of 2. Think for example a RAID-0 device composed from three
disks. It will have max_pages something like 384 (512k * 3). Suppose we are
on x86 and max_order is 9. Then previously (if we were lucky with
alignment) we were alternating between order 7 and order 8 pages in the
page cache and do optimally sized IOs od 1536k. Now you will allocate all
folios of order 8 (nice) but reads will be just 1024k and you'll see
noticeable drop in read throughput (not nice). Note that this is not just a
theoretical example but a real case we have hit when doing performance
testing of servers and for which I was tweaking readahead code in the past.
So I think we need to tweak this logic a bit. Perhaps we should round_down
end to the minimum alignment dictated by 'order' and maxpages? Like:
1 << min(order, ffs(max_pages) + PAGE_SHIFT - 1)
If you set badly aligned readahead size manually, you will get small pages
in the page cache but that's just you being stupid. In practice, hardware
induced readahead size need not be powers of 2 but they are *sane* :).
Honza
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 8bb316f5a842..82f9f623f2d7 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -625,7 +625,7 @@ void page_cache_async_ra(struct readahead_control *ractl,
> > unsigned long max_pages;
> > struct file_ra_state *ra = ractl->ra;
> > pgoff_t index = readahead_index(ractl);
> > - pgoff_t expected, start;
> > + pgoff_t expected, start, end, aligned_end;
> > unsigned int order = folio_order(folio);
> >
> > /* no readahead */
> > @@ -657,7 +657,6 @@ void page_cache_async_ra(struct readahead_control *ractl,
> > * the readahead window.
> > */
> > ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
> > - ra->async_size = ra->size;
> > goto readit;
> > }
> >
> > @@ -678,9 +677,13 @@ void page_cache_async_ra(struct readahead_control *ractl,
> > ra->size = start - index; /* old async_size */
> > ra->size += req_count;
> > ra->size = get_next_ra_size(ra, max_pages);
> > - ra->async_size = ra->size;
> > readit:
> > order += 2;
> > + end = ra->start + ra->size;
> > + aligned_end = round_down(end, 1UL << order);
> > + if (aligned_end > ra->start)
> > + ra->size -= end - aligned_end;
> > + ra->async_size = ra->size;
> > ractl->_index = ra->start;
> > page_cache_ra_order(ractl, ra, order);
> > }
> > --
> > 2.43.0
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2025-05-05 9:37 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-30 14:59 [RFC PATCH v4 0/5] Readahead tweaks for larger folios Ryan Roberts
2025-04-30 14:59 ` [RFC PATCH v4 1/5] mm/readahead: Honour new_order in page_cache_ra_order() Ryan Roberts
2025-05-05 8:49 ` Jan Kara
2025-05-05 9:51 ` David Hildenbrand
2025-05-05 10:09 ` Jan Kara
2025-05-05 10:25 ` David Hildenbrand
2025-05-05 12:51 ` Ryan Roberts
2025-05-05 16:14 ` Jan Kara
2025-05-05 10:09 ` Anshuman Khandual
2025-05-05 13:00 ` Ryan Roberts
2025-05-08 12:55 ` Pankaj Raghav (Samsung)
2025-05-09 13:30 ` Ryan Roberts
2025-05-09 20:50 ` Pankaj Raghav (Samsung)
2025-05-13 12:33 ` Ryan Roberts
2025-05-13 6:19 ` Chaitanya S Prakash
2025-04-30 14:59 ` [RFC PATCH v4 2/5] mm/readahead: Terminate async readahead on natural boundary Ryan Roberts
2025-05-05 9:13 ` Jan Kara
2025-05-05 9:37 ` Jan Kara [this message]
2025-05-06 9:28 ` Ryan Roberts
2025-05-06 11:29 ` Jan Kara
2025-05-06 15:31 ` Ryan Roberts
2025-04-30 14:59 ` [RFC PATCH v4 3/5] mm/readahead: Make space in struct file_ra_state Ryan Roberts
2025-05-05 9:39 ` Jan Kara
2025-05-05 9:57 ` David Hildenbrand
2025-05-09 10:00 ` Pankaj Raghav (Samsung)
2025-04-30 14:59 ` [RFC PATCH v4 4/5] mm/readahead: Store folio order " Ryan Roberts
2025-05-05 9:52 ` Jan Kara
2025-05-06 9:53 ` Ryan Roberts
2025-05-06 10:45 ` Jan Kara
2025-05-05 10:08 ` David Hildenbrand
2025-05-06 10:03 ` Ryan Roberts
2025-05-06 14:24 ` David Hildenbrand
2025-05-06 15:06 ` Ryan Roberts
2025-04-30 14:59 ` [RFC PATCH v4 5/5] mm/filemap: Allow arch to request folio size for exec memory Ryan Roberts
2025-05-05 10:06 ` Jan Kara
2025-05-09 13:52 ` Will Deacon
2025-05-13 12:46 ` Ryan Roberts
2025-05-14 15:14 ` Will Deacon
2025-05-14 15:31 ` Ryan Roberts
2025-05-06 10:05 ` [RFC PATCH v4 0/5] Readahead tweaks for larger folios Ryan Roberts
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=67wws7qs5v3poq6sefrrt4dgdn4ejh52mg5x7ycbxqvrfdvow3@zraqczowrvrl \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=david@fromorbit.com \
--cc=david@redhat.com \
--cc=kaleshsingh@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ryan.roberts@arm.com \
--cc=viro@zeniv.linux.org.uk \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox