* [PATCH 0/2] readahead: Reintroduce fix for improper RA window sizing
@ 2024-12-04 18:10 Jan Kara
2024-12-04 18:10 ` [PATCH 1/2] readahead: Don't shorten readahead window in read_pages() Jan Kara
2024-12-04 18:10 ` [PATCH 2/2] readahead: properly shorten readahead when falling back to do_page_cache_ra() Jan Kara
0 siblings, 2 replies; 3+ messages in thread
From: Jan Kara @ 2024-12-04 18:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Matthew Wilcox, linux-fsdevel, Jan Kara
Hello,
this small patch series reintroduces a fix for readahead window confusion (and
thus read throughput reduction) when page_cache_ra_order() ends up failing due
to folios already present in the page cache. After thinking about this for
a while I have ended up with a dumb fix that just rechecks whether we have
something to read before calling do_page_cache_ra(). This fixes the problem
reported in [1]. I still think it doesn't make much sense to update the
readahead window size in read_pages(), so patch 1 removes that, but the real
fix in patch 2 does not depend on it.
The patches are based on top of my revert, which is in the MM tree as of today,
but I expect it to land in Linus' tree very soon.
Honza
[1] https://lore.kernel.org/all/49648605-d800-4859-be49-624bbe60519d@gmail.com
* [PATCH 1/2] readahead: Don't shorten readahead window in read_pages()
From: Jan Kara @ 2024-12-04 18:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Matthew Wilcox, linux-fsdevel, Jan Kara
When the ->readahead callback doesn't read all requested pages, read_pages()
shortens the readahead window (ra->size). However, we don't know why the
pages were not read or what the appropriate window size is, so don't try to
second-guess the filesystem. If it needs a different readahead window, it
should set it explicitly, similarly to how the filesystem can use
readahead_expand() during expansion.
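The effect of the removed accounting can be illustrated with a small userspace
model. This is only a sketch, not kernel code: struct ra_model and both helper
functions are hypothetical names invented for illustration, and they mirror
just the size and async_size fields touched by the diff below.

```c
#include <assert.h>

/* Hypothetical stand-in for the two window fields of struct file_ra_state. */
struct ra_model {
	unsigned int size;       /* readahead window size, in pages */
	unsigned int async_size; /* async part of the window, in pages */
};

/*
 * Model of the cleanup step before this patch: every folio the filesystem
 * left unread shrinks the window by the folio's page count (nr).
 * Returns 1 if the folio would be removed from the page cache, 0 otherwise.
 */
static int old_cleanup_one(struct ra_model *ra, unsigned int nr)
{
	assert(nr > 0); /* a folio always spans at least one page */
	ra->size -= nr;
	if (ra->async_size >= nr) {
		ra->async_size -= nr;
		return 1;
	}
	return 0;
}

/*
 * Model of the cleanup step after this patch: the unread folio is always
 * removed and the window is left exactly as the caller set it up.
 */
static int new_cleanup_one(struct ra_model *ra, unsigned int nr)
{
	assert(nr > 0);
	(void)ra; /* window intentionally untouched */
	return 1;
}
```

In this model, one unread 4-page folio turns a window of size 32 and
async_size 16 into 28 and 12 under the old logic, while the new logic leaves
both values alone.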
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/readahead.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index ea650b8b02fb..78d7f4db9966 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -158,20 +158,10 @@ static void read_pages(struct readahead_control *rac)
 	if (aops->readahead) {
 		aops->readahead(rac);
-		/*
-		 * Clean up the remaining folios. The sizes in ->ra
-		 * may be used to size the next readahead, so make sure
-		 * they accurately reflect what happened.
-		 */
+		/* Clean up the remaining folios. */
 		while ((folio = readahead_folio(rac)) != NULL) {
-			unsigned long nr = folio_nr_pages(folio);
-
 			folio_get(folio);
-			rac->ra->size -= nr;
-			if (rac->ra->async_size >= nr) {
-				rac->ra->async_size -= nr;
-				filemap_remove_folio(folio);
-			}
+			filemap_remove_folio(folio);
 			folio_unlock(folio);
 			folio_put(folio);
 		}
--
2.35.3
* [PATCH 2/2] readahead: properly shorten readahead when falling back to do_page_cache_ra()
From: Jan Kara @ 2024-12-04 18:10 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Matthew Wilcox, linux-fsdevel, Jan Kara
When we succeed in creating some folios in page_cache_ra_order() but
then need to fall back to single page folios, we don't shorten the amount
to read passed to do_page_cache_ra() by the amount we've already read.
This results in reading more than intended and also in placing another
readahead mark in the middle of the readahead window, which confuses the
readahead code. Fix the problem by properly reducing the number of pages to
read. Unlike the previous attempt at this fix (commit 7c877586da31), which
had to be reverted, we are now careful to check there is indeed something to
read so that we don't submit negative-sized readahead.
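The arithmetic behind the guard can be sketched in userspace C. This is a
model under stated assumptions, not the kernel implementation: both function
names are hypothetical, and plain unsigned long parameters stand in for
ra->size and the pgoff_t values start and index. The point is that the
subtraction is unsigned, so a "negative" remainder wraps around to a huge
page count unless it is checked first.

```c
#include <assert.h>

/*
 * Guarded variant, modelled on the check added in the diff below: returns
 * the number of pages still to be requested from the single-page fallback,
 * or 0 when the pages already covered meet or exceed the window.
 */
static unsigned long guarded_nr_to_read(unsigned long ra_size,
					unsigned long start,
					unsigned long index)
{
	unsigned long done;

	assert(index >= start); /* index only advances from start */
	done = index - start;   /* pages already covered */
	if (ra_size > done)
		return ra_size - done;
	return 0;
}

/*
 * Unguarded variant, shown only to illustrate the failure mode: when more
 * pages are already covered than ra_size, the unsigned subtraction wraps.
 */
static unsigned long unguarded_nr_to_read(unsigned long ra_size,
					  unsigned long start,
					  unsigned long index)
{
	return ra_size - (index - start);
}
```

With a window of 32 pages and 16 pages already covered, both variants return
16. If the window has meanwhile shrunk to 8 pages, the guarded variant returns
0 while the unguarded one wraps to a value far larger than the window.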
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/readahead.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 78d7f4db9966..006954c76652 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -448,7 +448,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
 		struct file_ra_state *ra, unsigned int new_order)
 {
 	struct address_space *mapping = ractl->mapping;
-	pgoff_t index = readahead_index(ractl);
+	pgoff_t start = readahead_index(ractl);
+	pgoff_t index = start;
 	unsigned int min_order = mapping_min_folio_order(mapping);
 	pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
 	pgoff_t mark = index + ra->size - ra->async_size;
@@ -506,12 +507,18 @@ void page_cache_ra_order(struct readahead_control *ractl,
 	/*
 	 * If there were already pages in the page cache, then we may have
 	 * left some gaps. Let the regular readahead code take care of this
-	 * situation.
+	 * situation below.
 	 */
 	if (!err)
 		return;
 fallback:
-	do_page_cache_ra(ractl, ra->size, ra->async_size);
+	/*
+	 * ->readahead() may have updated readahead window size so we have to
+	 * check there's still something to read.
+	 */
+	if (ra->size > index - start)
+		do_page_cache_ra(ractl, ra->size - (index - start),
+				 ra->async_size);
 }
 static unsigned long ractl_max_pages(struct readahead_control *ractl,
--
2.35.3