From: Jan Kara <jack@suse.cz>
To: <linux-mm@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
<linux-fsdevel@vger.kernel.org>, Jan Kara <jack@suse.cz>
Subject: [PATCH 01/10] readahead: Make sure sync readahead reads needed page
Date: Tue, 25 Jun 2024 12:18:51 +0200
Message-ID: <20240625101909.12234-1-jack@suse.cz>
In-Reply-To: <20240625100859.15507-1-jack@suse.cz>
page_cache_sync_ra() is called when a folio we want to read is not in
the page cache. It is expected that it creates the folio (and perhaps
the following folios as well) and submits reads for them unless some
error happens. However if index == ra->start + ra->size,
ondemand_readahead() will treat the call as another async readahead hit.
Thus ra->start will be advanced and we create pages and queue reads from
ra->start + ra->size further. Consequently the page at 'index' is not
created and filemap_get_pages() always has to go through the
filemap_create_folio() path.
This behavior has particularly unfortunate consequences
when we have two IO threads sequentially reading from a shared file (as
is the case when NFS serves sequential reads). In that case what can
happen is:
suppose ra->size == ra->async_size == 128, ra->start = 512

T1                                      T2
reads 128 pages at index 512
  - hits async readahead mark
    filemap_readahead()
      ondemand_readahead()
        if (index == expected ...)
          ra->start = 512 + 128 = 640
          ra->size = 128
          ra->async_size = 128
        page_cache_ra_order()
          blocks in ra_alloc_folio()
                                        reads 128 pages at index 640
                                          - no page found
                                            page_cache_sync_readahead()
                                              ondemand_readahead()
                                                if (index == expected ...)
                                                  ra->start = 640 + 128 = 768
                                                  ra->size = 128
                                                  ra->async_size = 128
                                                page_cache_ra_order()
                                                  submits reads from 768
                                          - still no page found at index 640
                                              filemap_create_folio()
                                          - goes on to index 641
                                            page_cache_sync_readahead()
                                              ondemand_readahead()
                                                - finds ra is confused,
                                                  trims it to small size
          finds pages were already inserted
And as a result read performance suffers.
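The interleaving above can be replayed with a small userspace model. This is only a sketch under stated assumptions: struct ra_model is a hypothetical cut-down stand-in for struct file_ra_state, the helper names are made up for illustration, order is taken to be 0 (so the round_down() is a no-op), and the window size is assumed to stay at 128 as in the example rather than going through get_next_ra_size().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for the kernel's struct file_ra_state. */
struct ra_model {
	unsigned long start;      /* first page index of the current window */
	unsigned long size;       /* window length in pages */
	unsigned long async_size; /* pages left when the async marker is hit */
};

/* Pre-patch check: taken for any caller, sync or async. Assumes order == 0. */
bool old_sequential_hit(const struct ra_model *ra, unsigned long index)
{
	unsigned long expected = ra->start + ra->size - ra->async_size;

	return index == expected || index == ra->start + ra->size;
}

/* Window advance as in the timeline; the size is assumed to stay unchanged. */
void advance_window(struct ra_model *ra)
{
	ra->start += ra->size;
	ra->async_size = ra->size;
}

/* Does the window [start, start + size) contain the page at index? */
bool window_covers(const struct ra_model *ra, unsigned long index)
{
	return index >= ra->start && index < ra->start + ra->size;
}

/* Replays the T1/T2 interleaving and checks each intermediate state. */
void replay_timeline(void)
{
	struct ra_model ra = { .start = 512, .size = 128, .async_size = 128 };

	/* T1 hits the marker at index 512: expected = 512 + 128 - 128 = 512. */
	assert(old_sequential_hit(&ra, 512));
	advance_window(&ra);
	assert(ra.start == 640);
	/* T1 now blocks before inserting any folio for [640, 768). */

	/* T2 misses at index 640 and enters the sync path on the same state. */
	assert(old_sequential_hit(&ra, 640));
	advance_window(&ra);
	assert(ra.start == 768);

	/* The reads T2 submits start at 768, so index 640 is left out. */
	assert(!window_covers(&ra, 640));
}
```

Calling replay_timeline() passes all assertions in this model, which matches the "still no page found at index 640" step in the timeline.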
Fix the problem by triggering the async readahead case in
ondemand_readahead() only if we are calling the function because we hit
the readahead marker. In any other case we need to read the folio at
'index' and thus we cannot really use the current ra state.
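As a rough illustration of the changed condition, the check can be modelled in userspace as below. This is a sketch, not the kernel code: struct ra_window and new_async_hit() are hypothetical names, order is assumed to be 0, and a NULL marker stands in for the folio argument being NULL when ondemand_readahead() is reached from the sync path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct file_ra_state. */
struct ra_window {
	unsigned long start;
	unsigned long size;
	unsigned long async_size;
};

/*
 * Post-patch check: only a call that carries a marker folio may take the
 * async branch. 'marker' models the folio argument; NULL means a sync call.
 */
bool new_async_hit(const struct ra_window *ra, unsigned long index,
		   const void *marker)
{
	unsigned long expected = ra->start + ra->size - ra->async_size;

	return marker != NULL && index == expected;
}
```

With the state T1 leaves behind in the example (start = 640, size = async_size = 128), a sync call for index 640 passes a NULL marker, so the async branch is skipped and the readahead state is not advanced past the page that is actually needed.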
Note that the above situation could be viewed as a special case of
file->f_ra state corruption. In fact two threads reading using a shared
file can also seemingly corrupt file->f_ra in interesting ways due to
concurrent access. I never saw that in practice and the fix is going to
be much more complex, so for now at least fix this practical problem
while we ponder the theoretically correct solution.
Signed-off-by: Jan Kara <jack@suse.cz>
---
mm/readahead.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index c1b23989d9ca..af0fbd302a38 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -580,7 +580,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	 */
 	expected = round_down(ra->start + ra->size - ra->async_size,
 			1UL << order);
-	if (index == expected || index == (ra->start + ra->size)) {
+	if (folio && index == expected) {
 		ra->start += ra->size;
 		ra->size = get_next_ra_size(ra, max_pages);
 		ra->async_size = ra->size;
--
2.35.3