From: Yafang Shao <laoar.shao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: willy@infradead.org, akpm@linux-foundation.org,
linux-mm@kvack.org, stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead
Date: Mon, 11 Nov 2024 22:28:09 +0800
Message-ID: <CALOAHbAe8GSf2=+sqzy32pWM2jtENmDnZcMhBEYruJVyWa_dww@mail.gmail.com>
In-Reply-To: <85cfc467-320f-4388-b027-2cbad85dfbed@redhat.com>
On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.11.24 15:17, Yafang Shao wrote:
> > When testing large folio support with XFS on our servers, we observed that
> > only a few large folios are mapped when reading large files via mmap.
> > After a thorough analysis, I identified it was caused by the
> > `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
> > parameter is set to 128KB. After I tuned it to 2MB, large folios worked
> > as expected. However, I believe large folio behavior should not depend on
> > the value of read_ahead_kb. It would be more robust if the kernel could
> > automatically adapt to it.
>
> Now I am extremely confused.
>
> Documentation/ABI/stable/sysfs-block:
>
> "[RW] Maximum number of kilobytes to read-ahead for filesystems on this
> block device."
>
>
> So, with your patch, will we also be changing the readahead size to
> exceed that, or simply allocating larger folios without exceeding the
> readahead size (e.g., leaving them partially non-filled)?

Exceeding the readahead size for the MADV_HUGEPAGE case is
straightforward; this is what the current patch accomplishes.
>
> If you're also changing the readahead behavior to exceed the
> configuration parameter it would sound to me like "I am pushing the
> brake pedal and my car brakes; fix the brakes to adapt whether to brake
> automatically" :)
>
> Likely I am missing something here, and how the read_ahead_kb parameter
> is used after your patch.

The read_ahead_kb parameter continues to function for
non-MADV_HUGEPAGE scenarios, whereas special handling is required for
the MADV_HUGEPAGE case. It appears that we ought to update
Documentation/ABI/stable/sysfs-block to reflect the changes related to
large folios, correct?
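
For reference, the sizes discussed here can be expressed in pages. The sketch below is only illustrative arithmetic, assuming 4KB base pages and 2MB PMD-sized huge pages (typical for x86-64, but not stated in this thread); the `sketch_*` names are made up for the example and are not kernel APIs.

```c
/* Illustrative arithmetic only; not kernel code. */
enum {
	SKETCH_PAGE_KB = 4,	/* assumption: 4KB base pages */
	SKETCH_PMD_KB = 2048,	/* assumption: 2MB PMD-sized huge pages */
};

/* Convert a read_ahead_kb-style value (in KB) to a number of base pages. */
static unsigned long sketch_kb_to_pages(unsigned long kb)
{
	return kb / SKETCH_PAGE_KB;
}

/* Number of PMD-sized mappings covered by a /proc/meminfo value in kB. */
static unsigned long sketch_kb_to_pmds(unsigned long kb)
{
	return kb / SKETCH_PMD_KB;
}
```

Under these assumptions, read_ahead_kb = 128 gives a 32-page window, while a single PMD-sized folio needs 512 pages, so a window capped at read_ahead_kb cannot hold one. The FilePmdMapped values in the commit log correspond to 2 PMD mappings before the patch (4096 kB) and 512 after it (1048576 kB).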
>
>
> >
> > With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a
> > sequential read on a 1GB file using MADV_HUGEPAGE, the differences in
> > /proc/meminfo are as follows:
> >
> > - before this patch
> > FileHugePages: 18432 kB
> > FilePmdMapped: 4096 kB
> >
> > - after this patch
> > FileHugePages: 1067008 kB
> > FilePmdMapped: 1048576 kB
> >
> > This shows that after applying the patch, the entire 1GB file is mapped to
> > huge pages. The stable list is CCed, as without this patch, large folios
> > don’t function optimally in the readahead path.
> >
> > It's worth noting that if read_ahead_kb is set to a larger value that isn't
> > aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail to map
> > to hugepages.
> >
> > Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: stable@vger.kernel.org
> >
> > ---
> > mm/readahead.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > Changes:
> > v1->v2:
> > - Drop the align (Matthew)
> > - Improve commit log (Andrew)
> >
> > RFC->v1: https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@gmail.com/
> > - Simplify the code as suggested by Matthew
> >
> > RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.com/
> >
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 3dc6c7a128dd..9b8a48e736c6 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
> >  		return 4 * cur;
> >  	if (cur <= max / 2)
> >  		return 2 * cur;
> > +	if (cur > max)
> > +		return cur;
> >  	return max;
>
> Maybe something like
>
> return max_t(unsigned long, cur, max);
>
> might be more readable (likely "max()" cannot be used because of the
> local variable name "max" ...).
>
>
> ... but it's rather weird having a "max" and then returning something
> larger than the "max" ... especially with code like

Indeed, that could lead to confusion ;)
>
> "ra->size = get_next_ra_size(ra, max_pages);"
>
>
> Maybe we can improve that by renaming "max_pages" / "max" to what it
> actually is supposed to be (which I haven't quite understood yet).

Perhaps a more straightforward solution would be to implement it
directly at the callsite, as demonstrated below?

diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..187efae95b02 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -642,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
 			1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		/*
+		 * Allow the actual size to exceed the readahead window for a
+		 * large folio.
+		 */
+		ra->size = get_next_ra_size(ra, max(max_pages, ra->size));
 		ra->async_size = ra->size;
 		goto readit;
 	}
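
To illustrate the effect of the callsite change, here is a simplified userspace model of the ramp-up logic. It is only a sketch: the `cur < max / 16` condition lies above the quoted hunk and is assumed here, the `model_*` names are hypothetical, and the 512-page window in the note that follows assumes 4KB pages with a 2MB PMD-sized folio.

```c
/* Simplified userspace model of the readahead ramp-up; not kernel code. */

/* Model of get_next_ra_size(): grow the window, capped at max. */
static unsigned long model_next_ra_size(unsigned long cur, unsigned long max)
{
	if (cur < max / 16)	/* assumed: not visible in the quoted hunk */
		return 4 * cur;
	if (cur <= max / 2)
		return 2 * cur;
	return max;
}

static unsigned long model_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Callsite before the change: the window is clamped to max_pages. */
static unsigned long model_size_before(unsigned long cur,
				       unsigned long max_pages)
{
	return model_next_ra_size(cur, max_pages);
}

/* Callsite after the change: a window already above max_pages is kept. */
static unsigned long model_size_after(unsigned long cur,
				      unsigned long max_pages)
{
	return model_next_ra_size(cur, model_max(max_pages, cur));
}
```

With max_pages = 32 (read_ahead_kb = 128KB) and a current window of 512 pages, the unmodified callsite shrinks the next window to 32 pages, while the modified one keeps it at 512. For windows at or below max_pages the two behave identically.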
--
Regards
Yafang