linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: willy@infradead.org, akpm@linux-foundation.org,
	linux-mm@kvack.org,  stable@vger.kernel.org
Subject: Re: [PATCH v2] mm/readahead: Fix large folio support in async readahead
Date: Mon, 11 Nov 2024 22:28:09 +0800	[thread overview]
Message-ID: <CALOAHbAe8GSf2=+sqzy32pWM2jtENmDnZcMhBEYruJVyWa_dww@mail.gmail.com> (raw)
In-Reply-To: <85cfc467-320f-4388-b027-2cbad85dfbed@redhat.com>

On Mon, Nov 11, 2024 at 6:33 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 08.11.24 15:17, Yafang Shao wrote:
> > When testing large folio support with XFS on our servers, we observed that
> > only a few large folios are mapped when reading large files via mmap.
> > After a thorough analysis, I identified it was caused by the
> > `/sys/block/*/queue/read_ahead_kb` setting. On our test servers, this
> > parameter is set to 128KB. After I tune it to 2MB, the large folio can
> > work as expected. However, I believe the large folio behavior should not be
> > dependent on the value of read_ahead_kb. It would be more robust if the
> > kernel can automatically adopt to it.
>
> Now I am extremely confused.
>
> Documentation/ABI/stable/sysfs-block:
>
> "[RW] Maximum number of kilobytes to read-ahead for filesystems on this
> block device."
>
>
> So, with your patch, will we also be changing the readahead size to
> exceed that, or simply allocate larger folios and not exceeding the
> readahead size (e.g., leaving them partially non-filled)?

Exceeding the readahead size for the MADV_HUGEPAGE case is
straightforward; this is what the current patch accomplishes.

>
> If you're also changing the readahead behavior to exceed the
> configuration parameter it would sound to me like "I am pushing the
> brake pedal and my care brakes; fix the brakes to adopt whether to brake
> automatically" :)
>
> Likely I am missing something here, and how the read_ahead_kb parameter
> is used after your patch.

The read_ahead_kb parameter continues to function for
non-MADV_HUGEPAGE scenarios, whereas special handling is required for
the MADV_HUGEPAGE case. It appears that we ought to update the
Documentation/ABI/stable/sysfs-block to reflect the changes related to
large folios, correct?

>
>
> >
> > With /sys/block/*/queue/read_ahead_kb set to 128KB and performing a
> > sequential read on a 1GB file using MADV_HUGEPAGE, the differences in
> > /proc/meminfo are as follows:
> >
> > - before this patch
> >    FileHugePages:     18432 kB
> >    FilePmdMapped:      4096 kB
> >
> > - after this patch
> >    FileHugePages:   1067008 kB
> >    FilePmdMapped:   1048576 kB
> >
> > This shows that after applying the patch, the entire 1GB file is mapped to
> > huge pages. The stable list is CCed, as without this patch, large folios
> > don’t function optimally in the readahead path.
>  >> It's worth noting that if read_ahead_kb is set to a larger value
> that isn't
> > aligned with huge page sizes (e.g., 4MB + 128KB), it may still fail to map
> > to hugepages.
> >
> > Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings")
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > Cc: stable@vger.kernel.org
> >
> > ---
> >   mm/readahead.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > Changes:
> > v1->v2:
> > - Drop the align (Matthew)
> > - Improve commit log (Andrew)
> >
> > RFC->v1: https://lore.kernel.org/linux-mm/20241106092114.8408-1-laoar.shao@gmail.com/
> > - Simplify the code as suggested by Matthew
> >
> > RFC: https://lore.kernel.org/linux-mm/20241104143015.34684-1-laoar.shao@gmail.com/
> >
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 3dc6c7a128dd..9b8a48e736c6 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -385,6 +385,8 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
> >               return 4 * cur;
> >       if (cur <= max / 2)
> >               return 2 * cur;
> > +     if (cur > max)
> > +             return cur;
> >       return max;
>
> Maybe something like
>
> return max_t(unsigned long, cur, max);
>
> might be more readable (likely "max()" cannot be used because of the
> local variable name "max" ...).
>
>
> ... but it's rather weird having a "max" and then returning something
> larger than the "max" ... especially with code like

Indeed, that could lead to confusion ;)

>
> "ra->size = get_next_ra_size(ra, max_pages);"
>
>
> Maybe we can improve that by renaming "max_pages" / "max" to what it
> actually is supposed to be (which I haven't quite understood yet).

Perhaps a more straightforward solution would be to implement it
directly at the callsite, as demonstrated below?

diff --git a/mm/readahead.c b/mm/readahead.c
index 3dc6c7a128dd..187efae95b02 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -642,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
                        1UL << order);
        if (index == expected) {
                ra->start += ra->size;
-               ra->size = get_next_ra_size(ra, max_pages);
+               /*
+                * Allow the actual size to exceed the readahead window for a
+                * large folio.
+                */
+               ra->size = get_next_ra_size(ra, max(max_pages, ra->size));
                ra->async_size = ra->size;
                goto readit;
        }


--
Regards
Yafang


  reply	other threads:[~2024-11-11 14:28 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-08 14:17 Yafang Shao
2024-11-11 10:33 ` David Hildenbrand
2024-11-11 14:28   ` Yafang Shao [this message]
2024-11-11 15:05     ` David Hildenbrand
2024-11-11 15:26       ` David Hildenbrand
2024-11-11 16:13         ` Yafang Shao
2024-11-11 16:08       ` Yafang Shao
2024-11-11 18:31         ` David Hildenbrand
2024-11-11 19:10           ` Yafang Shao
2024-11-12 15:19             ` David Hildenbrand
2024-11-13  2:16               ` Yafang Shao
2024-11-13  8:28                 ` David Hildenbrand
2024-11-13  9:46                   ` David Hildenbrand
2024-11-13  9:54                   ` Yafang Shao
2024-11-13 10:24                     ` David Hildenbrand
2024-11-13  4:19               ` Matthew Wilcox
2024-11-13  8:12                 ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALOAHbAe8GSf2=+sqzy32pWM2jtENmDnZcMhBEYruJVyWa_dww@mail.gmail.com' \
    --to=laoar.shao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox