From: Michal Hocko <mhocko@suse.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Oscar Salvador <OSalvador@suse.com>,
Yuanxi Liu <y.liu@naruida.com>,
David Hildenbrand <david@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: page_alloc: Skip regions with hugetlbfs pages when allocating 1G pages
Date: Fri, 14 Apr 2023 16:53:09 +0200 [thread overview]
Message-ID: <ZDlo1ex3Tx5/rwsQ@dhcp22.suse.cz> (raw)
In-Reply-To: <20230414141429.pwgieuwluxwez3rj@techsingularity.net>
On Fri 14-04-23 15:14:29, Mel Gorman wrote:
> A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is
> taking an excessive amount of time for large amounts of memory. Further
> testing allocating huge pages that the cost is linear i.e. if allocating
> 1G pages in batches of 10 then the time to allocate nr_hugepages from
> 10->20->30->etc increases linearly even though 10 pages are allocated at
> each step. Profiles indicated that much of the time is spent checking the
> validity within already existing huge pages and then attempting a migration
> that fails after isolating the range, draining pages and a whole lot of
> other useless work.
>
> Commit eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from
> pfn_range_valid_contig") removed two checks, one which ignored huge pages
> for contiguous allocations as huge pages can sometimes migrate. While
> there may be value on migrating a 2M page to satisfy a 1G allocation, it's
> potentially expensive if the 1G allocation fails and it's pointless to
> try moving a 1G page for a new 1G allocation or scan the tail pages for
> valid PFNs.
>
> Reintroduce the PageHuge check and assume any contiguous region with
> hugetlbfs pages is unsuitable for a new 1G allocation.
>
> The hpagealloc test allocates huge pages in batches and reports the
> average latency per page over time. This test happens just after boot when
> fragmentation is not an issue. Units are in milliseconds.
>
> hpagealloc
> 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6
> vanilla hugeallocrevert-v1r1 hugeallocsimple-v1r2
> Min Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%)
> 1st-qrtle Latency 356.61 ( 0.00%) 5.34 ( 98.50%) 19.85 ( 94.43%)
> 2nd-qrtle Latency 697.26 ( 0.00%) 5.47 ( 99.22%) 20.44 ( 97.07%)
> 3rd-qrtle Latency 972.94 ( 0.00%) 5.50 ( 99.43%) 20.81 ( 97.86%)
> Max-1 Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%)
> Max-5 Latency 82.14 ( 0.00%) 5.11 ( 93.78%) 19.31 ( 76.49%)
> Max-10 Latency 150.54 ( 0.00%) 5.20 ( 96.55%) 19.43 ( 87.09%)
> Max-90 Latency 1164.45 ( 0.00%) 5.53 ( 99.52%) 20.97 ( 98.20%)
> Max-95 Latency 1223.06 ( 0.00%) 5.55 ( 99.55%) 21.06 ( 98.28%)
> Max-99 Latency 1278.67 ( 0.00%) 5.57 ( 99.56%) 22.56 ( 98.24%)
> Max Latency 1310.90 ( 0.00%) 8.06 ( 99.39%) 26.62 ( 97.97%)
> Amean Latency 678.36 ( 0.00%) 5.44 * 99.20%* 20.44 * 96.99%*
>
> 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6
> vanilla revert-v1 hugeallocfix-v2
> Duration User 0.28 0.27 0.30
> Duration System 808.66 17.77 35.99
> Duration Elapsed 830.87 18.08 36.33
>
> The vanilla kernel is poor, taking up to 1.3 second to allocate a huge page
> and almost 10 minutes in total to run the test. Reverting the problematic
> commit reduces it to 8ms at worst and the patch takes 26ms. This patch
> fixes the main issue with skipping huge pages but leaves the page_count()
> out because a page with an elevated count potentially can migrate.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022
> Fixes: eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig")
> Reported-by: Yuanxi Liu <y.liu@naruida.com>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> mm/page_alloc.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7136c36c5d01..b47f520c3051 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -9450,6 +9450,9 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>
> if (PageReserved(page))
> return false;
> +
> + if (PageHuge(page))
> + return false;
> }
> return true;
> }
--
Michal Hocko
SUSE Labs
next prev parent reply other threads:[~2023-04-14 14:53 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-14 14:14 Mel Gorman
2023-04-14 14:38 ` Vlastimil Babka
2023-04-14 14:39 ` David Hildenbrand
2023-04-14 14:53 ` Michal Hocko [this message]
2023-04-14 19:46 ` Andrew Morton
2023-04-14 23:20 ` Doug Berger
2023-04-17 8:09 ` Mel Gorman
2023-04-16 15:31 ` Oscar Salvador
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZDlo1ex3Tx5/rwsQ@dhcp22.suse.cz \
--to=mhocko@suse.com \
--cc=OSalvador@suse.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=y.liu@naruida.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox