From: Juan Yescas <jyescas@google.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
tjmercier@google.com, isaacmanjarres@google.com,
surenb@google.com, kaleshsingh@google.com,
Vlastimil Babka <vbabka@suse.cz>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
David Hildenbrand <david@redhat.com>,
Mike Rapoport <rppt@kernel.org>, Minchan Kim <minchan@kernel.org>
Subject: Re: [PATCH v4] mm: Add CONFIG_PAGE_BLOCK_ORDER to select page block order
Date: Tue, 20 May 2025 16:14:46 -0700
Message-ID: <CAJDx_rgmKi4_=zOrJEgux=dZ-Hf6MJevNZD6GHRJc6AkNqi_DA@mail.gmail.com>
In-Reply-To: <8E999CBA-6F55-4DCA-8BE3-569B1C537802@nvidia.com>
On Sat, May 17, 2025 at 11:51 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 9 May 2025, at 21:02, Juan Yescas wrote:
>
> > Problem: On large page size configurations (16KiB, 64KiB), the CMA
> > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably,
> > and this causes the CMA reservations to be larger than necessary.
> > This means that the system will have fewer MIGRATE_UNMOVABLE and
> > MIGRATE_RECLAIMABLE page blocks available, since unmovable and
> > reclaimable allocations cannot fall back to MIGRATE_CMA page blocks.
> >
> > CMA_MIN_ALIGNMENT_BYTES increases because it is derived from
> > pageblock_order, which is bounded by MAX_PAGE_ORDER, which in turn
> > depends on ARCH_FORCE_MAX_ORDER. The value of ARCH_FORCE_MAX_ORDER
> > increases on 16KiB and 64KiB kernels.
> >
> > For example, on ARM64 the CMA alignment requirement is as follows when:
> >
> > - the default value of CONFIG_ARCH_FORCE_MAX_ORDER is used
> > - CONFIG_TRANSPARENT_HUGEPAGE is set
> >
> > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > -----------------------------------------------------------------------
> >   4KiB    |       10       |        9        |  4KiB * (2 ^  9) =   2MiB
> >  16KiB    |       11       |       11        | 16KiB * (2 ^ 11) =  32MiB
> >  64KiB    |       13       |       13        | 64KiB * (2 ^ 13) = 512MiB
> >
> > There are some extreme cases for the CMA alignment requirement when:
> >
> > - CONFIG_ARCH_FORCE_MAX_ORDER is set to its maximum value
> > - CONFIG_TRANSPARENT_HUGEPAGE is NOT set
> > - CONFIG_HUGETLB_PAGE is NOT set
> >
> > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > -----------------------------------------------------------------------
> >   4KiB    |       15       |       15        |  4KiB * (2 ^ 15) = 128MiB
> >  16KiB    |       13       |       13        | 16KiB * (2 ^ 13) = 128MiB
> >  64KiB    |       13       |       13        | 64KiB * (2 ^ 13) = 512MiB
> >
> > This affects the CMA reservations for the drivers. If a driver in a
> > 4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel, the minimal
> > reservation has to be 32MiB due to the alignment requirements:
> >
> > /* 4KiB kernel */
> > reserved-memory {
> > 	...
> > 	cma_test_reserve: cma_test_reserve {
> > 		compatible = "shared-dma-pool";
> > 		size = <0x0 0x400000>; /* 4 MiB */
> > 		...
> > 	};
> > };
> >
> > /* 16KiB kernel */
> > reserved-memory {
> > 	...
> > 	cma_test_reserve: cma_test_reserve {
> > 		compatible = "shared-dma-pool";
> > 		size = <0x0 0x2000000>; /* 32 MiB */
> > 		...
> > 	};
> > };
> >
> > Solution: Add a new config option, CONFIG_PAGE_BLOCK_ORDER, that
> > allows setting the page block order on all architectures.
> > The maximum page block order is given by
> > ARCH_FORCE_MAX_ORDER.
> >
> > By default, CONFIG_PAGE_BLOCK_ORDER has the same
> > value as ARCH_FORCE_MAX_ORDER. This makes sure that
> > current kernel configurations won't be affected by this
> > change. It is an opt-in change.
> >
> > This patch allows large page size kernels (16KiB, 64KiB)
> > to have the same CMA alignment requirements as 4KiB
> > kernels by setting a lower pageblock_order.
> >
> > Tests:
> >
> > - Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
> > on 4k and 16k kernels.
> >
> > - Verified that Transparent Huge Pages work when pageblock_order
> > is 1, 7, 10 on 4k and 16k kernels.
> >
> > - Verified that dma-buf heaps allocations work when pageblock_order
> > is 1, 7, 10 on 4k and 16k kernels.
> >
> > Benchmarks:
> >
> > The benchmarks compare 16KiB kernels with pageblock_order 10 and 7.
> > pageblock_order 7 was chosen because it makes the minimum CMA
> > alignment requirement the same as in 4KiB kernels (2MiB).
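Worked out with PAGE_SIZE = 16KiB = 2^14 bytes, the alignment for pageblock_order 7 matches the 4KiB case, assuming pageblock_order 9 (the PMD order) on a 4KiB kernel with THP:

```latex
\[
16\,\mathrm{KiB} \cdot 2^{7} = 2^{14} \cdot 2^{7} = 2^{21}\ \text{bytes}
= 2\,\mathrm{MiB} = 2^{12} \cdot 2^{9} = 4\,\mathrm{KiB} \cdot 2^{9}
\]
```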
> >
> > - Perform 100K dma-buf heap (/dev/dma_heap/system) allocations of
> > SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
> > (https://developer.android.com/ndk/guides/simpleperf) to measure
> > the number of instructions and page faults on 16KiB kernels.
> > The benchmark was executed 10 times. The per-run results and
> > their averages are below:
> >
> >          # instructions           |    # page-faults
> >     order 10     |     order 7    | order 10 | order 7
> > ------------------------------------------------------
> >  13,891,765,770  | 11,425,777,314 |   220    |   217
> >  14,456,293,487  | 12,660,819,302 |   224    |   219
> >  13,924,261,018  | 13,243,970,736 |   217    |   221
> >  13,910,886,504  | 13,845,519,630 |   217    |   221
> >  14,388,071,190  | 13,498,583,098 |   223    |   224
> >  13,656,442,167  | 12,915,831,681 |   216    |   218
> >  13,300,268,343  | 12,930,484,776 |   222    |   218
> >  13,625,470,223  | 14,234,092,777 |   219    |   218
> >  13,508,964,965  | 13,432,689,094 |   225    |   219
> >  13,368,950,667  | 13,683,587,37  |   219    |   225
> > ------------------------------------------------------
> >  13,803,137,433  | 13,131,974,268 |   220    |   220    Averages
> >
> > Order 7 executed 4.86% fewer instructions than order 10:
> >
> > 13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
> >
> > The average number of page faults was effectively the same
> > (about 220) for order 7 and order 10.
> >
> > These results didn't show any significant regression when
> > pageblock_order is set to 7 on 16KiB kernels.
> >
> > - Run Speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
> > on the 16KiB kernels with pageblock_order 7 and 10.
> >
> > order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
> > -------------------------------------------------------------------
> >   15.8   |  16.4   |        0.6         |         3.80%
> >   16.4   |  16.2   |       -0.2         |        -1.22%
> >   16.6   |  16.3   |       -0.3         |        -1.81%
> >   16.8   |  16.3   |       -0.5         |        -2.98%
> >   16.6   |  16.8   |        0.2         |         1.20%
> > -------------------------------------------------------------------
> >  16.44   |  16.40  |       -0.04        |        -0.24%    Averages
> >
> > The results didn't show any significant regression when
> > pageblock_order is set to 7 on 16KiB kernels.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > Cc: David Hildenbrand <david@redhat.com>
> > CC: Mike Rapoport <rppt@kernel.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Signed-off-by: Juan Yescas <jyescas@google.com>
> > Acked-by: Zi Yan <ziy@nvidia.com>
> > ---
> > Changes in v4:
> > - Set PAGE_BLOCK_ORDER in include/linux/mmzone.h to
> > validate that MAX_PAGE_ORDER >= PAGE_BLOCK_ORDER at
> > compile time.
> > - This change fixes the warning in:
> > https://lore.kernel.org/oe-kbuild-all/202505091548.FuKO4b4v-lkp@intel.com/
> >
> > Changes in v3:
> > - Rename ARCH_FORCE_PAGE_BLOCK_ORDER to PAGE_BLOCK_ORDER
> > as per Matthew's suggestion.
> > - Update comments in pageblock-flags.h for pageblock_order
> > value when THP or HugeTLB are not used.
> >
> > Changes in v2:
> > - Add Zi's Acked-by tag.
> > - Move ARCH_FORCE_PAGE_BLOCK_ORDER config to mm/Kconfig as
> > per Zi's and Matthew's suggestion so it is available to
> > all architectures.
> > - Set ARCH_FORCE_PAGE_BLOCK_ORDER to 10 by default when
> > ARCH_FORCE_MAX_ORDER is not available.
> >
> >
> > include/linux/mmzone.h | 16 ++++++++++++++++
> > include/linux/pageblock-flags.h | 8 ++++----
> > mm/Kconfig | 31 +++++++++++++++++++++++++++++++
> > 3 files changed, 51 insertions(+), 4 deletions(-)
>
> Hi Juan,
>
> The patch below, applied on top of your v4, fixed the powerpc build issue
> when I tested it locally.
>
> From 5c2ae4dfca135e99da45302e4f5d96a315a99603 Mon Sep 17 00:00:00 2001
> From: Zi Yan <ziy@nvidia.com>
> Date: Sat, 17 May 2025 14:49:39 -0400
> Subject: [PATCH] fix CONFIG_PAGE_BLOCK_ORDER
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> mm/Kconfig | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 79237842f7e2..af0dd42e3506 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1016,10 +1016,10 @@ config ARCH_FORCE_MAX_ORDER
>  # as per include/linux/mmzone.h.
>  config PAGE_BLOCK_ORDER
>  	int "Page Block Order"
> -	range 1 10 if !ARCH_FORCE_MAX_ORDER
> -	default 10 if !ARCH_FORCE_MAX_ORDER
> -	range 1 ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER
> -	default ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER
> +	range 1 10 if ARCH_FORCE_MAX_ORDER = 0
> +	default 10 if ARCH_FORCE_MAX_ORDER = 0
> +	range 1 ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER != 0
> +	default ARCH_FORCE_MAX_ORDER if ARCH_FORCE_MAX_ORDER != 0
> 
>  	help
>  	  The page block order refers to the power of two number of pages that
> --
> 2.47.2
>
Thanks Zi, the changes were applied in v6
https://lore.kernel.org/all/20250520225945.991229-1-jyescas@google.com/
>
>
>
> --
> Best Regards,
> Yan, Zi
Thread overview: 10+ messages
2025-05-10 1:02 Juan Yescas
2025-05-10 17:16 ` kernel test robot
2025-05-13 15:08 ` Zi Yan
2025-05-13 16:41 ` Juan Yescas
2025-05-13 16:47 ` Zi Yan
2025-05-13 16:52 ` Zi Yan
2025-05-13 17:33 ` Juan Yescas
2025-05-13 17:32 ` Juan Yescas
2025-05-17 18:51 ` Zi Yan
2025-05-20 23:14 ` Juan Yescas [this message]