linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zi Yan <ziy@nvidia.com>
To: Usama Arif <usamaarif642@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org,
	shakeel.butt@linux.dev, riel@surriel.com,
	baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, hughd@google.com, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com,
	Juan Yescas <jyescas@google.com>
Subject: Re: [RFC] mm: khugepaged: use largest enabled hugepage order for min_free_kbytes
Date: Fri, 06 Jun 2025 11:18:31 -0400	[thread overview]
Message-ID: <A409F7B3-A901-40F9-A694-DC3FB00B57FE@nvidia.com> (raw)
In-Reply-To: <20250606143700.3256414-1-usamaarif642@gmail.com>

On 6 Jun 2025, at 10:37, Usama Arif wrote:

> On arm64 machines with 64K PAGE_SIZE, the min_free_kbytes and hence the
> watermarks are evaluated to extremely high values, for e.g. a server with
> 480G of memory, only 2M mTHP hugepage size set to madvise, with the rest
> of the sizes set to never, the min, low and high watermarks evaluate to
> 11.2G, 14G and 16.8G respectively.
> In contrast for 4K PAGE_SIZE of the same machine, with only 2M THP hugepage
> size set to madvise, the min, low and high watermarks evaluate to 86M, 566M
> and 1G respectively.
> This is because set_recommended_min_free_kbytes is designed for PMD
> hugepages (pageblock_order = min(HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)).
> Such high watermark values can cause performance and latency issues in
> memory bound applications on arm servers that use 64K PAGE_SIZE, eventhough
> most of them would never actually use a 512M PMD THP.
>
> Instead of using HPAGE_PMD_ORDER for pageblock_order use the highest large
> folio order enabled in set_recommended_min_free_kbytes.
> With this patch, when only 2M THP hugepage size is set to madvise for the
> same machine with 64K page size, with the rest of the sizes set to never,
> the min, low and high watermarks evaluate to 2.08G, 2.6G and 3.1G
> respectively. When 512M THP hugepage size is set to madvise for the same
> machine with 64K page size, the min, low and high watermarks evaluate to
> 11.2G, 14G and 16.8G respectively, the same as without this patch.

Getting pageblock_order involved here might be confusing. I think you just
want to adjust min, low and high watermarks to reasonable values.
Is it OK to rename min_thp_pageblock_nr_pages to min_nr_free_pages_per_zone
and move MIGRATE_PCPTYPES * MIGRATE_PCPTYPES inside? Otherwise, the changes
look reasonable to me.

Another concern on tying watermarks to highest THP order is that if
user enables PMD THP on such systems with 2MB mTHP enabled initially,
it could trigger unexpected memory reclaim and compaction, right?
That might surprise user, since they just want to adjust availability
of THP sizes, but the whole system suddenly begins to be busy.
Have you experimented with it?

>
> An alternative solution would be to change PAGE_BLOCK_ORDER by changing
> ARCH_FORCE_MAX_ORDER to a lower value for ARM64_64K_PAGES. However, this

PAGE_BLOCK_ORDER can be changed in Kconfig without changing
ARCH_FORCE_MAX_ORDER by Juan’s recent patch[1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything&id=e13e7922d03439e374c263049af5f740ceae6346

> is not dynamic with hugepage size, will need different kernel builds for
> different hugepage sizes and most users won't know that this needs to be
> done as it can be difficult to detmermine that the performance and latency
> issues are coming from the high watermark values.
>
> All watermark numbers are for zones of nodes that had the highest number
> of pages, i.e. the value for min size for 4K is obtained using:
> cat /proc/zoneinfo  | grep -i min | awk '{print $2}' | sort -n  | tail -n 1 | awk '{print $1 * 4096 / 1024 / 1024}';
> and for 64K using:
> cat /proc/zoneinfo  | grep -i min | awk '{print $2}' | sort -n  | tail -n 1 | awk '{print $1 * 65536 / 1024 / 1024}';
>
> An arbirtary min of 128 pages is used for when no hugepage sizes are set
> enabled.
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
>  include/linux/huge_mm.h | 25 +++++++++++++++++++++++++
>  mm/khugepaged.c         | 32 ++++++++++++++++++++++++++++----
>  mm/shmem.c              | 29 +++++------------------------
>  3 files changed, 58 insertions(+), 28 deletions(-)
>

Thanks.

Best Regards,
Yan, Zi


  parent reply	other threads:[~2025-06-06 15:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06 14:37 Usama Arif
2025-06-06 15:01 ` Usama Arif
2025-06-06 15:18 ` Zi Yan [this message]
2025-06-06 15:38   ` Usama Arif
2025-06-06 16:10     ` Zi Yan
2025-06-07  8:35       ` Lorenzo Stoakes
2025-06-08  0:04         ` Zi Yan
2025-06-09 11:13       ` Usama Arif
2025-06-09 13:19         ` Zi Yan
2025-06-09 14:11           ` Usama Arif
2025-06-09 14:16             ` Lorenzo Stoakes
2025-06-09 14:37               ` Zi Yan
2025-06-09 14:50                 ` Lorenzo Stoakes
2025-06-09 15:20                   ` Zi Yan
2025-06-09 19:40                     ` Lorenzo Stoakes
2025-06-09 19:49                       ` Zi Yan
2025-06-09 20:03                         ` Usama Arif
2025-06-09 20:24                           ` Zi Yan
2025-06-10 10:41                             ` Usama Arif
2025-06-10 14:03                         ` Lorenzo Stoakes
2025-06-10 14:20                           ` Zi Yan
2025-06-10 15:16                             ` Usama Arif
2025-06-09 15:32             ` Zi Yan
2025-06-06 17:37 ` David Hildenbrand
2025-06-09 11:34   ` Usama Arif
2025-06-09 13:28     ` Zi Yan
2025-06-07  8:18 ` Lorenzo Stoakes
2025-06-07  8:44   ` Lorenzo Stoakes
2025-06-09 12:07   ` Usama Arif
2025-06-09 12:12     ` Usama Arif
2025-06-09 14:58       ` Lorenzo Stoakes
2025-06-09 14:57     ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A409F7B3-A901-40F9-A694-DC3FB00B57FE@nvidia.com \
    --to=ziy@nvidia.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jyescas@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox