[PATCH v3 0/3] Add a command line option that enables control of how many threads should be used to allocate huge pages.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Thomas Prescher via B4 Relay <devnull+thomas.prescher.cyberus-technology.de@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Muchun Song <muchun.song@linux.dev>,
	 Andrew Morton <akpm@linux-foundation.org>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-mm@kvack.org,
	Thomas Prescher <thomas.prescher@cyberus-technology.de>
Subject: [PATCH v3 0/3] Add a command line option that enables control of how many threads should be used to allocate huge pages.
Date: Fri, 28 Feb 2025 00:02:09 +0100	[thread overview]
Message-ID: <20250228-hugepage-parameter-v3-0-2628e9b2b5c0@cyberus-technology.de> (raw)

Allocating huge pages can take a very long time on servers
with terabytes of memory even when they are allocated at
boot time where the allocation happens in parallel.

Before this series, the kernel used a hard coded value of 2 threads per
NUMA node for these allocations. This value might have been good
enough in the past but it is not sufficient to fully utilize
newer systems.

This series changes the default so the kernel uses 25% of the available
hardware threads for these allocations. In addition, we allow the user
that wish to micro-optimize the allocation time to override this value
via a new kernel parameter.

We tested this on 2 generations of Xeon CPUs and the results
show a big improvement of the overall allocation time.

+-----------------------+-------+-------+-------+-------+-------+
| threads               |   8   |   16  |   32  |   64  |   128 |
+-----------------------+-------+-------+-------+-------+-------+
| skylake      144 cpus |   44s |   22s |   16s |   19s |   20s |
| cascade lake 192 cpus |   39s |   20s |   11s |   10s |    9s |
+-----------------------+-------+-------+-------+-------+-------+

On skylake, we see an improvment of 2.75x when using 32 threads,
on cascade lake we can get even better at 4.3x when we use
128 threads.

This speedup is quite significant and users of large machines
like these should have the option to make the machines boot
as fast as possible.

Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
---
Changes in v3:
- Fix log message (the name of the parameter was incorrect)
- Link to v2: https://lore.kernel.org/r/20250227-hugepage-parameter-v2-0-7db8c6dc0453@cyberus-technology.de

Changes in v2:
- the default thread value has been changed from 2 threads per node to
  25% of the available hardware threads
- the hugepage_alloc_threads parameter now specifies the total number of
  threads being used instead of threads per node
- the kernel now logs the time needed to allocate the huge pages and the
  number of threads being used
- update the documentation so that users are aware that this does not
  apply to gigantic huge pages
- Link to v1: https://lore.kernel.org/r/20250221-hugepage-parameter-v1-0-fa49a77c87c8@cyberus-technology.de

---
Thomas Prescher (3):
      mm: hugetlb: improve parallel huge page allocation time
      mm: hugetlb: add hugetlb_alloc_threads cmdline option
      mm: hugetlb: log time needed to allocate hugepages

 Documentation/admin-guide/kernel-parameters.txt |  9 ++++
 Documentation/admin-guide/mm/hugetlbpage.rst    | 10 ++++
 mm/hugetlb.c                                    | 67 +++++++++++++++++++------
 3 files changed, 70 insertions(+), 16 deletions(-)
---
base-commit: dd83757f6e686a2188997cb58b5975f744bb7786
change-id: 20250221-hugepage-parameter-e8542fdfc0ae

Best regards,
-- 
Thomas Prescher <thomas.prescher@cyberus-technology.de>

next             reply	other threads:[~2025-02-27 23:02 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-27 23:02 Thomas Prescher via B4 Relay [this message]
2025-02-27 23:02 ` [PATCH v3 1/3] mm: hugetlb: improve parallel huge page allocation time Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 2/3] mm: hugetlb: add hugetlb_alloc_threads cmdline option Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 3/3] mm: hugetlb: log time needed to allocate hugepages Thomas Prescher via B4 Relay

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250228-hugepage-parameter-v3-0-2628e9b2b5c0@cyberus-technology.de \
    --to=devnull+thomas.prescher.cyberus-technology.de@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=thomas.prescher@cyberus-technology.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox