linux-mm.kvack.org archive mirror
* [PATCH 0/2] Add a command line option that enables control of how many threads per NUMA node should be used to allocate huge pages.
@ 2025-02-21 13:49 Thomas Prescher via B4 Relay
  2025-02-21 13:49 ` [PATCH 1/2] mm: hugetlb: add hugetlb_alloc_threads cmdline option Thomas Prescher via B4 Relay
  2025-02-21 13:49 ` [PATCH 2/2] mm: hugetlb: log time needed to allocate hugepages Thomas Prescher via B4 Relay
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Prescher via B4 Relay @ 2025-02-21 13:49 UTC (permalink / raw)
  To: Jonathan Corbet, Muchun Song, Andrew Morton
  Cc: linux-doc, linux-kernel, linux-mm, Thomas Prescher

Allocating huge pages can take a very long time on servers
with terabytes of memory, even when the pages are allocated at
boot time, where the allocation already happens in parallel.

The kernel currently uses a hard coded value of 2 threads per
NUMA node for these allocations. This value might have been good
enough in the past but it is not sufficient to fully utilize
newer systems.

This series adds a command line option, hugetlb_alloc_threads,
that allows this value to be overridden.
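
As an illustration only, a boot command line using the option from
patch 1/2 might look like the fragment below. The huge page size, the
page count, and the thread count of 8 are arbitrary values, and the
`hugetlb_alloc_threads=<n>` syntax is an assumption based on the option
name; the documentation added by patch 1/2 is authoritative.

```text
hugepagesz=1G hugepages=512 hugetlb_alloc_threads=8
```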

We tested this on 2 generations of Xeon CPUs and the results
show a big improvement of the overall allocation time.

+--------------------+-------+-------+-------+-------+-------+
| threads per node   |   2   |   4   |   8   |   16  |    32 |
+--------------------+-------+-------+-------+-------+-------+
| skylake 4node      |   44s |   22s |   16s |   19s |   20s |
| cascade lake 4node |   39s |   20s |   11s |   10s |    9s |
+--------------------+-------+-------+-------+-------+-------+

On Skylake, we see an improvement of 2.75x when using 8 threads
per node. On Cascade Lake, the improvement is even larger: 4.3x
when using 32 threads per node.
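
The speedup factors above are the 2-thread time divided by the time at
a given thread count. A minimal sketch that recomputes them from the
table follows; the script and all names in it are illustrative and not
part of this series.

```python
# Boot-time huge page allocation times in seconds, copied from the table.
THREADS = [2, 4, 8, 16, 32]
TIMES = {
    "skylake 4node": [44, 22, 16, 19, 20],
    "cascade lake 4node": [39, 20, 11, 10, 9],
}


def speedups(times):
    """Speedup of each column relative to the first (2 threads per node)."""
    baseline = times[0]
    return [baseline / t for t in times]


def best(times):
    """Return (threads per node, speedup) for the fastest configuration."""
    factors = speedups(times)
    i = max(range(len(factors)), key=factors.__getitem__)
    return THREADS[i], factors[i]


if __name__ == "__main__":
    for machine, times in TIMES.items():
        threads, factor = best(times)
        print(f"{machine}: best at {threads} threads per node, {factor:.2f}x")
```

For the Skylake row this gives 44 / 16 = 2.75, and for the Cascade Lake
row 39 / 9, which rounds to 4.3.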

This speedup is quite significant and users of large machines
like these should have the option to make the machines boot
as fast as possible.

Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
---
Thomas Prescher (2):
      mm: hugetlb: add hugetlb_alloc_threads cmdline option
      mm: hugetlb: log time needed to allocate hugepages

 Documentation/admin-guide/kernel-parameters.txt |  7 +++
 Documentation/admin-guide/mm/hugetlbpage.rst    |  9 +++-
 mm/hugetlb.c                                    | 59 ++++++++++++++++++-------
 3 files changed, 58 insertions(+), 17 deletions(-)
---
base-commit: 334426094588f8179fe175a09ecc887ff0c75758
change-id: 20250221-hugepage-parameter-e8542fdfc0ae

Best regards,
-- 
Thomas Prescher <thomas.prescher@cyberus-technology.de>


Thread overview: 9+ messages
2025-02-21 13:49 [PATCH 0/2] Add a command line option that enables control of how many threads per NUMA node should be used to allocate huge pages Thomas Prescher via B4 Relay
2025-02-21 13:49 ` [PATCH 1/2] mm: hugetlb: add hugetlb_alloc_threads cmdline option Thomas Prescher via B4 Relay
2025-02-21 13:52   ` Matthew Wilcox
2025-02-21 14:16     ` Thomas Prescher
2025-02-23  2:46       ` Andrew Morton
2025-02-24 10:42         ` Thomas Prescher
2025-02-24 17:37   ` Frank van der Linden
2025-02-25 13:01     ` Thomas Prescher
2025-02-21 13:49 ` [PATCH 2/2] mm: hugetlb: log time needed to allocate hugepages Thomas Prescher via B4 Relay
