* [PATCH v3 1/3] mm: hugetlb: improve parallel huge page allocation time
2025-02-27 23:02 [PATCH v3 0/3] Add a command line option that enables control of how many threads should be used to allocate huge pages Thomas Prescher via B4 Relay
@ 2025-02-27 23:02 ` Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 2/3] mm: hugetlb: add hugetlb_alloc_threads cmdline option Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 3/3] mm: hugetlb: log time needed to allocate hugepages Thomas Prescher via B4 Relay
2 siblings, 0 replies; 4+ messages in thread
From: Thomas Prescher via B4 Relay @ 2025-02-27 23:02 UTC
To: Jonathan Corbet, Muchun Song, Andrew Morton
Cc: linux-doc, linux-kernel, linux-mm, Thomas Prescher
From: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Before this patch, the kernel used a hard-coded
value of 2 threads per NUMA node for these allocations.
This patch changes this policy and the kernel now uses 25%
of the available hardware threads for the allocations.
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
---
mm/hugetlb.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea16450026496c020b544877db147d1..e9b1b3e2b9d467f067d54359e1401a03f9926108 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -14,9 +14,11 @@
#include <linux/pagemap.h>
#include <linux/mempolicy.h>
#include <linux/compiler.h>
+#include <linux/cpumask.h>
#include <linux/cpuset.h>
#include <linux/mutex.h>
#include <linux/memblock.h>
+#include <linux/minmax.h>
#include <linux/sysfs.h>
#include <linux/slab.h>
#include <linux/sched/mm.h>
@@ -3427,31 +3429,31 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
.numa_aware = true
};
+ unsigned int num_allocation_threads = max(num_online_cpus() / 4, 1);
+
job.thread_fn = hugetlb_pages_alloc_boot_node;
job.start = 0;
job.size = h->max_huge_pages;
/*
- * job.max_threads is twice the num_node_state(N_MEMORY),
+ * job.max_threads is 25% of the available cpu threads by default.
*
- * Tests below indicate that a multiplier of 2 significantly improves
- * performance, and although larger values also provide improvements,
- * the gains are marginal.
+ * On large servers with terabytes of memory, huge page allocation
+ * can consume a considerable amount of time.
*
- * Therefore, choosing 2 as the multiplier strikes a good balance between
- * enhancing parallel processing capabilities and maintaining efficient
- * resource management.
+ * Tests below show how long it takes to allocate 1 TiB of memory with
+ * 2MiB huge pages. Using more threads can significantly improve allocation time.
*
- * +------------+-------+-------+-------+-------+-------+
- * | multiplier | 1 | 2 | 3 | 4 | 5 |
- * +------------+-------+-------+-------+-------+-------+
- * | 256G 2node | 358ms | 215ms | 157ms | 134ms | 126ms |
- * | 2T 4node | 979ms | 679ms | 543ms | 489ms | 481ms |
- * | 50G 2node | 71ms | 44ms | 37ms | 30ms | 31ms |
- * +------------+-------+-------+-------+-------+-------+
+ * +-----------------------+-------+-------+-------+-------+-------+
+ * | threads | 8 | 16 | 32 | 64 | 128 |
+ * +-----------------------+-------+-------+-------+-------+-------+
+ * | skylake 144 cpus | 44s | 22s | 16s | 19s | 20s |
+ * | cascade lake 192 cpus | 39s | 20s | 11s | 10s | 9s |
+ * +-----------------------+-------+-------+-------+-------+-------+
*/
- job.max_threads = num_node_state(N_MEMORY) * 2;
- job.min_chunk = h->max_huge_pages / num_node_state(N_MEMORY) / 2;
+
+ job.max_threads = num_allocation_threads;
+ job.min_chunk = h->max_huge_pages / num_allocation_threads;
padata_do_multithreaded(&job);
return h->nr_huge_pages;
--
2.48.1
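For readers comparing the two policies, the arithmetic in this patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: the helper names old_policy_threads(), new_policy_threads() and min_chunk() are hypothetical and do not exist in mm/hugetlb.c; the CPU and node counts are passed in as arguments instead of being read via num_online_cpus() and num_node_state(N_MEMORY).

```c
/*
 * Hypothetical helper: thread count under the old policy,
 * two threads per NUMA node that has memory.
 */
static unsigned int old_policy_threads(unsigned int memory_nodes)
{
	return memory_nodes * 2;
}

/*
 * Hypothetical helper: thread count under the new default,
 * 25% of the online CPUs, but never fewer than one thread.
 */
static unsigned int new_policy_threads(unsigned int online_cpus)
{
	unsigned int threads = online_cpus / 4;

	return threads > 0 ? threads : 1;
}

/*
 * Hypothetical helper mirroring the job.min_chunk computation:
 * the requested pages divided evenly across the threads
 * (integer division, remainder dropped).
 */
static unsigned long min_chunk(unsigned long max_huge_pages,
			       unsigned int threads)
{
	return max_huge_pages / threads;
}
```

With these helpers, a 192-CPU machine gets 48 threads by default, and 1 TiB of 2MiB pages (524288 pages) gives a minimum chunk of 10922 pages per thread. A machine with fewer than 4 CPUs still gets one thread.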
* [PATCH v3 2/3] mm: hugetlb: add hugetlb_alloc_threads cmdline option
2025-02-27 23:02 [PATCH v3 0/3] Add a command line option that enables control of how many threads should be used to allocate huge pages Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 1/3] mm: hugetlb: improve parallel huge page allocation time Thomas Prescher via B4 Relay
@ 2025-02-27 23:02 ` Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 3/3] mm: hugetlb: log time needed to allocate hugepages Thomas Prescher via B4 Relay
2 siblings, 0 replies; 4+ messages in thread
From: Thomas Prescher via B4 Relay @ 2025-02-27 23:02 UTC
To: Jonathan Corbet, Muchun Song, Andrew Morton
Cc: linux-doc, linux-kernel, linux-mm, Thomas Prescher
From: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Add a command line option that enables control of how many
threads should be used to allocate huge pages.
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
---
Documentation/admin-guide/kernel-parameters.txt | 9 +++++++
Documentation/admin-guide/mm/hugetlbpage.rst | 10 ++++++++
mm/hugetlb.c | 31 +++++++++++++++++++++----
3 files changed, 46 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec8582b8750d7e014c4d76166fa2fc1..1937ee02c1f883ecd910bab33cdb9194bddbd9b1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1882,6 +1882,15 @@
Documentation/admin-guide/mm/hugetlbpage.rst.
Format: size[KMG]
+ hugepage_alloc_threads=
+ [HW] The number of threads that should be used to
+ allocate hugepages during boot. This option can be
+ used to improve system bootup time when allocating
+ a large number of huge pages.
+ The default value is 25% of the available hardware threads.
+
+ Note that this parameter only applies to non-gigantic huge pages.
+
hugetlb_cma= [HW,CMA,EARLY] The size of a CMA area used for allocation
of gigantic hugepages. Or using node format, the size
of a CMA area per node can be specified.
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index f34a0d798d5b533f30add99a34f66ba4e1c496a3..67a941903fd2231e6c082cffb4c9179ee094b208 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -145,7 +145,17 @@ hugepages
It will allocate 1 2M hugepage on node0 and 2 2M hugepages on node1.
If the node number is invalid, the parameter will be ignored.
+hugepage_alloc_threads
+ Specify the number of threads that should be used to allocate hugepages
+ during boot. This parameter can be used to improve system bootup time
+ when allocating a large number of huge pages.
+ The default value is 25% of the available hardware threads.
+ Example to use 8 allocation threads::
+
+ hugepage_alloc_threads=8
+
+ Note that this parameter only applies to non-gigantic huge pages.
default_hugepagesz
Specify the default huge page size. This parameter can
only be specified once on the command line. default_hugepagesz can
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e9b1b3e2b9d467f067d54359e1401a03f9926108..98dbfa18bee01d01b40cc7c650cd3eca5eae2457 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -70,6 +70,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
static bool __initdata parsed_valid_hugepagesz = true;
static bool __initdata parsed_default_hugepagesz;
static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
+static unsigned long hugepage_allocation_threads __initdata;
/*
* Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
@@ -3429,8 +3430,6 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
.numa_aware = true
};
- unsigned int num_allocation_threads = max(num_online_cpus() / 4, 1);
-
job.thread_fn = hugetlb_pages_alloc_boot_node;
job.start = 0;
job.size = h->max_huge_pages;
@@ -3451,9 +3450,13 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
* | cascade lake 192 cpus | 39s | 20s | 11s | 10s | 9s |
* +-----------------------+-------+-------+-------+-------+-------+
*/
+ if (hugepage_allocation_threads == 0) {
+ hugepage_allocation_threads = num_online_cpus() / 4;
+ hugepage_allocation_threads = max(hugepage_allocation_threads, 1);
+ }
- job.max_threads = num_allocation_threads;
- job.min_chunk = h->max_huge_pages / num_allocation_threads;
+ job.max_threads = hugepage_allocation_threads;
+ job.min_chunk = h->max_huge_pages / hugepage_allocation_threads;
padata_do_multithreaded(&job);
return h->nr_huge_pages;
@@ -4766,6 +4769,26 @@ static int __init default_hugepagesz_setup(char *s)
}
__setup("default_hugepagesz=", default_hugepagesz_setup);
+/* hugepage_alloc_threads command line parsing
+ * When set, use this specific number of threads for the boot
+ * allocation of hugepages.
+ */
+static int __init hugepage_alloc_threads_setup(char *s)
+{
+ unsigned long allocation_threads;
+
+ if (kstrtoul(s, 0, &allocation_threads) != 0)
+ return 1;
+
+ if (allocation_threads == 0)
+ return 1;
+
+ hugepage_allocation_threads = allocation_threads;
+
+ return 1;
+}
+__setup("hugepage_alloc_threads=", hugepage_alloc_threads_setup);
+
static unsigned int allowed_mems_nr(struct hstate *h)
{
int node;
--
2.48.1
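The way the parsed option and the default interact can be sketched in userspace C. This is a sketch under assumptions, not the kernel code: resolve_alloc_threads() is a hypothetical name, and strtoul() only approximates kstrtoul() (for example, strtoul() tolerates leading whitespace). As in the patch, a value of 0 or an unparsable string leaves the setting unset, so the default of 25% of the online CPUs applies.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for the option handling.
 * Returns the parsed thread count when arg is a valid nonzero
 * number; otherwise returns 25% of online_cpus, at least 1.
 */
static unsigned long resolve_alloc_threads(const char *arg,
					   unsigned int online_cpus)
{
	unsigned long parsed = 0;
	char *end;

	if (arg != NULL && *arg != '\0' && *arg != '-') {
		errno = 0;
		/* Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal. */
		parsed = strtoul(arg, &end, 0);
		if (errno != 0 || end == arg || *end != '\0')
			parsed = 0;	/* malformed input counts as "not set" */
	}

	if (parsed == 0) {
		/* Default policy: a quarter of the CPUs, minimum one thread. */
		parsed = online_cpus / 4;
		if (parsed == 0)
			parsed = 1;
	}

	return parsed;
}
```

For example, resolve_alloc_threads("8", 192) yields 8, while resolve_alloc_threads("abc", 192) falls back to 48.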
* [PATCH v3 3/3] mm: hugetlb: log time needed to allocate hugepages
2025-02-27 23:02 [PATCH v3 0/3] Add a command line option that enables control of how many threads should be used to allocate huge pages Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 1/3] mm: hugetlb: improve parallel huge page allocation time Thomas Prescher via B4 Relay
2025-02-27 23:02 ` [PATCH v3 2/3] mm: hugetlb: add hugetlb_alloc_threads cmdline option Thomas Prescher via B4 Relay
@ 2025-02-27 23:02 ` Thomas Prescher via B4 Relay
2 siblings, 0 replies; 4+ messages in thread
From: Thomas Prescher via B4 Relay @ 2025-02-27 23:02 UTC
To: Jonathan Corbet, Muchun Song, Andrew Morton
Cc: linux-doc, linux-kernel, linux-mm, Thomas Prescher
From: Thomas Prescher <thomas.prescher@cyberus-technology.de>
Having this information allows users to easily tune
the hugepage_alloc_threads parameter.
Signed-off-by: Thomas Prescher <thomas.prescher@cyberus-technology.de>
---
mm/hugetlb.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 98dbfa18bee01d01b40cc7c650cd3eca5eae2457..816e5846222a54255b99515a94e0c1ba9b2b7b27 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3430,6 +3430,9 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
.numa_aware = true
};
+ unsigned long jiffies_start;
+ unsigned long jiffies_end;
+
job.thread_fn = hugetlb_pages_alloc_boot_node;
job.start = 0;
job.size = h->max_huge_pages;
@@ -3457,7 +3460,14 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
job.max_threads = hugepage_allocation_threads;
job.min_chunk = h->max_huge_pages / hugepage_allocation_threads;
+
+ jiffies_start = jiffies;
padata_do_multithreaded(&job);
+ jiffies_end = jiffies;
+
+ pr_info("HugeTLB: allocation took %ums with hugepage_alloc_threads=%lu\n",
+ jiffies_to_msecs(jiffies_end - jiffies_start),
+ hugepage_allocation_threads);
return h->nr_huge_pages;
}
--
2.48.1
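The measurement pattern used here, subtracting two tick counter samples and converting the difference to milliseconds, can be sketched in userspace C. The sketch makes explicit assumptions: SKETCH_HZ, elapsed_ticks() and ticks_to_msecs() are hypothetical names, the tick rate of 250 is chosen only for illustration, and the conversion is a simplified version that is exact only when 1000 is divisible by the tick rate. It is not the kernel's jiffies_to_msecs().

```c
/* Assumed tick rate for illustration; the kernel's HZ is a build-time setting. */
#define SKETCH_HZ 250UL

/*
 * Elapsed ticks between two samples of an unsigned counter.
 * Unsigned subtraction wraps modulo 2^N, so the result stays
 * correct even if the counter overflowed once in between.
 */
static unsigned long elapsed_ticks(unsigned long start, unsigned long end)
{
	return end - start;
}

/*
 * Simplified tick-to-millisecond conversion.
 * Exact only when 1000 % SKETCH_HZ == 0.
 */
static unsigned int ticks_to_msecs(unsigned long ticks)
{
	return (unsigned int)(ticks * (1000UL / SKETCH_HZ));
}
```

At this assumed tick rate each tick is 4 ms, so the reported time has a 4 ms resolution. That is coarse for small allocations but adequate for the multi-second allocations shown in the table in patch 1.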