From mboxrd@z Thu Jan  1 00:00:00 1970
From: Frank van der Linden <fvdl@google.com>
Date: Mon, 24 Feb 2025 09:37:52 -0800
Subject: Re: [PATCH 1/2] mm: hugetlb: add hugetlb_alloc_threads cmdline option
To: thomas.prescher@cyberus-technology.de
Cc: Jonathan Corbet, Muchun Song, Andrew Morton, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20250221-hugepage-parameter-v1-0-fa49a77c87c8@cyberus-technology.de> <20250221-hugepage-parameter-v1-1-fa49a77c87c8@cyberus-technology.de>
In-Reply-To: <20250221-hugepage-parameter-v1-1-fa49a77c87c8@cyberus-technology.de>
Content-Type: text/plain; charset="UTF-8"

On Fri, Feb 21, 2025 at 5:49 AM Thomas Prescher via B4 Relay wrote:
>
> From: Thomas Prescher
>
> Add a command line option that enables control of how many
> threads per NUMA node should be used to allocate huge pages.
>
> Allocating huge pages can take a very long time on servers
> with terabytes of memory, even when they are allocated at
> boot time where the allocation happens in parallel.
>
> The kernel currently uses a hard coded value of 2 threads per
> NUMA node for these allocations.
>
> This patch allows overriding this value.
>
> Signed-off-by: Thomas Prescher
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 7 ++++
>  Documentation/admin-guide/mm/hugetlbpage.rst    | 9 ++++-
>  mm/hugetlb.c                                    | 50 +++++++++++++++++--------
>  3 files changed, 49 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index fb8752b42ec8582b8750d7e014c4d76166fa2fc1..812064542fdb0a5c0ff7587aaaba8da81dc234a9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1882,6 +1882,13 @@
>  			Documentation/admin-guide/mm/hugetlbpage.rst.
>  			Format: size[KMG]
>
> +	hugepage_alloc_threads=
> +			[HW] The number of threads per NUMA node that should
> +			be used to allocate hugepages during boot.
> +			This option can be used to improve system bootup time
> +			when allocating a large amount of huge pages.
> +			The default value is 2 threads per NUMA node.
> +
>  	hugetlb_cma=	[HW,CMA,EARLY] The size of a CMA area used for allocation
>  			of gigantic hugepages. Or using node format, the size
>  			of a CMA area per node can be specified.
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index f34a0d798d5b533f30add99a34f66ba4e1c496a3..c88461be0f66887d532ac4ef20e3a61dfd396be7 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -145,7 +145,14 @@ hugepages
>
>  	It will allocate 1 2M hugepage on node0 and 2 2M hugepages on node1.
>  	If the node number is invalid, the parameter will be ignored.
> -
> +hugepage_alloc_threads
> +	Specify the number of threads per NUMA node that should be used to
> +	allocate hugepages during boot. This parameter can be used to improve
> +	system bootup time when allocating a large amount of huge pages.
> +	The default value is 2 threads per NUMA node. Example to use 8 threads
> +	per NUMA node::
> +
> +		hugepage_alloc_threads=8
>  default_hugepagesz
>  	Specify the default huge page size. This parameter can
>  	only be specified once on the command line. default_hugepagesz can
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 163190e89ea16450026496c020b544877db147d1..b7d24c41e0f9d22f5b86c253e29a2eca28460026 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -68,6 +68,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static bool __initdata parsed_valid_hugepagesz = true;
>  static bool __initdata parsed_default_hugepagesz;
>  static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
> +static unsigned long allocation_threads_per_node __initdata = 2;
>
>  /*
>   * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
> @@ -3432,26 +3433,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  	job.size	= h->max_huge_pages;
>
>  	/*
> -	 * job.max_threads is twice the num_node_state(N_MEMORY),
> +	 * job.max_threads is twice the num_node_state(N_MEMORY) by default.
>  	 *
> -	 * Tests below indicate that a multiplier of 2 significantly improves
> -	 * performance, and although larger values also provide improvements,
> -	 * the gains are marginal.
> +	 * On large servers with terabytes of memory, huge page allocation
> +	 * can consume a considerable amount of time.
>  	 *
> -	 * Therefore, choosing 2 as the multiplier strikes a good balance between
> -	 * enhancing parallel processing capabilities and maintaining efficient
> -	 * resource management.
> +	 * Tests below show how long it takes to allocate 1 TiB of memory with
> +	 * 2MiB huge pages. Using more threads can significantly improve allocation time.
>  	 *
> -	 * +------------+-------+-------+-------+-------+-------+
> -	 * | multiplier | 1     | 2     | 3     | 4     | 5     |
> -	 * +------------+-------+-------+-------+-------+-------+
> -	 * | 256G 2node | 358ms | 215ms | 157ms | 134ms | 126ms |
> -	 * | 2T   4node | 979ms | 679ms | 543ms | 489ms | 481ms |
> -	 * | 50G  2node | 71ms  | 44ms  | 37ms  | 30ms  | 31ms  |
> -	 * +------------+-------+-------+-------+-------+-------+
> +	 * +--------------------+-------+-------+-------+-------+-------+
> +	 * | threads per node   | 2     | 4     | 8     | 16    | 32    |
> +	 * +--------------------+-------+-------+-------+-------+-------+
> +	 * | skylake      4node | 44s   | 22s   | 16s   | 19s   | 20s   |
> +	 * | cascade lake 4node | 39s   | 20s   | 11s   | 10s   | 9s    |
> +	 * +--------------------+-------+-------+-------+-------+-------+
>  	 */
> -	job.max_threads	= num_node_state(N_MEMORY) * 2;
> -	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
> +	job.max_threads	= num_node_state(N_MEMORY) * allocation_threads_per_node;
> +	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / allocation_threads_per_node;
>  	padata_do_multithreaded(&job);
>
>  	return h->nr_huge_pages;
> @@ -4764,6 +4762,26 @@ static int __init default_hugepagesz_setup(char *s)
>  }
>  __setup("default_hugepagesz=", default_hugepagesz_setup);
>
> +/* hugepage_alloc_threads command line parsing
> + * When set, use this specific number of threads per NUMA node for the boot
> + * allocation of hugepages.
> + */
> +static int __init hugepage_alloc_threads_setup(char *s)
> +{
> +	unsigned long threads_per_node;
> +
> +	if (kstrtoul(s, 0, &threads_per_node) != 0)
> +		return 1;
> +
> +	if (threads_per_node == 0)
> +		return 1;
> +
> +	allocation_threads_per_node = threads_per_node;
> +
> +	return 1;
> +}
> +__setup("hugepage_alloc_threads=", hugepage_alloc_threads_setup);
> +
>  static unsigned int allowed_mems_nr(struct hstate *h)
>  {
>  	int node;
>
> --
> 2.48.1
>

Maybe mention that this does not apply to 'gigantic' hugepages (e.g. hugetlb pages of an order > MAX_PAGE_ORDER). Those are allocated earlier in boot by memblock, in a single-threaded environment.

Not your fault that this distinction between these types of hugetlb pages isn't clear in the Docs, of course. Only hugetlb_cma mentions that it is for gigantic pages. But it's probably best to mention that the threads parameter is for non-gigantic hugetlb pages only.

- Frank