From mboxrd@z Thu Jan  1 00:00:00 1970
From: Frank van der Linden <fvdl@google.com>
Date: Mon, 24 Feb 2025 09:37:52 -0800
Subject: Re: [PATCH 1/2] mm: hugetlb: add hugetlb_alloc_threads cmdline option
To: thomas.prescher@cyberus-technology.de
Cc: Jonathan Corbet, Muchun Song, Andrew Morton, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20250221-hugepage-parameter-v1-0-fa49a77c87c8@cyberus-technology.de> <20250221-hugepage-parameter-v1-1-fa49a77c87c8@cyberus-technology.de>
In-Reply-To: <20250221-hugepage-parameter-v1-1-fa49a77c87c8@cyberus-technology.de>
Content-Type: text/plain; charset="UTF-8"

On Fri, Feb 21, 2025 at 5:49 AM Thomas Prescher via B4 Relay wrote:
>
> From: Thomas Prescher
>
> Add a command line option that enables control of how many
> threads per NUMA node should be used to allocate huge pages.
>
> Allocating huge pages can take a very long time on servers
> with terabytes of memory, even when they are allocated at
> boot time where the allocation happens in parallel.
>
> The kernel currently uses a hard coded value of 2 threads per
> NUMA node for these allocations.
>
> This patch allows overriding this value.
>
> Signed-off-by: Thomas Prescher
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 7 ++++
>  Documentation/admin-guide/mm/hugetlbpage.rst    | 9 ++++-
>  mm/hugetlb.c                                    | 50 +++++++++++++++++--------
>  3 files changed, 49 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index fb8752b42ec8582b8750d7e014c4d76166fa2fc1..812064542fdb0a5c0ff7587aaaba8da81dc234a9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1882,6 +1882,13 @@
>  			Documentation/admin-guide/mm/hugetlbpage.rst.
>  			Format: size[KMG]
>
> +	hugepage_alloc_threads=
> +			[HW] The number of threads per NUMA node that should
> +			be used to allocate hugepages during boot.
> +			This option can be used to improve system bootup time
> +			when allocating a large amount of huge pages.
> +			The default value is 2 threads per NUMA node.
> +
>  	hugetlb_cma=	[HW,CMA,EARLY] The size of a CMA area used for allocation
>  			of gigantic hugepages. Or using node format, the size
>  			of a CMA area per node can be specified.
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index f34a0d798d5b533f30add99a34f66ba4e1c496a3..c88461be0f66887d532ac4ef20e3a61dfd396be7 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -145,7 +145,14 @@ hugepages
>
>  	It will allocate 1 2M hugepage on node0 and 2 2M hugepages on node1.
>  	If the node number is invalid, the parameter will be ignored.
> -
> +hugepage_alloc_threads
> +	Specify the number of threads per NUMA node that should be used to
> +	allocate hugepages during boot. This parameter can be used to improve
> +	system bootup time when allocating a large amount of huge pages.
> +	The default value is 2 threads per NUMA node. Example to use 8 threads
> +	per NUMA node::
> +
> +		hugepage_alloc_threads=8
>  default_hugepagesz
>  	Specify the default huge page size. This parameter can
>  	only be specified once on the command line. default_hugepagesz can
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 163190e89ea16450026496c020b544877db147d1..b7d24c41e0f9d22f5b86c253e29a2eca28460026 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -68,6 +68,7 @@ static unsigned long __initdata default_hstate_max_huge_pages;
>  static bool __initdata parsed_valid_hugepagesz = true;
>  static bool __initdata parsed_default_hugepagesz;
>  static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
> +static unsigned long allocation_threads_per_node __initdata = 2;
>
>  /*
>   * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
> @@ -3432,26 +3433,23 @@ static unsigned long __init hugetlb_pages_alloc_boot(struct hstate *h)
>  	job.size	= h->max_huge_pages;
>
>  	/*
> -	 * job.max_threads is twice the num_node_state(N_MEMORY),
> +	 * job.max_threads is twice the num_node_state(N_MEMORY) by default.
>  	 *
> -	 * Tests below indicate that a multiplier of 2 significantly improves
> -	 * performance, and although larger values also provide improvements,
> -	 * the gains are marginal.
> +	 * On large servers with terabytes of memory, huge page allocation
> +	 * can consume a considerable amount of time.
>  	 *
> -	 * Therefore, choosing 2 as the multiplier strikes a good balance between
> -	 * enhancing parallel processing capabilities and maintaining efficient
> -	 * resource management.
> +	 * Tests below show how long it takes to allocate 1 TiB of memory with
> +	 * 2MiB huge pages. Using more threads can significantly improve allocation time.
>  	 *
> -	 * +------------+-------+-------+-------+-------+-------+
> -	 * | multiplier | 1     | 2     | 3     | 4     | 5     |
> -	 * +------------+-------+-------+-------+-------+-------+
> -	 * | 256G 2node | 358ms | 215ms | 157ms | 134ms | 126ms |
> -	 * | 2T   4node | 979ms | 679ms | 543ms | 489ms | 481ms |
> -	 * | 50G  2node | 71ms  | 44ms  | 37ms  | 30ms  | 31ms  |
> -	 * +------------+-------+-------+-------+-------+-------+
> +	 * +--------------------+-------+-------+-------+-------+-------+
> +	 * | threads per node   | 2     | 4     | 8     | 16    | 32    |
> +	 * +--------------------+-------+-------+-------+-------+-------+
> +	 * | skylake      4node | 44s   | 22s   | 16s   | 19s   | 20s   |
> +	 * | cascade lake 4node | 39s   | 20s   | 11s   | 10s   | 9s    |
> +	 * +--------------------+-------+-------+-------+-------+-------+
>  	 */
> -	job.max_threads	= num_node_state(N_MEMORY) * 2;
> -	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / 2;
> +	job.max_threads	= num_node_state(N_MEMORY) * allocation_threads_per_node;
> +	job.min_chunk	= h->max_huge_pages / num_node_state(N_MEMORY) / allocation_threads_per_node;
>  	padata_do_multithreaded(&job);
>
>  	return h->nr_huge_pages;
> @@ -4764,6 +4762,26 @@ static int __init default_hugepagesz_setup(char *s)
>  }
>  __setup("default_hugepagesz=", default_hugepagesz_setup);
>
> +/* hugepage_alloc_threads command line parsing
> + * When set, use this specific number of threads per NUMA node for the boot
> + * allocation of hugepages.
> + */
> +static int __init hugepage_alloc_threads_setup(char *s)
> +{
> +	unsigned long threads_per_node;
> +
> +	if (kstrtoul(s, 0, &threads_per_node) != 0)
> +		return 1;
> +
> +	if (threads_per_node == 0)
> +		return 1;
> +
> +	allocation_threads_per_node = threads_per_node;
> +
> +	return 1;
> +}
> +__setup("hugepage_alloc_threads=", hugepage_alloc_threads_setup);
> +
>  static unsigned int allowed_mems_nr(struct hstate *h)
>  {
>  	int node;
>
> --
> 2.48.1
>

Maybe mention that this does not apply to 'gigantic' hugepages (e.g. hugetlb pages of an order > MAX_PAGE_ORDER). Those are allocated earlier in boot by memblock, in a single-threaded environment.

Not your fault that this distinction between these types of hugetlb pages isn't clear in the Docs, of course. Only hugetlb_cma mentions that it is for gigantic pages. But it's probably best to mention that the threads parameter is for non-gigantic hugetlb pages only.

- Frank