From: yaozhenguo
To: mike.kravetz@oracle.com
Cc: corbet@lwn.net, akpm@linux-foundation.org, yaozhenguo@jd.com,
	willy@infradead.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org, yaozhenguo
Subject: [PATCH v4] hugetlbfs: Extend the definition of hugepages parameter to support node allocation
Date: Thu, 9 Sep 2021 22:16:55 +0800
Message-Id: <20210909141655.87821-1-yaozhenguo1@gmail.com>

We can specify the number of hugepages to allocate at boot, but at
present the hugepages are balanced across all nodes. In some scenarios,
we only need hugepages on one node. For example, DPDK needs hugepages
that are on the same node as the NIC. If DPDK needs four 1G hugepages
on node1 and the system has 16 NUMA nodes, we must reserve 64 hugepages
on the kernel cmdline, but only four of them are used. The others
should be freed after boot. If the system memory is low (for example,
64G), this is an impossible task. So, extend the hugepages parameter to
support specifying hugepages on a specific node.

For example, add the following parameter:

hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage on node0 and 3 hugepages on node1.

Signed-off-by: yaozhenguo
---
v3 -> v4 changes:
	- fix wrong behavior for parameter:
	  hugepages=0:1,1:3 default_hugepagesz=1G
	- make the change of documentation more reasonable
---
 .../admin-guide/kernel-parameters.txt         |   8 +-
 Documentation/admin-guide/mm/hugetlbpage.rst  |  12 +-
 include/linux/hugetlb.h                       |   1 +
 mm/hugetlb.c                                  | 122 +++++++++++++++++-
 4 files changed, 132 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb22006f..a2046b2c5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1588,9 +1588,11 @@
 			the number of pages of hugepagesz to be allocated.
 			If this is the first HugeTLB parameter on the command
 			line, it specifies the number of pages to allocate for
-			the default huge page size. See also
-			Documentation/admin-guide/mm/hugetlbpage.rst.
-			Format: <integer>
+			the default huge page size. If using node format, the
+			number of pages to allocate per-node can be specified.
+			See also Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: <integer> or (node format)
+				<node>:<integer>[,<node>:<integer>]
 
 	hugepagesz=
 			[HW] The size of the HugeTLB pages. This is used in
diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 8abaeb144..d70828c07 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -128,7 +128,9 @@ hugepages
 	implicitly specifies the number of huge pages of default size to
 	allocate. If the number of huge pages of default size is implicitly
 	specified, it can not be overwritten by a hugepagesz,hugepages
-	parameter pair for the default size.
+	parameter pair for the default size. This parameter also has a
+	node format. The node format specifies the number of huge pages
+	to allocate on specific nodes.
 
 	For example, on an architecture with 2M default huge page size::
 
@@ -138,6 +140,14 @@ hugepages
 	indicating that the hugepages=512 parameter is ignored. If a hugepages
 	parameter is preceded by an invalid hugepagesz parameter, it will
 	be ignored.
+
+	Node format example::
+
+		hugepagesz=2M hugepages=0:1,1:2
+
+	It will allocate 1 2M hugepage on node0 and 2 2M hugepages on node1.
+	If the node number is invalid, the parameter will be ignored.
+
 default_hugepagesz
 	Specify the default huge page size. This parameter can only be
 	specified once on the command line. default_hugepagesz can
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f7ca1a387..5939ecd4f 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -605,6 +605,7 @@ struct hstate {
 	unsigned long nr_overcommit_huge_pages;
 	struct list_head hugepage_activelist;
 	struct list_head hugepage_freelists[MAX_NUMNODES];
+	unsigned int max_huge_pages_node[MAX_NUMNODES];
 	unsigned int nr_huge_pages_node[MAX_NUMNODES];
 	unsigned int free_huge_pages_node[MAX_NUMNODES];
 	unsigned int surplus_huge_pages_node[MAX_NUMNODES];
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfc940d52..c92ab09cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -66,6 +66,7 @@ static struct hstate * __initdata parsed_hstate;
 static unsigned long __initdata default_hstate_max_huge_pages;
 static bool __initdata parsed_valid_hugepagesz = true;
 static bool __initdata parsed_default_hugepagesz;
+static unsigned int default_hugepages_in_node[MAX_NUMNODES] __initdata;
 
 /*
  * Protects updates to hugepage_freelists, hugepage_activelist, nr_huge_pages,
@@ -2842,10 +2843,75 @@ static void __init gather_bootmem_prealloc(void)
 	}
 }
 
+static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
+{
+	unsigned long i;
+	char buf[32];
+
+	for (i = 0; i < h->max_huge_pages_node[nid]; ++i) {
+		if (hstate_is_gigantic(h)) {
+			struct huge_bootmem_page *m;
+			void *addr;
+
+			addr = memblock_alloc_try_nid_raw(
+					huge_page_size(h), huge_page_size(h),
+					0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+			if (!addr)
+				break;
+			m = addr;
+			BUG_ON(!IS_ALIGNED(virt_to_phys(m), huge_page_size(h)));
+			/*
+			 * Put them into a private list first because mem_map
+			 * is not up yet
+			 */
+			INIT_LIST_HEAD(&m->list);
+			list_add(&m->list, &huge_boot_pages);
+			m->hstate = h;
+		} else {
+			struct page *page;
+
+			gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+
+			page = alloc_fresh_huge_page(h, gfp_mask, nid,
+					&node_states[N_MEMORY], NULL);
+			if (!page)
+				break;
+			put_page(page); /* free it into the hugepage allocator */
+		}
+		cond_resched();
+	}
+	if (i == h->max_huge_pages_node[nid])
+		return;
+
+	string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+	pr_warn("HugeTLB: allocating %u of page size %s failed node%d. Only allocated %lu hugepages.\n",
+		h->max_huge_pages_node[nid], buf, nid, i);
+	h->max_huge_pages -= (h->max_huge_pages_node[nid] - i);
+	h->max_huge_pages_node[nid] = i;
+}
+
 static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 {
 	unsigned long i;
 	nodemask_t *node_alloc_noretry;
+	bool hugetlb_node_set = false;
+
+	/* skip gigantic hugepages allocation if hugetlb_cma enabled */
+	if (hstate_is_gigantic(h) && hugetlb_cma_size) {
+		pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
+		return;
+	}
+
+	/* do node alloc */
+	for (i = 0; i < nodes_weight(node_states[N_MEMORY]); i++) {
+		if (h->max_huge_pages_node[i] > 0) {
+			hugetlb_hstate_alloc_pages_onenode(h, i);
+			hugetlb_node_set = true;
+		}
+	}
+
+	if (hugetlb_node_set)
+		return;
 
 	if (!hstate_is_gigantic(h)) {
 		/*
@@ -2867,10 +2933,6 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 
 	for (i = 0; i < h->max_huge_pages; ++i) {
 		if (hstate_is_gigantic(h)) {
-			if (hugetlb_cma_size) {
-				pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
-				goto free;
-			}
 			if (!alloc_bootmem_huge_page(h))
 				break;
 		} else if (!alloc_pool_huge_page(h,
@@ -2887,7 +2949,6 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 			h->max_huge_pages, buf, i);
 		h->max_huge_pages = i;
 	}
-free:
 	kfree(node_alloc_noretry);
 }
 
@@ -3578,6 +3639,11 @@ static int __init hugetlb_init(void)
 			}
 			default_hstate.max_huge_pages =
 				default_hstate_max_huge_pages;
+
+			for (i = 0; i < nodes_weight(node_states[N_MEMORY]); i++)
+				if (default_hugepages_in_node[i] > 0)
+					default_hstate.max_huge_pages_node[i] =
+						default_hugepages_in_node[i];
 		}
 	}
 
@@ -3649,6 +3715,10 @@ static int __init hugepages_setup(char *s)
 {
 	unsigned long *mhp;
 	static unsigned long *last_mhp;
+	unsigned int node = NUMA_NO_NODE;
+	int count;
+	unsigned long tmp;
+	char *p = s;
 
 	if (!parsed_valid_hugepagesz) {
 		pr_warn("HugeTLB: hugepages=%s does not follow a valid hugepagesz, ignoring\n", s);
@@ -3672,8 +3742,37 @@ static int __init hugepages_setup(char *s)
 		return 0;
 	}
 
-	if (sscanf(s, "%lu", mhp) <= 0)
-		*mhp = 0;
+	while (*p) {
+		count = 0;
+		if (sscanf(p, "%lu%n", &tmp, &count) != 1)
+			goto invalid;
+		/* Parameter is node format */
+		if (p[count] == ':') {
+			node = tmp;
+			p += count + 1;
+			if (node < 0 ||
+				node >= nodes_weight(node_states[N_MEMORY]))
+				goto invalid;
+			/* Parse hugepages */
+			if (sscanf(p, "%lu%n", &tmp, &count) != 1)
+				goto invalid;
+			if (!hugetlb_max_hstate)
+				default_hugepages_in_node[node] = tmp;
+			else
+				parsed_hstate->max_huge_pages_node[node] = tmp;
+			*mhp += tmp;
+			/* Go to parse next node*/
+			if (p[count] == ',')
+				p += count + 1;
+			else
+				break;
+		} else {
+			if (p != s)
+				goto invalid;
+			*mhp = tmp;
+			break;
+		}
+	}
 
 	/*
 	 * Global state is always initialized later in hugetlb_init.
@@ -3686,6 +3785,10 @@ static int __init hugepages_setup(char *s)
 	last_mhp = mhp;
 
 	return 1;
+
+invalid:
+	pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
+	return 0;
 }
 __setup("hugepages=", hugepages_setup);
 
@@ -3747,6 +3850,7 @@ __setup("hugepagesz=", hugepagesz_setup);
 static int __init default_hugepagesz_setup(char *s)
 {
 	unsigned long size;
+	int i;
 
 	parsed_valid_hugepagesz = false;
 	if (parsed_default_hugepagesz) {
@@ -3775,6 +3879,10 @@ static int __init default_hugepagesz_setup(char *s)
 	 */
 	if (default_hstate_max_huge_pages) {
 		default_hstate.max_huge_pages = default_hstate_max_huge_pages;
+		for (i = 0; i < nodes_weight(node_states[N_MEMORY]); i++)
+			if (default_hugepages_in_node[i] > 0)
+				default_hstate.max_huge_pages_node[i] =
+					default_hugepages_in_node[i];
 		if (hstate_is_gigantic(&default_hstate))
 			hugetlb_hstate_alloc_pages(&default_hstate);
 		default_hstate_max_huge_pages = 0;
-- 
2.27.0
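
For readers who want to try the node format outside the kernel, the
parsing loop added to hugepages_setup() can be approximated in userspace.
The sketch below is only an illustration under stated assumptions:
parse_hugepages(), its signature and the nr_nodes argument are
hypothetical names, not kernel API, and nr_nodes stands in for
nodes_weight(node_states[N_MEMORY]). Like the patch, it uses
sscanf() with "%lu%n" to read a number and learn how many characters
it consumed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace sketch of the node-format parsing loop (hypothetical helper,
 * not part of the patch).
 *
 * Accepts "<count>" or "<node>:<count>[,<node>:<count>...]".
 * Returns 0 on success, -1 on a malformed string or out-of-range node.
 * per_node[] receives the per-node counts (node format only) and *total
 * the overall count. On failure per_node[] may be partially filled.
 */
int parse_hugepages(const char *s, unsigned long *per_node,
		    unsigned int nr_nodes, unsigned long *total)
{
	const char *p = s;
	unsigned long tmp, node;
	int count;

	*total = 0;
	memset(per_node, 0, nr_nodes * sizeof(*per_node));

	while (*p) {
		count = 0;
		/* %n stores the number of characters consumed so far */
		if (sscanf(p, "%lu%n", &tmp, &count) != 1)
			return -1;

		if (p[count] != ':') {
			/* plain format is only valid at the very start */
			if (p != s)
				return -1;
			*total = tmp;
			return 0;
		}

		/* node format: tmp is the node id, the count follows ':' */
		node = tmp;
		p += count + 1;
		if (node >= nr_nodes)
			return -1;
		if (sscanf(p, "%lu%n", &tmp, &count) != 1)
			return -1;
		per_node[node] = tmp;
		*total += tmp;

		/* a ',' introduces the next <node>:<count> pair */
		if (p[count] != ',')
			break;
		p += count + 1;
	}
	return 0;
}
```

With nr_nodes = 4, parsing "0:1,1:3" stores 1 in per_node[0] and 3 in
per_node[1] and sets *total to 4, while "0:1,2" and "4:1" are rejected.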