From: Gang Li
To: Andrew Morton
Cc: David Hildenbrand, David Rientjes, Muchun Song, Tim Chen, Steffen Klassert, Daniel Jordan, Jane Chu, "Paul E. McKenney", Randy Dunlap, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ligang.bdlg@bytedance.com, Gang Li
Subject: [PATCH v6 0/8] hugetlb: parallelize hugetlb page init on boot
Date: Thu, 22 Feb 2024 22:04:13 +0800
Message-Id: <20240222140422.393911-1-gang.li@linux.dev>

Hi all,

hugetlb init parallelization has now been updated to v6.

This version is tested on mm/mm-stable.

Since the release of v5, there have been some scattered discussions, centered primarily on two issues. Both issues have now been resolved, leading to the release of v6.

Two updates in v6
-----------------
- Fix a Kconfig warning
  hugetlb parallelization depends on PADATA, and PADATA depends on SMP.
  When SMP is not selected, selecting PADATA causes a warning:
  "WARNING: unmet direct dependencies detected for PADATA".
  So HUGETLBFS can only select PADATA when SMP is set.

  padata.c is not compiled if !SMP, but padata_do_multithreaded is still used in this series for hugetlb parallel init. So it is necessary to implement a serial version in padata.h.

- Fix a potential bug in gather_bootmem_prealloc_node
  The current padata_do_multithreaded implementation guarantees that each gather_bootmem_prealloc_node task handles one node. However, the API described in the padata_do_multithreaded comment indicates that padata_do_multithreaded can also assign multiple nodes to a gather_bootmem_prealloc_node task.

  To avoid a potential bug from future changes in padata_do_multithreaded, gather_bootmem_prealloc_parallel is introduced to wrap gather_bootmem_prealloc_node.

  More details in: https://lore.kernel.org/r/20240213111347.3189206-3-gang.li@linux.dev

Introduction
------------
Hugetlb initialization during boot takes up a considerable amount of time. For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2 seconds out of 10 seconds. Initializing 11,776 1GB pages on a 12TB Intel host takes more than 1 minute[1]. This is a noteworthy figure.

Inspired by [2] and [3], hugetlb initialization can also be accelerated through parallelization. The kernel already has infrastructure like padata_do_multithreaded; this patch uses it to achieve effective results with minimal modifications.
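The two v6 fixes described above (a serial padata_do_multithreaded for !SMP, and a wrapper that tolerates multi-node ranges) can be illustrated with a small user-space C model. This is a sketch under stated assumptions, not the kernel code: `struct mt_job`, `mt_job_run_serial`, `prealloc_one_node` and `prealloc_range_wrapper` are hypothetical names standing in for struct padata_mt_job, the serial padata_do_multithreaded, gather_bootmem_prealloc_node and gather_bootmem_prealloc_parallel respectively, and only the fields needed here are modeled.

```c
#include <assert.h>

/* Hypothetical user-space model of a multithreaded job. Field names loosely
 * follow struct padata_mt_job, but this is NOT the kernel definition. */
struct mt_job {
	void (*thread_fn)(unsigned long start, unsigned long end, void *arg);
	void *fn_arg;
	unsigned long start;	/* first work item (here: first node id) */
	unsigned long size;	/* number of work items (here: nodes) */
};

/* Serial fallback (hypothetical name): without SMP there are no helper
 * threads, so the whole range [start, start + size) goes to one call. */
static void mt_job_run_serial(struct mt_job *job)
{
	job->thread_fn(job->start, job->start + job->size, job->fn_arg);
}

/* Hypothetical stand-in for a per-node worker such as
 * gather_bootmem_prealloc_node: it handles exactly one node.
 * Here it only records the node in a bitmask. */
static void prealloc_one_node(unsigned long nid, void *arg)
{
	unsigned long *visited = arg;

	assert(nid < 8 * sizeof(*visited));
	*visited |= 1UL << nid;
}

/* Hypothetical wrapper in the spirit of gather_bootmem_prealloc_parallel:
 * it loops over every node in [start, end), so it stays correct whether
 * the dispatcher hands it one node or many. */
static void prealloc_range_wrapper(unsigned long start, unsigned long end,
				   void *arg)
{
	for (unsigned long nid = start; nid < end; nid++)
		prealloc_one_node(nid, arg);
}
```

With four nodes, the serial fallback makes a single call covering [0, 4), and the wrapper visits every node in that range; a worker that only looked at `start` would have skipped nodes 1 to 3.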
[1] https://lore.kernel.org/all/783f8bac-55b8-5b95-eb6a-11a583675000@google.com/
[2] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/
[3] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/
[4] https://lore.kernel.org/all/76becfc1-e609-e3e8-2966-4053143170b6@google.com/

max_threads
-----------
This patch uses `padata_do_multithreaded` like this:

```
job.max_threads = num_node_state(N_MEMORY) * multiplier;
padata_do_multithreaded(&job);
```

To fully utilize the CPU, the number of parallel threads needs to be carefully considered. `max_threads = num_node_state(N_MEMORY)` does not fully utilize the CPU, so we need to multiply it by a multiplier.

The tests below indicate that a multiplier of 2 significantly improves performance; larger values also provide improvements, but with diminishing returns.

multiplier         1       2       3       4       5
------------ ------- ------- ------- ------- -------
256G 2node     358ms   215ms   157ms   134ms   126ms
2T 4node       979ms   679ms   543ms   489ms   481ms
50G 2node       71ms    44ms    37ms    30ms    31ms

Therefore, choosing 2 as the multiplier strikes a good balance between enhancing parallel processing capabilities and maintaining efficient resource management.
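To make the max_threads choice concrete, the following user-space C sketch runs a job on `nr_nodes * multiplier` POSIX threads that claim fixed-size chunks from a shared, mutex-protected iterator (loosely in the spirit of the global iterator mentioned in the v4 change log). It is an illustrative model only: `struct job`, `worker`, `run_job` and the chunk size are hypothetical, and the real padata_do_multithreaded also honors fields such as min_chunk and align and, per patch 4 of this series, dispatches work on different nodes, so this does not reproduce the timings in the table.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space model of a multithreaded job (not kernel code). */
struct job {
	unsigned long start;	/* work range is [start, start + size) */
	unsigned long size;
	unsigned long chunk;	/* items claimed per grab (hypothetical) */
	unsigned long next;	/* shared iterator: next unclaimed item */
	unsigned long done;	/* total items processed, for checking */
	pthread_mutex_t lock;	/* protects next and done */
};

static void *worker(void *arg)
{
	struct job *job = arg;
	unsigned long end = job->start + job->size;
	unsigned long local = 0;

	for (;;) {
		/* Claim the next chunk from the shared iterator. */
		pthread_mutex_lock(&job->lock);
		unsigned long s = job->next;
		if (s >= end) {
			pthread_mutex_unlock(&job->lock);
			break;
		}
		unsigned long e = (end - s > job->chunk) ? s + job->chunk : end;
		job->next = e;
		pthread_mutex_unlock(&job->lock);

		/* Stand-in for the real per-chunk work (e.g. page init). */
		local += e - s;
	}

	pthread_mutex_lock(&job->lock);
	job->done += local;
	pthread_mutex_unlock(&job->lock);
	return NULL;
}

/* Run the job on nr_nodes * multiplier threads, mirroring
 * max_threads = num_node_state(N_MEMORY) * multiplier. */
static unsigned long run_job(unsigned long size, unsigned long chunk,
			     unsigned int nr_nodes, unsigned int multiplier)
{
	enum { MAX_WORKERS = 64 };
	unsigned int max_threads = nr_nodes * multiplier;
	pthread_t tids[MAX_WORKERS];
	struct job job = { .start = 0, .size = size, .chunk = chunk };

	assert(chunk > 0 && max_threads > 0 && max_threads <= MAX_WORKERS);
	pthread_mutex_init(&job.lock, NULL);
	for (unsigned int i = 0; i < max_threads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &job);
		assert(rc == 0);
		(void)rc;
	}
	for (unsigned int i = 0; i < max_threads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&job.lock);
	return job.done;
}
```

For example, `run_job(1000, 7, 2, 2)` uses four threads, and every item in [0, 1000) is claimed exactly once, so it returns 1000. Once threads outnumber the available CPUs or chunks, extra threads mostly add contention on the shared iterator rather than throughput.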
Test result
-----------
test case             no patch(ms)   patched(ms)    saved
------------------- -------------- ------------- --------
256c2T(4 node) 1G             4745          2024   57.34%
128c1T(2 node) 1G             3358          1712   49.02%
12T 1G                       77000         18300   76.23%

256c2T(4 node) 2M             3336          1051   68.52%
128c1T(2 node) 2M             1943           716   63.15%

Change log
----------
Changes in v6:
- Fix a Kconfig warning
- Fix a potential bug in gather_bootmem_prealloc_node

Changes in v5:
- https://lore.kernel.org/lkml/20240126152411.1238072-1-gang.li@linux.dev/
- Use prep_and_add_allocated_folios in 2M hugetlb parallelization
- Update huge_boot_pages in arch/powerpc/mm/hugetlbpage.c
- Revise struct padata_mt_job comment
- Add 'max_threads' section in cover letter
- Collect more Reviewed-by

Changes in v4:
- https://lore.kernel.org/r/20240118123911.88833-1-gang.li@linux.dev
- Make padata_do_multithreaded dispatch all jobs with a global iterator
- Revise commit message
- Rename some functions
- Collect Tested-by and Reviewed-by

Changes in v3:
- https://lore.kernel.org/all/20240102131249.76622-1-gang.li@linux.dev/
- Select CONFIG_PADATA as we use padata_do_multithreaded
- Fix a race condition in h->next_nid_to_alloc
- Fix local variable initialization issues
- Remove RFC tag

Changes in v2:
- https://lore.kernel.org/all/20231208025240.4744-1-gang.li@linux.dev/
- Reduce complexity with `padata_do_multithreaded`
- Support 1G hugetlb

v1:
- https://lore.kernel.org/all/20231123133036.68540-1-gang.li@linux.dev/
- parallelize 2M hugetlb initialization with workqueue

Gang Li (8):
  hugetlb: code clean for hugetlb_hstate_alloc_pages
  hugetlb: split hugetlb_hstate_alloc_pages
  hugetlb: pass *next_nid_to_alloc directly to for_each_node_mask_to_alloc
  padata: dispatch works on different nodes
  padata: downgrade padata_do_multithreaded to serial execution for non-SMP
  hugetlb: have CONFIG_HUGETLBFS select CONFIG_PADATA
  hugetlb: parallelize 2M hugetlb allocation and initialization
  hugetlb: parallelize 1G hugetlb initialization

 arch/powerpc/mm/hugetlbpage.c |   2 +-
 fs/Kconfig                    |   1 +
 include/linux/hugetlb.h       |   2 +-
 include/linux/padata.h        |  14 +-
 kernel/padata.c               |  14 +-
 mm/hugetlb.c                  | 241 +++++++++++++++++++++++-----------
 mm/mm_init.c                  |   1 +
 7 files changed, 190 insertions(+), 85 deletions(-)

-- 
2.20.1