From: Donet Tom
Date: Wed, 4 Jun 2025 18:57:33 +0530
Subject: Re: [PATCH v7 1/5] drivers/base/node: Optimize memory block registration to reduce boot time
To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador, Zi Yan, Greg Kroah-Hartman
Cc: Ritesh Harjani, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Rafael J. Wysocki", Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov, Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <96f7d3a2-2d85-442c-a9f7-e558d4a2ba06@redhat.com>
References: <2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com> <96f7d3a2-2d85-442c-a9f7-e558d4a2ba06@redhat.com>

On 6/4/25 3:08 PM, David Hildenbrand wrote:
> On 28.05.25 19:18, Donet Tom wrote:
>> During node device
>> initialization, `memory blocks` are registered under each NUMA node.
>> The `memory blocks` to be registered are identified using the node’s
>> start and end PFNs, which are obtained from the node's pg_data.
>>
>> However, not all PFNs within this range necessarily belong to the same
>> node; some may belong to other nodes. Additionally, due to the
>> discontiguous nature of physical memory, certain sections within a
>> `memory block` may be absent.
>>
>> As a result, `memory blocks` that fall between a node’s start and end
>> PFNs may span multiple nodes, and some sections within those blocks
>> may be missing. `Memory blocks` have a fixed, architecture-dependent
>> size.
>>
>> Due to these considerations, memory block registration is currently
>> performed as follows:
>>
>> for_each_online_node(nid):
>>     start_pfn = pgdat->node_start_pfn;
>>     end_pfn = pgdat->node_start_pfn + node_spanned_pages;
>>     for_each_memory_block_between(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn)):
>>         mem_blk = memory_block_id(pfn_to_section_nr(pfn));
>>         pfn_mb_start = section_nr_to_pfn(mem_blk->start_section_nr);
>>         pfn_mb_end = pfn_mb_start + memory_block_pfns - 1;
>>         for (pfn = pfn_mb_start; pfn <= pfn_mb_end; pfn++):
>>             if (get_nid_for_pfn(pfn) != nid):
>>                 continue;
>>             do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
>>             break;
>>
>> Here, we derive the start and end PFNs from the node's pg_data, then
>> determine the memory blocks that may belong to the node. For each
>> `memory block` in this range, we check the NUMA node ID of its PFNs one
>> by one. As soon as a PFN within the block matches the current node, the
>> memory block is registered under that node and the scan of that block
>> stops.
>>
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn()
>> performs a binary search over the `memblock regions` to determine the
>> NUMA node ID for a given PFN.
>> If it is not enabled, the node ID is retrieved directly from the
>> struct page.
>>
>> On large systems, this process can become time-consuming, especially
>> since we iterate over each `memory block` and all PFNs within it until
>> a match is found. When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the
>> additional overhead of the binary search significantly increases the
>> execution time, potentially leading to soft lockups during boot.
>>
>> In this patch, we instead iterate over the `memblock regions` to
>> identify the `memory blocks` that belong to each online NUMA node.
>> `memblock regions` are contiguous memory ranges, each associated with a
>> single NUMA node; they do not span multiple nodes.
>>
>> for_each_memory_region(r): // r => region
>>     if (!node_online(r->nid)):
>>         continue;
>>     for_each_memory_block_between(r->base, r->base + r->size - 1):
>>         do_register_memory_block_under_node(r->nid, mem_blk, MEMINIT_EARLY);
>>
>> We iterate over all memblock regions, and if the node associated with
>> a region is online, we calculate the start and end memory blocks from
>> the region's start and end PFNs. We then register all the memory blocks
>> within that range under the region's node.
>>
>> Test Results on my system with 32TB RAM
>> =======================================
>> 1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.
>>
>> Without this patch
>> ------------------
>> Startup finished in 1min 16.528s (kernel)
>>
>> With this patch
>> ---------------
>> Startup finished in 17.236s (kernel) - 78% Improvement
>>
>> 2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled.
>>
>> Without this patch
>> ------------------
>> Startup finished in 28.320s (kernel)
>>
>> With this patch
>> ---------------
>> Startup finished in 15.621s (kernel) - 46% Improvement
>>
>> Acked-by: David Hildenbrand
>> Acked-by: Oscar Salvador
>> Acked-by: Mike Rapoport (Microsoft)
>> Acked-by: Zi Yan
>> Signed-off-by: Donet Tom
>>
>> ---
>
> [...]
>
>>   #ifdef CONFIG_NUMA
>>   void memory_block_add_nid(struct memory_block *mem, int nid,
>>                 enum meminit_context context);
>> @@ -188,5 +206,4 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
>>    * can sleep.
>>    */
>>   extern struct mutex text_mutex;
>> -
>
> ^ Nit: unrelated change?

Thank you, David. I’ll make the change and send the next revision.