Date: Fri, 11 Apr 2025 00:27:28 +0530
From: Donet Tom
To: Mike Rapoport
Cc: Greg Kroah-Hartman, linux-kernel@vger.kernel.org, David Hildenbrand, Andrew Morton, linux-mm@kvack.org, Ritesh Harjani, rafael@kernel.org, Danilo Krummrich
Subject: Re: [PATCH 2/2] base/node: Use curr_node_memblock_intersect_memory_block to Get Memory Block NID if CONFIG_DEFERRED_STRUCT_PAGE_INIT is Set
References: <50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom@linux.ibm.com>

On 4/10/25 1:37 PM, Mike Rapoport wrote:
> On Wed, Apr 09, 2025 at 10:57:57AM +0530, Donet Tom wrote:
>> In the current implementation, when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> set, we iterate over all PFNs in the memory block and use
>> early_pfn_to_nid to find the NID until a match is found.
>>
>> This patch we are using curr_node_memblock_intersect_memory_block() to
>> check if the current node's memblock intersects with the memory block
>> passed when CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. If an intersection
>> is found, the memory block is added to the current node.
>>
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, the existing mechanism
>> for finding the NID will continue to be used.
> I don't think we really need different mechanisms for different settings of
> CONFIG_DEFERRED_STRUCT_PAGE_INIT.
>
> node_dev_init() runs after all struct pages are already initialized and can
> always use pfn_to_nid().

In the current implementation, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is
enabled, we perform a binary search over the memblock regions to
determine the pfn's nid. Otherwise, we use pfn_to_nid() to obtain the
pfn's nid.

Your point is that we could unify this logic and always use
pfn_to_nid() to determine the pfn's nid, regardless of whether
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. Is that correct?

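To make the binary-search path concrete, here is a rough userspace
sketch of that kind of lookup. It is only an illustration: struct region
and region_search_pfn_nid() are made-up names, not kernel APIs, and it
assumes the regions are sorted by start PFN and do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region; not the kernel's struct. */
struct region {
	unsigned long start_pfn;	/* first PFN in the region */
	unsigned long end_pfn;		/* last PFN in the region, inclusive */
	int nid;			/* node that owns the region */
};

/*
 * Binary search over regions sorted by start_pfn (assumed non-overlapping).
 * Returns the owning nid, or -1 if the PFN falls into a hole.
 */
static int region_search_pfn_nid(const struct region *r, size_t n,
				 unsigned long pfn)
{
	size_t lo = 0, hi = n;

	assert(r != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pfn < r[mid].start_pfn)
			hi = mid;		/* look in the lower half */
		else if (pfn > r[mid].end_pfn)
			lo = mid + 1;		/* look in the upper half */
		else
			return r[mid].nid;	/* pfn lies inside this region */
	}
	return -1;
}
```

Each call costs O(log n) in the number of regions, which is cheap once
but adds up when it is repeated for every PFN of every memory block.
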
>
> kernel_init_freeable() ->
> 	page_alloc_init_late(); /* completes initialization of deferred pages */
> 	...
> 	do_basic_setup() ->
> 		driver_init() ->
> 			node_dev_init();
>
> The next step could be refactoring register_mem_block_under_node_early() to
> loop over memblock regions rather than over pfns.

So in the current implementation the call chain is:

node_dev_init()
    register_one_node()
        register_memory_blocks_under_node()
            walk_memory_blocks()
                register_mem_block_under_node_early()
                    get_nid_for_pfn()

We get each node's start and end PFN from its pg_data. Using these
values, we determine the start and end memory blocks for the current
node. To identify the node to which each memory block belongs, we
iterate over each PFN in the block.

The problem I am facing is this: on my system, node4 has memory blocks
memory30351 through memory38524, plus memory128433. The memory blocks
between memory38524 and memory128433 do not belong to this node.

In walk_memory_blocks() we iterate over all memory blocks, including
those from memory38524 to memory128433. In
register_mem_block_under_node_early(), up to memory38524, the first pfn
returns the matching nid, so the function returns immediately. But for
the blocks after memory38524 and before memory128433, the loop iterates
through every pfn and checks its nid. Since no nid matches the required
nid, the loop runs to the end of each block. This causes the soft
lockups.

This issue occurs only when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled,
as a binary search is used to determine each PFN's nid. When this option
is disabled, pfn_to_nid() is faster (the nid is read directly from the
struct page), and the issue is not seen.

To speed up the code when CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, I
added this function, which iterates over all memblock regions for each
memory block to determine its nid.

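As a rough illustration of that check, here is a userspace sketch of a
region-based intersection test. struct region and
node_regions_intersect_block() are hypothetical names for this example
only; this is not the patch's curr_node_memblock_intersect_memory_block()
and it does not use the real memblock API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region; not the kernel's struct. */
struct region {
	unsigned long start_pfn;	/* first PFN in the region */
	unsigned long end_pfn;		/* last PFN in the region, inclusive */
	int nid;			/* node that owns the region */
};

/*
 * Return true if any region owned by @nid overlaps the memory block
 * [blk_start, blk_end] (both inclusive). The cost is one pass over the
 * regions, independent of how many PFNs the block contains.
 */
static bool node_regions_intersect_block(const struct region *r, size_t n,
					  unsigned long blk_start,
					  unsigned long blk_end, int nid)
{
	assert(blk_start <= blk_end);

	for (size_t i = 0; i < n; i++) {
		if (r[i].nid != nid)
			continue;	/* region belongs to another node */
		if (r[i].start_pfn <= blk_end && r[i].end_pfn >= blk_start)
			return true;	/* the two ranges overlap */
	}
	return false;
}
```

For a block that lies entirely in another node's range, this returns
false after one scan of the regions, instead of doing one nid lookup per
PFN in the block.
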
"Loop over memblock regions instead of iterating over PFNs": my question
is, in register_one_node(), do you mean that we should iterate over all
memblock regions, identify the regions belonging to the current node,
and then retrieve the corresponding memory blocks to register them under
that node?

Thanks
Donet

>
>> Signed-off-by: Donet Tom
>> ---
>>  drivers/base/node.c | 37 +++++++++++++++++++++++++++++--------
>>  1 file changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index cd13ef287011..5c5dd02b8bdd 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -20,6 +20,8 @@
>>  #include
>>  #include
>>  #include
>> +#include
>> +
>>
>>  static const struct bus_type node_subsys = {
>>  	.name = "node",
>> @@ -782,16 +784,19 @@ static void do_register_memory_block_under_node(int nid,
>>  			ret);
>>  }
>>
>> -/* register memory section under specified node if it spans that node */
>> -static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> -					       void *arg)
>> +static int register_mem_block_early_if_dfer_page_init(struct memory_block *mem_blk,
>> +				unsigned long start_pfn, unsigned long end_pfn, int nid)
>>  {
>> -	unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
>> -	unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>> -	unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
>> -	int nid = *(int *)arg;
>> -	unsigned long pfn;
>>
>> +	if (curr_node_memblock_intersect_memory_block(start_pfn, end_pfn, nid))
>> +		do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
>> +	return 0;
>> +}
>> +
>> +static int register_mem_block_early__normal(struct memory_block *mem_blk,
>> +				unsigned long start_pfn, unsigned long end_pfn, int nid)
>> +{
>> +	unsigned long pfn;
>>  	for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
>>  		int page_nid;
>>
>> @@ -821,6 +826,22 @@ static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>>  	/* mem section does not span the specified node */
>>  	return 0;
>>  }
>> +/* register memory section under specified node if it spans that node */
>> +static int register_mem_block_under_node_early(struct memory_block *mem_blk,
>> +					       void *arg)
>> +{
>> +	unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
>> +	unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>> +	unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
>> +	int nid = *(int *)arg;
>> +
>> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> +	if (system_state < SYSTEM_RUNNING)
>> +		return register_mem_block_early_if_dfer_page_init(mem_blk, start_pfn, end_pfn, nid);
>> +#endif
>> +	return register_mem_block_early__normal(mem_blk, start_pfn, end_pfn, nid);
>> +
>> +}
>>
>>  /*
>>   * During hotplug we know that all pages in the memory block belong to the same
>> --
>> 2.48.1
>>