Date: Thu, 10 Apr 2025 07:25:33 +0200
From: Oscar Salvador
To: Aditya Gupta
Cc: linux-mm@kvack.org, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	"Rafael J. Wysocki", Sourabh Jain, linux-kernel@vger.kernel.org
Subject: Re: [REPORT] Softlockups on PowerNV with upstream
References: <20250409180344.477916-1-adityag@linux.ibm.com>
In-Reply-To: <20250409180344.477916-1-adityag@linux.ibm.com>

On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> Hi,
>
> While booting current upstream kernel, I consistently get "softlockups", on IBM PowerNV system.
>
> I have tested it only on PowerNV systems. But some architectures/platforms also
> might have it. PSeries systems don't have this issue though.
>
> Bisect points to the following commit:
>
>     commit 61659efdb35ce6c6ac7639342098f3c4548b794b
>     Author: Gavin Shan
>     Date:   Wed Mar 12 09:30:43 2025 +1000
>
>         drivers/base/memory: improve add_boot_memory_block()
>
...
> Console log
> -----------
>
>     [    2.783371] smp: Brought up 4 nodes, 256 CPUs
>     [    2.783475] numa: Node 0 CPUs: 0-63
>     [    2.783537] numa: Node 2 CPUs: 64-127
>     [    2.783591] numa: Node 4 CPUs: 128-191
>     [    2.783653] numa: Node 6 CPUs: 192-255
>     [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)

If I am not mistaken, this is ~700GB, PowerNV uses 16MB as section size,
and sections_per_block == 1 (I think).

The code before the mentioned commit was something like:

 for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
       if (present_section_nr(nr))
          section_count++;

 if (section_count == 0)
    return 0;
 return add_memory_block()

So, in the case of PowerNV, we will just check one section at a time and
either return or call add_memory_block, depending on whether it is present.

Now, with the current code, things are different.
We now have:

memory_dev_init:
 for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
     ret = add_boot_memory_block

add_boot_memory_block:
 for_each_present_section_nr(base_section_nr, nr) {
     if (nr >= (base_section_nr + sections_per_block))
            break;

     return add_memory_block();
 }
 return 0;

The thing is that next_present_section_nr() (which is called in
for_each_present_section_nr()) will loop until we find a present section.
And then we will check whether the found section is beyond
base_section_nr + sections_per_block (where sections_per_block = 1).
If so, we skip add_memory_block.

Now, I think that the issue comes from for_each_present_section_nr having
to loop a lot until we find a present section.
And then the loop in memory_dev_init increments only by 1, which means that
in the next iteration we might have to loop a lot again to find another
present section. And so on and so forth.

Maybe we can fix this by making memory_dev_init() remember at which section
add_boot_memory_block returned.
Something like the following (only compile-tested)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8f3a41d9bfaa..d97635cbfd1d 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	return 0;
 }
 
-static int __init add_boot_memory_block(unsigned long base_section_nr)
+static int __init add_boot_memory_block(unsigned long *base_section_nr)
 {
+	int ret;
 	unsigned long nr;
 
-	for_each_present_section_nr(base_section_nr, nr) {
-		if (nr >= (base_section_nr + sections_per_block))
+	for_each_present_section_nr(*base_section_nr, nr) {
+		if (nr >= (*base_section_nr + sections_per_block))
 			break;
 
-		return add_memory_block(memory_block_id(base_section_nr),
-					MEM_ONLINE, NULL, NULL);
+		ret = add_memory_block(memory_block_id(*base_section_nr),
+				       MEM_ONLINE, NULL, NULL);
+		*base_section_nr = nr;
+		return ret;
 	}
 
+	if (nr == -1)
+		*base_section_nr = __highest_present_section_nr + 1;
+	else
+		*base_section_nr = nr;
+
 	return 0;
 }
 
@@ -973,9 +980,9 @@ void __init memory_dev_init(void)
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
 	 */
-	for (nr = 0; nr <= __highest_present_section_nr;
-	     nr += sections_per_block) {
-		ret = add_boot_memory_block(nr);
+	nr = first_present_section_nr();
+	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
+		ret = add_boot_memory_block(&nr);
 		if (ret)
 			panic("%s() failed to add memory block: %d\n", __func__,
 			      ret);

@Aditya: can you please give it a try?

-- 
Oscar Salvador
SUSE Labs
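
For illustration only, the rescanning cost described above can be modelled
in userspace. This is a rough sketch, not kernel code: struct section_map,
next_present(), probes_fixed_step() and probes_skip_ahead() are hypothetical
names made up for this example, and it assumes sections_per_block == 1. It
only counts how many sections get probed when the outer loop advances by one
versus when it resumes from the section the scan found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct section_map {
	const bool *present;	/* present[i] is true if section i is populated */
	size_t count;		/* number of entries in present[] */
	size_t highest;		/* index of the highest present section */
};

/*
 * Rough stand-in for the forward scan done by for_each_present_section_nr():
 * walk from `from` until a present section is found, counting every probe.
 * Returns map->count if no present section is left.
 */
static size_t next_present(const struct section_map *map, size_t from,
			   unsigned long *probes)
{
	for (size_t nr = from; nr < map->count; nr++) {
		(*probes)++;
		if (map->present[nr])
			return nr;
	}
	return map->count;
}

/* Advance by one section per iteration: every hole is rescanned each time. */
static unsigned long probes_fixed_step(const struct section_map *map)
{
	unsigned long probes = 0;

	assert(map->highest < map->count);
	for (size_t base = 0; base <= map->highest; base++)
		(void)next_present(map, base, &probes);
	return probes;
}

/* Resume right after the section the scan found: each hole is crossed once. */
static unsigned long probes_skip_ahead(const struct section_map *map)
{
	unsigned long probes = 0;
	size_t base = 0;

	assert(map->highest < map->count);
	while (base <= map->highest) {
		size_t nr = next_present(map, base, &probes);

		base = nr + 1;
	}
	return probes;
}
```

With 1000 sections of which only sections 0 and 999 are present, the
fixed-step variant makes 499501 probes and the skip-ahead variant makes 1000,
which is the quadratic versus linear behaviour suspected above.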