linux-mm.kvack.org archive mirror
From: Aditya Gupta <adityag@linux.ibm.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Danilo Krummrich <dakr@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Sourabh Jain <sourabhjain@linux.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [REPORT] Softlockups on PowerNV with upstream
Date: Thu, 10 Apr 2025 17:14:53 +0530	[thread overview]
Message-ID: <lucjfatnsaovne3tsegsrwgukgjsrhe6js6yeyl3ljq3tydqiz@mf6w2dqeyrtb> (raw)
In-Reply-To: <Z_dWTU8UsvCHFMpN@localhost.localdomain>

Hi,

On 25/04/10 07:25AM, Oscar Salvador wrote:
> On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> > Hi,
> > 
> > While booting the current upstream kernel, I consistently get "softlockups" on an IBM PowerNV system.
> > 
> > I have only tested this on PowerNV systems, but other architectures/platforms
> > might be affected too. PSeries systems don't have this issue, though.
> > 
> > Bisect points to the following commit:
> > 
> >     commit 61659efdb35ce6c6ac7639342098f3c4548b794b
> >     Author: Gavin Shan <gshan@redhat.com>
> >     Date:   Wed Mar 12 09:30:43 2025 +1000
> > 
> >         drivers/base/memory: improve add_boot_memory_block()
> > 
> ... 
> > Console log
> > -----------
> > 
> >     [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> >     [    2.783475] numa: Node 0 CPUs: 0-63
> >     [    2.783537] numa: Node 2 CPUs: 64-127
> >     [    2.783591] numa: Node 4 CPUs: 128-191
> >     [    2.783653] numa: Node 6 CPUs: 192-255
> >     [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
> 
> If I am not mistaken, this is ~700GB, and PowerNV uses a 16MB section size,
> with sections_per_block == 1 (I think).

Yes, the memory is around 700G:

    # lsmem
    RANGE                                  SIZE  STATE REMOVABLE         BLOCK
    0x0000000000000000-0x0000001fffffffff  128G online       yes         0-127
    0x0000400000000000-0x0000400fffffffff   64G online       yes   65536-65599
    0x0000800000000000-0x0000803fffffffff  256G online       yes 131072-131327
    0x0000c00000000000-0x0000c03fffffffff  256G online       yes 196608-196863
    
    Memory block size:         1G
    Total online memory:     704G
    Total offline memory:      0B

I don't know the value of sections_per_block.

> 
> The code before the mentioned commit, was something like:
> 
>  for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
>        if (present_section_nr(nr))
>           section_count++;
> 
>  if (section_count == 0)
>     return 0;
>  return add_memory_block()
> 
> So, in the case of PowerNV, we just check one section at a time and
> either return or call add_memory_block() depending on whether it is present.
> 
> Now, with the current code, things are different.
> We have:
> 
> memory_dev_init:
>  for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
>      ret = add_boot_memory_block
> 
> add_boot_memory_block:
>  for_each_present_section_nr(base_section_nr, nr) {
>      if (nr >= (base_section_nr + sections_per_block))
>             break;
> 
>      return add_memory_block();
>  }
>  return 0;
> 
> The thing is that next_present_section_nr() (which is called in
> for_each_present_section_nr()) will loop until we find a present
> section.
> And then we check whether the found section is at or beyond
> base_section_nr + sections_per_block (where sections_per_block = 1).
> If so, we skip add_memory_block().
> 
> Now, I think the issue comes from for_each_present_section_nr() having
> to loop a long way until it finds a present section.
> The loop in memory_dev_init() then increments by only 1, which means
> that on the next iteration we may have to loop a long way again to find
> the next present section, and so on and so forth.
> 
> Maybe we can fix this by making memory_dev_init() remember the section
> at which add_boot_memory_block() stopped.
> Something like the following (only compile-tested)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8f3a41d9bfaa..d97635cbfd1d 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>  	return 0;
>  }
> 
> -static int __init add_boot_memory_block(unsigned long base_section_nr)
> +static int __init add_boot_memory_block(unsigned long *base_section_nr)
>  {
> +	int ret;
>  	unsigned long nr;
> 
> -	for_each_present_section_nr(base_section_nr, nr) {
> -		if (nr >= (base_section_nr + sections_per_block))
> +	for_each_present_section_nr(*base_section_nr, nr) {
> +		if (nr >= (*base_section_nr + sections_per_block))
>  			break;
> 
> -		return add_memory_block(memory_block_id(base_section_nr),
> -					MEM_ONLINE, NULL, NULL);
> +		ret = add_memory_block(memory_block_id(*base_section_nr),
> +				       MEM_ONLINE, NULL, NULL);
> +		*base_section_nr = nr;
> +		return ret;
>  	}
> 
> +	if (nr == -1)
> +		*base_section_nr = __highest_present_section_nr + 1;
> +	else
> +		*base_section_nr = nr;
>  	return 0;
>  }
> 
> @@ -973,9 +980,9 @@ void __init memory_dev_init(void)
>  	 * Create entries for memory sections that were found
>  	 * during boot and have been initialized
>  	 */
> -	for (nr = 0; nr <= __highest_present_section_nr;
> -	     nr += sections_per_block) {
> -		ret = add_boot_memory_block(nr);
> +	nr = first_present_section_nr();
> +	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
> +		ret = add_boot_memory_block(&nr);
>  		if (ret)
>  			panic("%s() failed to add memory block: %d\n", __func__,
>  			      ret);
>  

Makes sense, thanks for the nice explanation.

> 
> @Aditya: can you please give it a try?
> 

Yes, will try it now.

Thanks,
- Aditya G

> 
> 
> -- 
> Oscar Salvador
> SUSE Labs

