linux-mm.kvack.org archive mirror
* [REPORT] Softlockups on PowerNV with upstream
@ 2025-04-09 18:03 Aditya Gupta
  2025-04-10  1:35 ` Gavin Shan
  2025-04-10  5:25 ` Oscar Salvador
  0 siblings, 2 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-09 18:03 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Oscar Salvador,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

Hi,

While booting the current upstream kernel, I consistently get soft lockups on an IBM PowerNV system.

I have tested this only on PowerNV systems, but other architectures/platforms
might be affected as well. PSeries systems don't have this issue, though.

Bisect points to the following commit:

    commit 61659efdb35ce6c6ac7639342098f3c4548b794b
    Author: Gavin Shan <gshan@redhat.com>
    Date:   Wed Mar 12 09:30:43 2025 +1000

        drivers/base/memory: improve add_boot_memory_block()

        Patch series "drivers/base/memory: Two cleanups", v3.

        Two cleanups to drivers/base/memory.


        This patch (of 2):

        It's unnecessary to count the present sections for the specified block
        since the block will be added if any section in the block is present.
        Besides, for_each_present_section_nr() can be reused as Andrew Morton
        suggested.

        Improve by using for_each_present_section_nr() and dropping the
        unnecessary @section_count.

        No functional changes intended.

        ...

The console log, bisect log, and kernel config are pasted below.

Thanks,
- Aditya G

Console log
-----------

    [    2.783371] smp: Brought up 4 nodes, 256 CPUs
    [    2.783475] numa: Node 0 CPUs: 0-63
    [    2.783537] numa: Node 2 CPUs: 64-127
    [    2.783591] numa: Node 4 CPUs: 128-191
    [    2.783653] numa: Node 6 CPUs: 192-255
    [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
    [    2.892969] devtmpfs: initialized
    [   24.057853] watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1]
    [   24.057861] Modules linked in:
    [   24.057872] CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY
    [   24.057879] Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
    [   24.057883] NIP:  c00000000209218c LR: c000000002092204 CTR: 0000000000000000
    [   24.057886] REGS: c00040000418fa30 TRAP: 0900   Not tainted  (6.15.0-rc1-next-20250408)
    [   24.057891] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000428  XER: 00000000
    [   24.057904] CFAR: 0000000000000000 IRQMASK: 0
    [   24.057904] GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
    [   24.057904] GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80
    [   24.057904] GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428
    [   24.057904] GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
    [   24.057904] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   24.057904] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   24.057904] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   24.057904] GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000
    [   24.057948] NIP [c00000000209218c] memory_dev_init+0x114/0x1e0
    [   24.057963] LR [c000000002092204] memory_dev_init+0x18c/0x1e0
    [   24.057968] Call Trace:
    [   24.057970] [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
    [   24.057976] [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
    [   24.057981] [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
    [   24.057989] [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
    [   24.057996] [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c
    [   24.058004] --- interrupt: 0 at 0x0
    [   24.058010] Code: 7fa9eb78 e8aa0000 2fa50000 60000000 60420000 7c29f840 792aaac2 40800034 419e0030 794a1f24 7d45502a 2c2a0000 <41820020> 79282c28 7cea4214 2c270000
    ...
    [   62.952729] rcu: INFO: rcu_sched self-detected stall on CPU
    [   62.952782] rcu:     248-....: (5999 ticks this GP) idle=5884/1/0x4000000000000002 softirq=81/81 fqs=1997
    [   62.952965] rcu:     (t=6000 jiffies g=-1015 q=1 ncpus=256)
    [   62.953050] CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Tainted: G             L      6.15.0-rc1-next-20250408 #1 VOLUNTARY
    [   62.953055] Tainted: [L]=SOFTLOCKUP
    [   62.953057] Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
    [   62.953059] NIP:  c000000002092180 LR: c000000002092204 CTR: 0000000000000000
    [   62.953062] REGS: c00040000418fa30 TRAP: 0900   Tainted: G             L       (6.15.0-rc1-next-20250408)
    [   62.953065] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 88000428  XER: 00000000
    [   62.953076] CFAR: 0000000000000000 IRQMASK: 0
    [   62.953076] GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
    [   62.953076] GPR04: 0000000000035940 c000c03ffebabb00 0000000000c03fff c000400fff587f80
    [   62.953076] GPR08: 0000000000000000 00000000002c390b 0000000000000587 0000000028000428
    [   62.953076] GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
    [   62.953076] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   62.953076] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   62.953076] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
    [   62.953076] GPR28: c000000002df7f70 0000000000035900 c0000000011dd898 0000000008000000
    [   62.953117] NIP [c000000002092180] memory_dev_init+0x108/0x1e0
    [   62.953121] LR [c000000002092204] memory_dev_init+0x18c/0x1e0
    [   62.953125] Call Trace:
    [   62.953126] [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
    [   62.953131] [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
    [   62.953135] [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
    [   62.953141] [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
    [   62.953146] [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c
    [   62.953152] --- interrupt: 0 at 0x0
    [   62.953155] Code: 4181ffe8 3d22012f 3949fe68 7fa9eb78 e8aa0000 2fa50000 60000000 60420000 7c29f840 792aaac2 40800034 419e0030 <794a1f24> 7d45502a 2c2a0000 41820020

Bisect Log
----------

    git bisect start
    # status: waiting for both good and bad commits
    # good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
    git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
    # status: waiting for bad commit, 1 good commit known
    # bad: [7702d0130dc002bab2c3571ddb6ff68f82d99aea] Add linux-next specific files for 20250408
    git bisect bad 7702d0130dc002bab2c3571ddb6ff68f82d99aea
    # good: [390513642ee6763c7ada07f0a1470474986e6c1c] io_uring: always do atomic put from iowq
    git bisect good 390513642ee6763c7ada07f0a1470474986e6c1c
    # bad: [eb0ece16027f8223d5dc9aaf90124f70577bd22a] Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
    git bisect bad eb0ece16027f8223d5dc9aaf90124f70577bd22a
    # good: [7d06015d936c861160803e020f68f413b5c3cd9d] Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
    git bisect good 7d06015d936c861160803e020f68f413b5c3cd9d
    # good: [fa593d0f969dcfa41d390822fdf1a0ab48cd882c] Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
    git bisect good fa593d0f969dcfa41d390822fdf1a0ab48cd882c
    # good: [f64a72bc767f6e9ddb18fdacaeb99708c4810ada] Merge tag 'v6.15rc-part1-ksmbd-server-fixes' of git://git.samba.org/ksmbd
    git bisect good f64a72bc767f6e9ddb18fdacaeb99708c4810ada
    # good: [a14efee04796dd3f614eaf5348ca1ac099c21349] mm/page_alloc: clarify should_claim_block() commentary
    git bisect good a14efee04796dd3f614eaf5348ca1ac099c21349
    # good: [f0e11a997ab438ce91a7dc9a6dd64c0c4a6af112] mm/vmalloc: refactor __vmalloc_node_range_noprof()
    git bisect good f0e11a997ab438ce91a7dc9a6dd64c0c4a6af112
    # bad: [735b3f7e773bd09d459537562754debd1f8e816b] selftests/mm: uffd-unit-tests support for hugepages > 2M
    git bisect bad 735b3f7e773bd09d459537562754debd1f8e816b
    # bad: [d2734f044f84833b2c9ec1b71b542d299d35202b] mm: memory-failure: enhance comments for return value of memory_failure()
    git bisect bad d2734f044f84833b2c9ec1b71b542d299d35202b
    # bad: [61659efdb35ce6c6ac7639342098f3c4548b794b] drivers/base/memory: improve add_boot_memory_block()
    git bisect bad 61659efdb35ce6c6ac7639342098f3c4548b794b
    # good: [58729c04cf1092b87aeef0bf0998c9e2e4771133] mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
    git bisect good 58729c04cf1092b87aeef0bf0998c9e2e4771133
    # good: [80a5c494c89f73907ed659a9233a70253774cdae] selftests/mm: add tests for folio_split(), buddy allocator like split
    git bisect good 80a5c494c89f73907ed659a9233a70253774cdae
    # good: [d53c78fffe7ad364397c693522ceb4d152c2aacd] mm/shmem: use xas_try_split() in shmem_split_large_entry()
    git bisect good d53c78fffe7ad364397c693522ceb4d152c2aacd
    # good: [c637c61c9ed0203d9a1f2ba21fb7a49ddca3ef8f] mm/damon/sysfs-schemes: avoid Wformat-security warning on damon_sysfs_access_pattern_add_range_dir()
    git bisect good c637c61c9ed0203d9a1f2ba21fb7a49ddca3ef8f
    # first bad commit: [61659efdb35ce6c6ac7639342098f3c4548b794b] drivers/base/memory: improve add_boot_memory_block()

To Reproduce the issue
----------------------

Build the upstream kernel and boot it on PowerNV Power10 hardware.

Kernel config
-------------

This should occur with any default config you may have, or you can use the following:

https://gist.github.com/adi-g15-ibm/6eb03cea2c6202e5eb017abd3819a491

CC list
-------

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org
To: linux-mm@kvack.org



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-09 18:03 [REPORT] Softlockups on PowerNV with upstream Aditya Gupta
@ 2025-04-10  1:35 ` Gavin Shan
  2025-04-10 11:38   ` Aditya Gupta
  2025-04-10  5:25 ` Oscar Salvador
  1 sibling, 1 reply; 12+ messages in thread
From: Gavin Shan @ 2025-04-10  1:35 UTC (permalink / raw)
  To: Aditya Gupta, linux-mm
  Cc: Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Oscar Salvador,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel, Gavin Shan

Hi Aditya,

On 4/10/25 4:03 AM, Aditya Gupta wrote:
> 
> While booting the current upstream kernel, I consistently get soft lockups on an IBM PowerNV system.
> 
> I have tested this only on PowerNV systems, but other architectures/platforms
> might be affected as well. PSeries systems don't have this issue, though.
> 
> Bisect points to the following commit:
> 
>      commit 61659efdb35ce6c6ac7639342098f3c4548b794b
>      Author: Gavin Shan <gshan@redhat.com>
>      Date:   Wed Mar 12 09:30:43 2025 +1000
> 
>          drivers/base/memory: improve add_boot_memory_block()
> 
>          Patch series "drivers/base/memory: Two cleanups", v3.
> 
>          Two cleanups to drivers/base/memory.
> 
> 
>          This patch (of 2):
> 
>          It's unnecessary to count the present sections for the specified block
>          since the block will be added if any section in the block is present.
>          Besides, for_each_present_section_nr() can be reused as Andrew Morton
>          suggested.
> 
>          Improve by using for_each_present_section_nr() and dropping the
>          unnecessary @section_count.
> 
>          No functional changes intended.
> 
>          ...
> 
> The console log, bisect log, and kernel config are pasted below.
> 

I don't see how 61659efdb35ce ("drivers/base/memory: improve add_boot_memory_block()")
causes any logical changes. Could you help revert it on top of v6.15-rc1 to confirm
whether the RCU stall and softlockup issue still exists?

At present, I don't have access to a Power10 machine, but I will check around.

> Thanks,
> - Aditya G
> 
> Console log
> -----------
> 
>      [    2.783371] smp: Brought up 4 nodes, 256 CPUs
>      [    2.783475] numa: Node 0 CPUs: 0-63
>      [    2.783537] numa: Node 2 CPUs: 64-127
>      [    2.783591] numa: Node 4 CPUs: 128-191
>      [    2.783653] numa: Node 6 CPUs: 192-255
>      [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)

The NUMA node numbers skip every other value (0, 2, 4, 6). It seems the machine has roughly 700GB of memory (735777792K), if I'm correct.

>      [    2.892969] devtmpfs: initialized
>      [   24.057853] watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1]
>      [   24.057861] Modules linked in:
>      [   24.057872] CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY
>      [   24.057879] Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
>      [   24.057883] NIP:  c00000000209218c LR: c000000002092204 CTR: 0000000000000000
>      [   24.057886] REGS: c00040000418fa30 TRAP: 0900   Not tainted  (6.15.0-rc1-next-20250408)
>      [   24.057891] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000428  XER: 00000000
>      [   24.057904] CFAR: 0000000000000000 IRQMASK: 0
>      [   24.057904] GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
>      [   24.057904] GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80
>      [   24.057904] GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428
>      [   24.057904] GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
>      [   24.057904] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>      [   24.057904] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>      [   24.057904] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>      [   24.057904] GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000
>      [   24.057948] NIP [c00000000209218c] memory_dev_init+0x114/0x1e0
>      [   24.057963] LR [c000000002092204] memory_dev_init+0x18c/0x1e0
>      [   24.057968] Call Trace:
>      [   24.057970] [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
>      [   24.057976] [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
>      [   24.057981] [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
>      [   24.057989] [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
>      [   24.057996] [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c
>      [   24.058004] --- interrupt: 0 at 0x0
>      [   24.058010] Code: 7fa9eb78 e8aa0000 2fa50000 60000000 60420000 7c29f840 792aaac2 40800034 419e0030 794a1f24 7d45502a 2c2a0000 <41820020> 79282c28 7cea4214 2c270000
>      ...
>      [   62.952729] rcu: INFO: rcu_sched self-detected stall on CPU
>      [   62.952782] rcu:     248-....: (5999 ticks this GP) idle=5884/1/0x4000000000000002 softirq=81/81 fqs=1997
>      [   62.952965] rcu:     (t=6000 jiffies g=-1015 q=1 ncpus=256)
>      [   62.953050] CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Tainted: G             L      6.15.0-rc1-next-20250408 #1 VOLUNTARY
>      [   62.953055] Tainted: [L]=SOFTLOCKUP
>      [   62.953057] Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
>      [   62.953059] NIP:  c000000002092180 LR: c000000002092204 CTR: 0000000000000000
>      [   62.953062] REGS: c00040000418fa30 TRAP: 0900   Tainted: G             L       (6.15.0-rc1-next-20250408)
>      [   62.953065] MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 88000428  XER: 00000000
>      [   62.953076] CFAR: 0000000000000000 IRQMASK: 0
>      [   62.953076] GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
>      [   62.953076] GPR04: 0000000000035940 c000c03ffebabb00 0000000000c03fff c000400fff587f80
>      [   62.953076] GPR08: 0000000000000000 00000000002c390b 0000000000000587 0000000028000428
>      [   62.953076] GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
>      [   62.953076] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>      [   62.953076] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>      [   62.953076] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>      [   62.953076] GPR28: c000000002df7f70 0000000000035900 c0000000011dd898 0000000008000000
>      [   62.953117] NIP [c000000002092180] memory_dev_init+0x108/0x1e0
>      [   62.953121] LR [c000000002092204] memory_dev_init+0x18c/0x1e0
>      [   62.953125] Call Trace:
>      [   62.953126] [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
>      [   62.953131] [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
>      [   62.953135] [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
>      [   62.953141] [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
>      [   62.953146] [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c
>      [   62.953152] --- interrupt: 0 at 0x0
>      [   62.953155] Code: 4181ffe8 3d22012f 3949fe68 7fa9eb78 e8aa0000 2fa50000 60000000 60420000 7c29f840 792aaac2 40800034 419e0030 <794a1f24> 7d45502a 2c2a0000 41820020
> 
> Bisect Log
> ----------
> 
>      git bisect start
>      # status: waiting for both good and bad commits
>      # good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
>      git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
>      # status: waiting for bad commit, 1 good commit known
>      # bad: [7702d0130dc002bab2c3571ddb6ff68f82d99aea] Add linux-next specific files for 20250408
>      git bisect bad 7702d0130dc002bab2c3571ddb6ff68f82d99aea
>      # good: [390513642ee6763c7ada07f0a1470474986e6c1c] io_uring: always do atomic put from iowq
>      git bisect good 390513642ee6763c7ada07f0a1470474986e6c1c
>      # bad: [eb0ece16027f8223d5dc9aaf90124f70577bd22a] Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>      git bisect bad eb0ece16027f8223d5dc9aaf90124f70577bd22a
>      # good: [7d06015d936c861160803e020f68f413b5c3cd9d] Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
>      git bisect good 7d06015d936c861160803e020f68f413b5c3cd9d
>      # good: [fa593d0f969dcfa41d390822fdf1a0ab48cd882c] Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
>      git bisect good fa593d0f969dcfa41d390822fdf1a0ab48cd882c
>      # good: [f64a72bc767f6e9ddb18fdacaeb99708c4810ada] Merge tag 'v6.15rc-part1-ksmbd-server-fixes' of git://git.samba.org/ksmbd
>      git bisect good f64a72bc767f6e9ddb18fdacaeb99708c4810ada
>      # good: [a14efee04796dd3f614eaf5348ca1ac099c21349] mm/page_alloc: clarify should_claim_block() commentary
>      git bisect good a14efee04796dd3f614eaf5348ca1ac099c21349
>      # good: [f0e11a997ab438ce91a7dc9a6dd64c0c4a6af112] mm/vmalloc: refactor __vmalloc_node_range_noprof()
>      git bisect good f0e11a997ab438ce91a7dc9a6dd64c0c4a6af112
>      # bad: [735b3f7e773bd09d459537562754debd1f8e816b] selftests/mm: uffd-unit-tests support for hugepages > 2M
>      git bisect bad 735b3f7e773bd09d459537562754debd1f8e816b
>      # bad: [d2734f044f84833b2c9ec1b71b542d299d35202b] mm: memory-failure: enhance comments for return value of memory_failure()
>      git bisect bad d2734f044f84833b2c9ec1b71b542d299d35202b
>      # bad: [61659efdb35ce6c6ac7639342098f3c4548b794b] drivers/base/memory: improve add_boot_memory_block()
>      git bisect bad 61659efdb35ce6c6ac7639342098f3c4548b794b
>      # good: [58729c04cf1092b87aeef0bf0998c9e2e4771133] mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
>      git bisect good 58729c04cf1092b87aeef0bf0998c9e2e4771133
>      # good: [80a5c494c89f73907ed659a9233a70253774cdae] selftests/mm: add tests for folio_split(), buddy allocator like split
>      git bisect good 80a5c494c89f73907ed659a9233a70253774cdae
>      # good: [d53c78fffe7ad364397c693522ceb4d152c2aacd] mm/shmem: use xas_try_split() in shmem_split_large_entry()
>      git bisect good d53c78fffe7ad364397c693522ceb4d152c2aacd
>      # good: [c637c61c9ed0203d9a1f2ba21fb7a49ddca3ef8f] mm/damon/sysfs-schemes: avoid Wformat-security warning on damon_sysfs_access_pattern_add_range_dir()
>      git bisect good c637c61c9ed0203d9a1f2ba21fb7a49ddca3ef8f
>      # first bad commit: [61659efdb35ce6c6ac7639342098f3c4548b794b] drivers/base/memory: improve add_boot_memory_block()
> 
> To Reproduce the issue
> ----------------------
> 
> Build the upstream kernel and boot it on PowerNV Power10 hardware.
> 
> Kernel config
> -------------
> 
> This should occur with any default config you may have, or you can use the following:
> 
> https://gist.github.com/adi-g15-ibm/6eb03cea2c6202e5eb017abd3819a491
> 
> CC list
> -------
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
> Cc: linux-kernel@vger.kernel.org
> To: linux-mm@kvack.org
> 




* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-09 18:03 [REPORT] Softlockups on PowerNV with upstream Aditya Gupta
  2025-04-10  1:35 ` Gavin Shan
@ 2025-04-10  5:25 ` Oscar Salvador
  2025-04-10  5:35   ` Gavin Shan
  2025-04-10 11:44   ` Aditya Gupta
  1 sibling, 2 replies; 12+ messages in thread
From: Oscar Salvador @ 2025-04-10  5:25 UTC (permalink / raw)
  To: Aditya Gupta
  Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
	Sourabh Jain, linux-kernel

On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> Hi,
> 
> While booting the current upstream kernel, I consistently get soft lockups on an IBM PowerNV system.
> 
> I have tested this only on PowerNV systems, but other architectures/platforms
> might be affected as well. PSeries systems don't have this issue, though.
> 
> Bisect points to the following commit:
> 
>     commit 61659efdb35ce6c6ac7639342098f3c4548b794b
>     Author: Gavin Shan <gshan@redhat.com>
>     Date:   Wed Mar 12 09:30:43 2025 +1000
> 
>         drivers/base/memory: improve add_boot_memory_block()
> 
... 
> Console log
> -----------
> 
>     [    2.783371] smp: Brought up 4 nodes, 256 CPUs
>     [    2.783475] numa: Node 0 CPUs: 0-63
>     [    2.783537] numa: Node 2 CPUs: 64-127
>     [    2.783591] numa: Node 4 CPUs: 128-191
>     [    2.783653] numa: Node 6 CPUs: 192-255
>     [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)

If I am not mistaken this is ~700GB, and PowerNV uses 16MB as section size,
and sections_per_block == 1 (I think).

The code before the mentioned commit was something like:

 for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
       if (present_section_nr(nr))
          section_count++;

 if (section_count == 0)
    return 0;
 return add_memory_block()

So, in case of PowerNV , we will just check one section at a time and
either return or call add_memory_block depending whether it is present.

Now, with the current code, things are different. We now have:

memory_dev_init:
 for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
     ret = add_boot_memory_block(nr)

add_boot_memory_block:
 for_each_present_section_nr(base_section_nr, nr) {
     if (nr >= (base_section_nr + sections_per_block))
            break;

     return add_memory_block();
 }
 return 0;

The thing is that next_present_section_nr() (which is called by
for_each_present_section_nr()) will loop until it finds a present
section.
Then we check whether the found section is beyond
base_section_nr + sections_per_block (where sections_per_block == 1).
If so, we skip add_memory_block.

So I think the issue comes from for_each_present_section_nr() having
to loop a lot until it finds a present section.
The loop in memory_dev_init() then increments only by one, which means
that on the next iteration we might have to loop a lot again to find
the next present section. And so on and so forth.
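For a rough sense of scale, here is a minimal userspace model of the two
schemes (a sketch under the assumptions above, i.e. sections_per_block == 1
and a single large hole between present ranges; the constants are made up
to be roughly PowerNV-shaped, and this is not the kernel code itself):

    /* Sketch: counts how many sections each scheme examines at boot. */
    #include <stdio.h>

    #define HIGHEST    0xc03fffUL  /* highest present section */
    #define HOLE_START   0x2000UL  /* first absent section */
    #define HOLE_END   0x400000UL  /* first present section after the hole */
    #define SPB               1UL  /* sections_per_block */

    static int present(unsigned long nr)
    {
        return nr < HOLE_START || nr >= HOLE_END;
    }

    int main(void)
    {
        unsigned long nr, old_cost = 0, new_cost = 0;

        for (nr = 0; nr <= HIGHEST; nr += SPB) {
            /* old code: present_section_nr() on each section of the block */
            old_cost += SPB;
            /*
             * new code: next_present_section_nr() walks forward one
             * section at a time until it hits a present one, so a start
             * inside the hole pays for the whole remaining hole.
             */
            new_cost += (present(nr) ? nr : HOLE_END) - nr + 1;
        }
        printf("old: ~%lu probes, new: ~%lu probes\n", old_cost, new_cost);
        return 0;
    }

With these made-up but realistically sized constants the old scheme does
about 1.3e7 probes while the new one does about 8.8e12, which is the kind
of blow-up that can surface as a 22s soft lockup.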

Maybe we can fix this by making memory_dev_init() remember the section
where add_boot_memory_block() left off.
Something like the following (only compile-tested):

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8f3a41d9bfaa..d97635cbfd1d 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	return 0;
 }

-static int __init add_boot_memory_block(unsigned long base_section_nr)
+static int __init add_boot_memory_block(unsigned long *base_section_nr)
 {
+	int ret;
 	unsigned long nr;

-	for_each_present_section_nr(base_section_nr, nr) {
-		if (nr >= (base_section_nr + sections_per_block))
+	for_each_present_section_nr(*base_section_nr, nr) {
+		if (nr >= (*base_section_nr + sections_per_block))
 			break;

-		return add_memory_block(memory_block_id(base_section_nr),
-					MEM_ONLINE, NULL, NULL);
+		ret = add_memory_block(memory_block_id(*base_section_nr),
+				       MEM_ONLINE, NULL, NULL);
+		*base_section_nr = nr;
+		return ret;
 	}

+	if (nr == -1)
+		*base_section_nr = __highest_present_section_nr + 1;
+	else
+		*base_section_nr = nr;
 	return 0;
 }

@@ -973,9 +980,9 @@ void __init memory_dev_init(void)
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
 	 */
-	for (nr = 0; nr <= __highest_present_section_nr;
-	     nr += sections_per_block) {
-		ret = add_boot_memory_block(nr);
+	nr = first_present_section_nr();
+	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
+		ret = add_boot_memory_block(&nr);
 		if (ret)
 			panic("%s() failed to add memory block: %d\n", __func__,
 			      ret);
 

@Aditya: can you please give it a try?



-- 
Oscar Salvador
SUSE Labs



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  5:25 ` Oscar Salvador
@ 2025-04-10  5:35   ` Gavin Shan
  2025-04-10  8:23     ` Oscar Salvador
  2025-04-10 11:44   ` Aditya Gupta
  1 sibling, 1 reply; 12+ messages in thread
From: Gavin Shan @ 2025-04-10  5:35 UTC (permalink / raw)
  To: Oscar Salvador, Aditya Gupta
  Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
	Sourabh Jain, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4874 bytes --]

On 4/10/25 3:25 PM, Oscar Salvador wrote:
> On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
>> Hi,
>>
>> While booting the current upstream kernel, I consistently get soft lockups on an IBM PowerNV system.
>>
>> I have tested this only on PowerNV systems, but other architectures/platforms
>> might be affected as well. PSeries systems don't have this issue, though.
>>
>> Bisect points to the following commit:
>>
>>      commit 61659efdb35ce6c6ac7639342098f3c4548b794b
>>      Author: Gavin Shan <gshan@redhat.com>
>>      Date:   Wed Mar 12 09:30:43 2025 +1000
>>
>>          drivers/base/memory: improve add_boot_memory_block()
>>
> ...
>> Console log
>> -----------
>>
>>      [    2.783371] smp: Brought up 4 nodes, 256 CPUs
>>      [    2.783475] numa: Node 0 CPUs: 0-63
>>      [    2.783537] numa: Node 2 CPUs: 64-127
>>      [    2.783591] numa: Node 4 CPUs: 128-191
>>      [    2.783653] numa: Node 6 CPUs: 192-255
>>      [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
> 
> If I am not mistaken this is ~700GB, and PowerNV uses 16MB as section size,
> and sections_per_block == 1 (I think).
> 
> The code before the mentioned commit was something like:
> 
>   for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
>         if (present_section_nr(nr))
>            section_count++;
> 
>   if (section_count == 0)
>      return 0;
>   return add_memory_block()
> 
> So, in case of PowerNV , we will just check one section at a time and
> either return or call add_memory_block depending whether it is present.
> 
> Now, with the current code, things are different. We now have:
> 
> memory_dev_init:
>   for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
>       ret = add_boot_memory_block(nr)
> 
> add_boot_memory_block:
>   for_each_present_section_nr(base_section_nr, nr) {
>       if (nr >= (base_section_nr + sections_per_block))
>              break;
> 
>       return add_memory_block();
>   }
>   return 0;
> 
> The thing is that next_present_section_nr() (which is called by
> for_each_present_section_nr()) will loop until it finds a present
> section.
> Then we check whether the found section is beyond
> base_section_nr + sections_per_block (where sections_per_block == 1).
> If so, we skip add_memory_block.
> 
> So I think the issue comes from for_each_present_section_nr() having
> to loop a lot until it finds a present section.
> The loop in memory_dev_init() then increments only by one, which means
> that on the next iteration we might have to loop a lot again to find
> the next present section. And so on and so forth.
> 
> Maybe we can fix this by making memory_dev_init() remember the section
> where add_boot_memory_block() left off.
> Something like the following (only compile-tested):
> 

Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
I already have a fix, verified on an IBM Power9 machine where the issue can be
reproduced. Please see the attached patch.

I'm running more tests on an ARM64 machine for the fix.

Thanks,
Gavin

> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8f3a41d9bfaa..d97635cbfd1d 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>   	return 0;
>   }
> 
> -static int __init add_boot_memory_block(unsigned long base_section_nr)
> +static int __init add_boot_memory_block(unsigned long *base_section_nr)
>   {
> +	int ret;
>   	unsigned long nr;
> 
> -	for_each_present_section_nr(base_section_nr, nr) {
> -		if (nr >= (base_section_nr + sections_per_block))
> +	for_each_present_section_nr(*base_section_nr, nr) {
> +		if (nr >= (*base_section_nr + sections_per_block))
>   			break;
> 
> -		return add_memory_block(memory_block_id(base_section_nr),
> -					MEM_ONLINE, NULL, NULL);
> +		ret = add_memory_block(memory_block_id(*base_section_nr),
> +				       MEM_ONLINE, NULL, NULL);
> +	*base_section_nr = nr;
> +		return ret;
>   	}
> 
> +	if (nr == -1)
> +		*base_section_nr = __highest_present_section_nr + 1;
> +	else
> +		*base_section_nr = nr;
>   	return 0;
>   }
> 
> @@ -973,9 +980,9 @@ void __init memory_dev_init(void)
>   	 * Create entries for memory sections that were found
>   	 * during boot and have been initialized
>   	 */
> -	for (nr = 0; nr <= __highest_present_section_nr;
> -	     nr += sections_per_block) {
> -		ret = add_boot_memory_block(nr);
> +	nr = first_present_section_nr();
> +	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
> +		ret = add_boot_memory_block(&nr);
>   		if (ret)
>   			panic("%s() failed to add memory block: %d\n", __func__,
>   			      ret);
>   
> 
> @Aditya: can you please give it a try?
> 
> 
> 

[-- Attachment #2: 0001-drivers-base-memory-Avoid-overhead-for_each_present_.patch --]
[-- Type: text/x-patch, Size: 4736 bytes --]

From d4c43d5f6b962144c4f47d46a66284df92da285e Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan@redhat.com>
Date: Thu, 10 Apr 2025 14:43:46 +1000
Subject: [PATCH] drivers/base/memory: Avoid overhead
 for_each_present_section_nr()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

for_each_present_section_nr() was introduced to add_boot_memory_block()
by commit 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()").
It causes unnecessary overhead when the present sections are really
sparse: next_present_section_nr(), called by the macro, finds the next
present section, which may be far away from the sections spanning the
specified block. Too much time is consumed by next_present_section_nr()
in that case, which can lead to a soft lockup, as observed by Aditya
Gupta on an IBM Power10 machine.

  watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1]
  Modules linked in:
  CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY
  Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
  NIP:  c00000000209218c LR: c000000002092204 CTR: 0000000000000000
  REGS: c00040000418fa30 TRAP: 0900   Not tainted  (6.15.0-rc1-next-20250408)
  MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000428  XER: 00000000
  CFAR: 0000000000000000 IRQMASK: 0
  GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
  GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80
  GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428
  GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000
  NIP [c00000000209218c] memory_dev_init+0x114/0x1e0
  LR [c000000002092204] memory_dev_init+0x18c/0x1e0
  Call Trace:
  [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
  [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
  [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
  [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
  [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c

Avoid the overhead by folding for_each_present_section_nr() into the outer
loop. add_boot_memory_block() is dropped as a result.

Fixes: 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()")
Closes: https://lore.kernel.org/linux-mm/20250409180344.477916-1-adityag@linux.ibm.com
Reported-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 drivers/base/memory.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8f3a41d9bfaa..433a5fe96304 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -816,21 +816,6 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	return 0;
 }
 
-static int __init add_boot_memory_block(unsigned long base_section_nr)
-{
-	unsigned long nr;
-
-	for_each_present_section_nr(base_section_nr, nr) {
-		if (nr >= (base_section_nr + sections_per_block))
-			break;
-
-		return add_memory_block(memory_block_id(base_section_nr),
-					MEM_ONLINE, NULL, NULL);
-	}
-
-	return 0;
-}
-
 static int add_hotplug_memory_block(unsigned long block_id,
 				    struct vmem_altmap *altmap,
 				    struct memory_group *group)
@@ -957,7 +942,7 @@ static const struct attribute_group *memory_root_attr_groups[] = {
 void __init memory_dev_init(void)
 {
 	int ret;
-	unsigned long block_sz, nr;
+	unsigned long block_sz, block_id, nr;
 
 	/* Validate the configured memory block size */
 	block_sz = memory_block_size_bytes();
@@ -973,12 +958,17 @@ void __init memory_dev_init(void)
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
 	 */
-	for (nr = 0; nr <= __highest_present_section_nr;
-	     nr += sections_per_block) {
-		ret = add_boot_memory_block(nr);
-		if (ret)
-			panic("%s() failed to add memory block: %d\n", __func__,
-			      ret);
+	block_id = ULONG_MAX;
+	for_each_present_section_nr(0, nr) {
+		if (block_id != ULONG_MAX && memory_block_id(nr) == block_id)
+			continue;
+
+		block_id = memory_block_id(nr);
+		ret = add_memory_block(block_id, MEM_ONLINE, NULL, NULL);
+		if (ret) {
+			panic("%s() failed to add memory block: %d\n",
+			      __func__, ret);
+		}
 	}
 }
 
-- 
2.48.1



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  5:35   ` Gavin Shan
@ 2025-04-10  8:23     ` Oscar Salvador
  2025-04-10  9:44       ` Gavin Shan
  0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2025-04-10  8:23 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Aditya Gupta, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
> Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
> I already have a fix, verified on an IBM Power9 machine where the issue can be
> reproduced. Please see the attached patch.
> 
> I'm running more tests on an ARM64 machine for the fix.

Looks good to me.
But we need a comment explaining why block_id is set to ULONG_MAX
at the beginning, as this might not be obvious.

Also, do we need
 if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?

Can't it just be

 if (memory_block_id(nr) == block_id) ?

AFAICS, the first time through the loop 'memory_block_id(nr) == block_id'
will evaluate to false (block_id is still ULONG_MAX), and we will set block_id afterwards.

Either way looks fine to me.
Another way I guess would be:


-- 
Oscar Salvador
SUSE Labs



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  8:23     ` Oscar Salvador
@ 2025-04-10  9:44       ` Gavin Shan
  2025-04-10 11:49         ` Aditya Gupta
  2025-04-10 12:22         ` Aditya Gupta
  0 siblings, 2 replies; 12+ messages in thread
From: Gavin Shan @ 2025-04-10  9:44 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Aditya Gupta, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

On 4/10/25 6:23 PM, Oscar Salvador wrote:
> On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
>> Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
>> I already have a fix, verified on an IBM Power9 machine where the issue can be
>> reproduced. Please see the attached patch.
>>
>> I'm running more tests on an ARM64 machine for the fix.
> 
> Looks good to me.
> But we need a comment explaining why block_id is set to ULONG_MAX
> at the beginning, as this might not be obvious.
> 
> Also, do we need
>   if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?
> 
> Can't it just be
> 
>   if (memory_block_id(nr) == block_id) ?
> 
> AFAICS, the first time through the loop 'memory_block_id(nr) == block_id'
> will evaluate to false (block_id is still ULONG_MAX), and we will set block_id afterwards.
> 
> Either way looks fine to me.
> Another way I guess would be:
> 

Yeah, we need @block_id to record the last handled block ID. The first
time we register a memory block device in the loop, @block_id needs to be
invalid (ULONG_MAX), bypassing the 'memory_block_id(nr) == block_id' check.
I will post the fix for review after Aditya confirms it works for him, with an
extra comment explaining why @block_id is initialized to ULONG_MAX.

Aditya, please have a try when you get a chance, thanks! I verified it on a Power9
machine where the issue exists and on one of my ARM64 machines.
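
For illustration, the loop with that comment folded in could look something
like this (just a sketch of the intent, not the final patch; the names are as
in the attached patch, and it also uses Oscar's simplified check, which is
safe as long as no real block ID can equal ULONG_MAX):

    /*
     * Start from an impossible block ID (ULONG_MAX) so that the first
     * present section always registers its block. Afterwards @block_id
     * remembers the last added block, so the remaining present sections
     * of that block are skipped.
     */
    block_id = ULONG_MAX;
    for_each_present_section_nr(0, nr) {
            if (memory_block_id(nr) == block_id)
                    continue;

            block_id = memory_block_id(nr);
            ret = add_memory_block(block_id, MEM_ONLINE, NULL, NULL);
            if (ret)
                    panic("%s() failed to add memory block: %d\n",
                          __func__, ret);
    }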

Thanks,
Gavin




* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  1:35 ` Gavin Shan
@ 2025-04-10 11:38   ` Aditya Gupta
  0 siblings, 0 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 11:38 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Oscar Salvador,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel, Gavin Shan

Hi Gavin,

Sorry for the late reply.

On 25/04/10 11:35AM, Gavin Shan wrote:
> >      [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> >      [    2.783475] numa: Node 0 CPUs: 0-63
> >      [    2.783537] numa: Node 2 CPUs: 64-127
> >      [    2.783591] numa: Node 4 CPUs: 128-191
> >      [    2.783653] numa: Node 6 CPUs: 192-255
> >      [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
> 
> The NUMA node numbers skip every other value (0, 2, 4, 6). It seems the machine has roughly 700GB of memory (735777792K), if I'm correct.

Yes Gavin, almost 700G:

    # lsmem
    RANGE                                  SIZE  STATE REMOVABLE         BLOCK
    0x0000000000000000-0x0000001fffffffff  128G online       yes         0-127
    0x0000400000000000-0x0000400fffffffff   64G online       yes   65536-65599
    0x0000800000000000-0x0000803fffffffff  256G online       yes 131072-131327
    0x0000c00000000000-0x0000c03fffffffff  256G online       yes 196608-196863
    
    Memory block size:         1G
    Total online memory:     704G
    Total offline memory:      0B

Thanks,
- Aditya G




* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  5:25 ` Oscar Salvador
  2025-04-10  5:35   ` Gavin Shan
@ 2025-04-10 11:44   ` Aditya Gupta
  2025-04-10 12:26     ` Aditya Gupta
  1 sibling, 1 reply; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 11:44 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
	Sourabh Jain, linux-kernel

Hi,

On 25/04/10 07:25AM, Oscar Salvador wrote:
> On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> > Hi,
> > 
> > While booting the current upstream kernel, I consistently get soft lockups on an IBM PowerNV system.
> > 
> > I have tested this only on PowerNV systems, but other architectures/platforms
> > might be affected as well. PSeries systems don't have this issue, though.
> > 
> > Bisect points to the following commit:
> > 
> >     commit 61659efdb35ce6c6ac7639342098f3c4548b794b
> >     Author: Gavin Shan <gshan@redhat.com>
> >     Date:   Wed Mar 12 09:30:43 2025 +1000
> > 
> >         drivers/base/memory: improve add_boot_memory_block()
> > 
> ... 
> > Console log
> > -----------
> > 
> >     [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> >     [    2.783475] numa: Node 0 CPUs: 0-63
> >     [    2.783537] numa: Node 2 CPUs: 64-127
> >     [    2.783591] numa: Node 4 CPUs: 128-191
> >     [    2.783653] numa: Node 6 CPUs: 192-255
> >     [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
> 
> If I am not mistaken this is ~700GB, and PowerNV uses 16MB as section size,
> and sections_per_block == 1 (I think).

Yes, the memory is around 700G:

    # lsmem
    RANGE                                  SIZE  STATE REMOVABLE         BLOCK
    0x0000000000000000-0x0000001fffffffff  128G online       yes         0-127
    0x0000400000000000-0x0000400fffffffff   64G online       yes   65536-65599
    0x0000800000000000-0x0000803fffffffff  256G online       yes 131072-131327
    0x0000c00000000000-0x0000c03fffffffff  256G online       yes 196608-196863
    
    Memory block size:         1G
    Total online memory:     704G
    Total offline memory:      0B

I don't know about the sections_per_block.
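
For what it's worth, some back-of-the-envelope numbers from the lsmem layout
above, assuming the 16MB section size Oscar mentions (so treat these as
estimates, not verified values):

    section size = 16MB  ->  section_nr = phys_addr >> 24
    block size   = 1G    ->  sections_per_block = 1G / 16MB = 64
    0x0000001fffffffff   ->  last present section of range 1: 0x1fff    (8191)
    0x0000400000000000   ->  first present section of range 2: 0x400000 (4194304)

So between the first two ranges alone there is a hole of roughly 4.2 million
absent sections for next_present_section_nr() to walk, and the 1G block size
reported by lsmem would make sections_per_block 64 rather than 1; either way,
the quadratic behaviour described below is the same.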

> 
> The code before the mentioned commit was something like:
> 
>  for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
>        if (present_section_nr(nr))
>           section_count++;
> 
>  if (section_count == 0)
>     return 0;
>  return add_memory_block()
> 
> So, in case of PowerNV , we will just check one section at a time and
> either return or call add_memory_block depending whether it is present.
> 
> Now, with the current code, things are different. We now have:
> 
> memory_dev_init:
>  for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
>      ret = add_boot_memory_block(nr)
> 
> add_boot_memory_block:
>  for_each_present_section_nr(base_section_nr, nr) {
>      if (nr >= (base_section_nr + sections_per_block))
>             break;
> 
>      return add_memory_block();
>  }
>  return 0;
> 
> The thing is that next_present_section_nr() (which is called by
> for_each_present_section_nr()) will loop until it finds a present
> section.
> Then we check whether the found section is beyond
> base_section_nr + sections_per_block (where sections_per_block == 1).
> If so, we skip add_memory_block.
> 
> So I think the issue comes from for_each_present_section_nr() having
> to loop a lot until it finds a present section.
> The loop in memory_dev_init() then increments only by one, which means
> that on the next iteration we might have to loop a lot again to find
> the next present section. And so on and so forth.
> 
> Maybe we can fix this by making memory_dev_init() remember the section
> where add_boot_memory_block() left off.
> Something like the following (only compile-tested):
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8f3a41d9bfaa..d97635cbfd1d 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>  	return 0;
>  }
> 
> -static int __init add_boot_memory_block(unsigned long base_section_nr)
> +static int __init add_boot_memory_block(unsigned long *base_section_nr)
>  {
> +	int ret;
>  	unsigned long nr;
> 
> -	for_each_present_section_nr(base_section_nr, nr) {
> -		if (nr >= (base_section_nr + sections_per_block))
> +	for_each_present_section_nr(*base_section_nr, nr) {
> +		if (nr >= (*base_section_nr + sections_per_block))
>  			break;
> 
> -		return add_memory_block(memory_block_id(base_section_nr),
> -					MEM_ONLINE, NULL, NULL);
> +		ret = add_memory_block(memory_block_id(*base_section_nr),
> +				       MEM_ONLINE, NULL, NULL);
> +	*base_section_nr = nr;
> +		return ret;
>  	}
> 
> +	if (nr == -1)
> +		*base_section_nr = __highest_present_section_nr + 1;
> +	else
> +		*base_section_nr = nr;
>  	return 0;
>  }
> 
> @@ -973,9 +980,9 @@ void __init memory_dev_init(void)
>  	 * Create entries for memory sections that were found
>  	 * during boot and have been initialized
>  	 */
> -	for (nr = 0; nr <= __highest_present_section_nr;
> -	     nr += sections_per_block) {
> -		ret = add_boot_memory_block(nr);
> +	nr = first_present_section_nr();
> +	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
> +		ret = add_boot_memory_block(&nr);
>  		if (ret)
>  			panic("%s() failed to add memory block: %d\n", __func__,
>  			      ret);
>  

Makes sense, thanks for the nice explanation.

> 
> @Aditya: can you please give it a try?
> 

Yes, will try it now.

Thanks,
- Aditya G

> 
> 
> -- 
> Oscar Salvador
> SUSE Labs



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  9:44       ` Gavin Shan
@ 2025-04-10 11:49         ` Aditya Gupta
  2025-04-10 12:22         ` Aditya Gupta
  1 sibling, 0 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 11:49 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Oscar Salvador, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

Hi Gavin,

On 25/04/10 07:44PM, Gavin Shan wrote:
> > <...snip...>
>
> Aditya, please have a try when you get a chance, thanks! I verified it on Power9
> machine where the issue exists and on one of my ARM64 machine.

Yes Gavin, will try the patch and then reply.

Thanks,
- Aditya G

> 
> Thanks,
> Gavin
> 



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  9:44       ` Gavin Shan
  2025-04-10 11:49         ` Aditya Gupta
@ 2025-04-10 12:22         ` Aditya Gupta
  2025-04-10 12:32           ` Gavin Shan
  1 sibling, 1 reply; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 12:22 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Oscar Salvador, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel, Donet Tom

Cc +donet

On 25/04/10 07:44PM, Gavin Shan wrote:
> On 4/10/25 6:23 PM, Oscar Salvador wrote:
> > On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
> > > Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
> > > I already have a fix, verified on an IBM Power9 machine where the issue can be
> > > reproduced. Please see the attached patch.
> > > 
> > > I'm running more tests on an ARM64 machine for the fix.
> > 
> > Looks good to me.
> > But we need a comment explaining why block_id is set to ULONG_MAX
> > at the beginning, as this might not be obvious.
> > 
> > Also, do we need
> >   if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?
> > 
> > Can't it just be
> > 
> >   if (memory_block_id(nr) == block_id) ?
> > 
> > AFAICS, the first time through the loop 'memory_block_id(nr) == block_id'
> > will evaluate to false (block_id is still ULONG_MAX), and we will set block_id afterwards.
> > 
> > Either way looks fine to me.
> > Another way I guess would be:
> > 
> 
> Yeah, we need @block_id to record the last handled block ID. The first
> time we register a memory block device in the loop, @block_id needs to be
> invalid (ULONG_MAX), bypassing the 'memory_block_id(nr) == block_id' check.
> I will post the fix for review after Aditya confirms it works for him, with an
> extra comment explaining why @block_id is initialized to ULONG_MAX.
> 
> Aditya, please have a try when you get a chance, thanks! I verified it on a Power9
> machine where the issue exists and on one of my ARM64 machines.

I don't see any softlockups now, with either your patch or Oscar's patch.

Tested on PowerNV Power10.

Thanks for the quick replies Gavin.
- Aditya G

> 
> Thanks,
> Gavin
> 



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10 11:44   ` Aditya Gupta
@ 2025-04-10 12:26     ` Aditya Gupta
  0 siblings, 0 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 12:26 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
	Sourabh Jain, linux-kernel, Gavin Shan

On 25/04/10 05:14PM, Aditya Gupta wrote:
> Hi,
> 
> On 25/04/10 07:25AM, Oscar Salvador wrote:
> > > <...snip...>
> > 
> > @Aditya: can you please give it a try?
> > 
> 
> Yes, will try it now.

I don't see the softlockups now with your patch, Oscar.

Gavin's patch also fixes the issue for me.

Tested it on a Power10 PowerNV system.

Thank you for the quick replies!

Thanks,
- Aditya G

> 
> Thanks,
> - Aditya G
> 
> > 
> > 
> > -- 
> > Oscar Salvador
> > SUSE Labs



* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10 12:22         ` Aditya Gupta
@ 2025-04-10 12:32           ` Gavin Shan
  0 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2025-04-10 12:32 UTC (permalink / raw)
  To: Aditya Gupta
  Cc: Oscar Salvador, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel, Donet Tom,
	Gavin Shan


On 4/10/25 10:22 PM, Aditya Gupta wrote:
> Cc +donet
> 
> On 25/04/10 07:44PM, Gavin Shan wrote:
>> On 4/10/25 6:23 PM, Oscar Salvador wrote:
>>> On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
>>>> Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
>>>> I already have a fix, verified on an IBM Power9 machine where the issue can be
>>>> reproduced. Please see the attached patch.
>>>>
>>>> I'm running more tests on an ARM64 machine for the fix.
>>>
>>> Looks good to me.
>>> But we need a comment explaining why block_id is set to ULONG_MAX
>>> at the beginning, as this might not be obvious.
>>>
>>> Also, do we need
>>>    if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?
>>>
>>> Can't it just be
>>>
>>>    if (memory_block_id(nr) == block_id) ?
>>>
>>> AFAICS, the first time through the loop 'memory_block_id(nr) == block_id'
>>> will evaluate to false (block_id is still ULONG_MAX), and we will set block_id afterwards.
>>>
>>> Either way looks fine to me.
>>> Another way I guess would be:
>>>
>>
>> Yeah, we need @block_id to record the last handled block ID. The first
>> time we register a memory block device in the loop, @block_id needs to be
>> invalid (ULONG_MAX), bypassing the 'memory_block_id(nr) == block_id' check.
>> I will post the fix for review after Aditya confirms it works for him, with an
>> extra comment explaining why @block_id is initialized to ULONG_MAX.
>>
>> Aditya, please have a try when you get a chance, thanks! I verified it on a Power9
>> machine where the issue exists and on one of my ARM64 machines.
> 
> I don't see any softlockups now, with either your patch or Oscar's patch.
> 
> Tested on PowerNV Power10.
> 
> Thanks for the quick replies Gavin.

Nice, thanks for the quick test, Aditya. I will send the fix for review, with
you copied.

Thanks,
Gavin




end of thread, newest: 2025-04-10 12:32 UTC

Thread overview: 12+ messages
-- the index below lists each message in this thread --
2025-04-09 18:03 [REPORT] Softlockups on PowerNV with upstream Aditya Gupta
2025-04-10  1:35 ` Gavin Shan
2025-04-10 11:38   ` Aditya Gupta
2025-04-10  5:25 ` Oscar Salvador
2025-04-10  5:35   ` Gavin Shan
2025-04-10  8:23     ` Oscar Salvador
2025-04-10  9:44       ` Gavin Shan
2025-04-10 11:49         ` Aditya Gupta
2025-04-10 12:22         ` Aditya Gupta
2025-04-10 12:32           ` Gavin Shan
2025-04-10 11:44   ` Aditya Gupta
2025-04-10 12:26     ` Aditya Gupta
