* [REPORT] Softlockups on PowerNV with upstream
@ 2025-04-09 18:03 Aditya Gupta
2025-04-10 1:35 ` Gavin Shan
2025-04-10 5:25 ` Oscar Salvador
0 siblings, 2 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-09 18:03 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Danilo Krummrich, David Hildenbrand,
Greg Kroah-Hartman, Mahesh J Salgaonkar, Oscar Salvador,
Rafael J. Wysocki, Sourabh Jain, linux-kernel
Hi,
While booting the current upstream kernel, I consistently get softlockups on an IBM PowerNV system.
I have tested this only on PowerNV systems, but other architectures/platforms
might be affected as well. PSeries systems don't have this issue, though.
Bisect points to the following commit:
commit 61659efdb35ce6c6ac7639342098f3c4548b794b
Author: Gavin Shan <gshan@redhat.com>
Date: Wed Mar 12 09:30:43 2025 +1000
drivers/base/memory: improve add_boot_memory_block()
Patch series "drivers/base/memory: Two cleanups", v3.
Two cleanups to drivers/base/memory.
This patch (of 2):
It's unnecessary to count the present sections for the specified block
since the block will be added if any section in the block is present.
Besides, for_each_present_section_nr() can be reused as Andrew Morton
suggested.
Improve by using for_each_present_section_nr() and dropping the
unnecessary @section_count.
No functional changes intended.
...
The console log, bisect log, and kernel config are pasted below.
Thanks,
- Aditya G
Console log
-----------
[ 2.783371] smp: Brought up 4 nodes, 256 CPUs
[ 2.783475] numa: Node 0 CPUs: 0-63
[ 2.783537] numa: Node 2 CPUs: 64-127
[ 2.783591] numa: Node 4 CPUs: 128-191
[ 2.783653] numa: Node 6 CPUs: 192-255
[ 2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
[ 2.892969] devtmpfs: initialized
[ 24.057853] watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1]
[ 24.057861] Modules linked in:
[ 24.057872] CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY
[ 24.057879] Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
[ 24.057883] NIP: c00000000209218c LR: c000000002092204 CTR: 0000000000000000
[ 24.057886] REGS: c00040000418fa30 TRAP: 0900 Not tainted (6.15.0-rc1-next-20250408)
[ 24.057891] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000428 XER: 00000000
[ 24.057904] CFAR: 0000000000000000 IRQMASK: 0
[ 24.057904] GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
[ 24.057904] GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80
[ 24.057904] GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428
[ 24.057904] GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
[ 24.057904] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 24.057904] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 24.057904] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 24.057904] GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000
[ 24.057948] NIP [c00000000209218c] memory_dev_init+0x114/0x1e0
[ 24.057963] LR [c000000002092204] memory_dev_init+0x18c/0x1e0
[ 24.057968] Call Trace:
[ 24.057970] [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
[ 24.057976] [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
[ 24.057981] [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
[ 24.057989] [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
[ 24.057996] [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c
[ 24.058004] --- interrupt: 0 at 0x0
[ 24.058010] Code: 7fa9eb78 e8aa0000 2fa50000 60000000 60420000 7c29f840 792aaac2 40800034 419e0030 794a1f24 7d45502a 2c2a0000 <41820020> 79282c28 7cea4214 2c270000
...
[ 62.952729] rcu: INFO: rcu_sched self-detected stall on CPU
[ 62.952782] rcu: 248-....: (5999 ticks this GP) idle=5884/1/0x4000000000000002 softirq=81/81 fqs=1997
[ 62.952965] rcu: (t=6000 jiffies g=-1015 q=1 ncpus=256)
[ 62.953050] CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Tainted: G L 6.15.0-rc1-next-20250408 #1 VOLUNTARY
[ 62.953055] Tainted: [L]=SOFTLOCKUP
[ 62.953057] Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
[ 62.953059] NIP: c000000002092180 LR: c000000002092204 CTR: 0000000000000000
[ 62.953062] REGS: c00040000418fa30 TRAP: 0900 Tainted: G L (6.15.0-rc1-next-20250408)
[ 62.953065] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 88000428 XER: 00000000
[ 62.953076] CFAR: 0000000000000000 IRQMASK: 0
[ 62.953076] GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
[ 62.953076] GPR04: 0000000000035940 c000c03ffebabb00 0000000000c03fff c000400fff587f80
[ 62.953076] GPR08: 0000000000000000 00000000002c390b 0000000000000587 0000000028000428
[ 62.953076] GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
[ 62.953076] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 62.953076] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 62.953076] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 62.953076] GPR28: c000000002df7f70 0000000000035900 c0000000011dd898 0000000008000000
[ 62.953117] NIP [c000000002092180] memory_dev_init+0x108/0x1e0
[ 62.953121] LR [c000000002092204] memory_dev_init+0x18c/0x1e0
[ 62.953125] Call Trace:
[ 62.953126] [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
[ 62.953131] [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
[ 62.953135] [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
[ 62.953141] [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
[ 62.953146] [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c
[ 62.953152] --- interrupt: 0 at 0x0
[ 62.953155] Code: 4181ffe8 3d22012f 3949fe68 7fa9eb78 e8aa0000 2fa50000 60000000 60420000 7c29f840 792aaac2 40800034 419e0030 <794a1f24> 7d45502a 2c2a0000 41820020
Bisect Log
----------
git bisect start
# status: waiting for both good and bad commits
# good: [38fec10eb60d687e30c8c6b5420d86e8149f7557] Linux 6.14
git bisect good 38fec10eb60d687e30c8c6b5420d86e8149f7557
# status: waiting for bad commit, 1 good commit known
# bad: [7702d0130dc002bab2c3571ddb6ff68f82d99aea] Add linux-next specific files for 20250408
git bisect bad 7702d0130dc002bab2c3571ddb6ff68f82d99aea
# good: [390513642ee6763c7ada07f0a1470474986e6c1c] io_uring: always do atomic put from iowq
git bisect good 390513642ee6763c7ada07f0a1470474986e6c1c
# bad: [eb0ece16027f8223d5dc9aaf90124f70577bd22a] Merge tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad eb0ece16027f8223d5dc9aaf90124f70577bd22a
# good: [7d06015d936c861160803e020f68f413b5c3cd9d] Merge tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
git bisect good 7d06015d936c861160803e020f68f413b5c3cd9d
# good: [fa593d0f969dcfa41d390822fdf1a0ab48cd882c] Merge tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect good fa593d0f969dcfa41d390822fdf1a0ab48cd882c
# good: [f64a72bc767f6e9ddb18fdacaeb99708c4810ada] Merge tag 'v6.15rc-part1-ksmbd-server-fixes' of git://git.samba.org/ksmbd
git bisect good f64a72bc767f6e9ddb18fdacaeb99708c4810ada
# good: [a14efee04796dd3f614eaf5348ca1ac099c21349] mm/page_alloc: clarify should_claim_block() commentary
git bisect good a14efee04796dd3f614eaf5348ca1ac099c21349
# good: [f0e11a997ab438ce91a7dc9a6dd64c0c4a6af112] mm/vmalloc: refactor __vmalloc_node_range_noprof()
git bisect good f0e11a997ab438ce91a7dc9a6dd64c0c4a6af112
# bad: [735b3f7e773bd09d459537562754debd1f8e816b] selftests/mm: uffd-unit-tests support for hugepages > 2M
git bisect bad 735b3f7e773bd09d459537562754debd1f8e816b
# bad: [d2734f044f84833b2c9ec1b71b542d299d35202b] mm: memory-failure: enhance comments for return value of memory_failure()
git bisect bad d2734f044f84833b2c9ec1b71b542d299d35202b
# bad: [61659efdb35ce6c6ac7639342098f3c4548b794b] drivers/base/memory: improve add_boot_memory_block()
git bisect bad 61659efdb35ce6c6ac7639342098f3c4548b794b
# good: [58729c04cf1092b87aeef0bf0998c9e2e4771133] mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
git bisect good 58729c04cf1092b87aeef0bf0998c9e2e4771133
# good: [80a5c494c89f73907ed659a9233a70253774cdae] selftests/mm: add tests for folio_split(), buddy allocator like split
git bisect good 80a5c494c89f73907ed659a9233a70253774cdae
# good: [d53c78fffe7ad364397c693522ceb4d152c2aacd] mm/shmem: use xas_try_split() in shmem_split_large_entry()
git bisect good d53c78fffe7ad364397c693522ceb4d152c2aacd
# good: [c637c61c9ed0203d9a1f2ba21fb7a49ddca3ef8f] mm/damon/sysfs-schemes: avoid Wformat-security warning on damon_sysfs_access_pattern_add_range_dir()
git bisect good c637c61c9ed0203d9a1f2ba21fb7a49ddca3ef8f
# first bad commit: [61659efdb35ce6c6ac7639342098f3c4548b794b] drivers/base/memory: improve add_boot_memory_block()
To Reproduce the issue
----------------------
Build the upstream kernel and boot on a PowerNV Power10 hardware
Kernel config
-------------
This should occur with any default config you may have; alternatively, you can use the following:
https://gist.github.com/adi-g15-ibm/6eb03cea2c6202e5eb017abd3819a491
CC list
-------
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org
To: linux-mm@kvack.org
^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-09 18:03 [REPORT] Softlockups on PowerNV with upstream Aditya Gupta
@ 2025-04-10  1:35 ` Gavin Shan
  2025-04-10 11:38   ` Aditya Gupta
  2025-04-10  5:25 ` Oscar Salvador
  1 sibling, 1 reply; 12+ messages in thread
From: Gavin Shan @ 2025-04-10 1:35 UTC (permalink / raw)
To: Aditya Gupta, linux-mm
Cc: Andrew Morton, Danilo Krummrich, David Hildenbrand,
    Greg Kroah-Hartman, Mahesh J Salgaonkar, Oscar Salvador,
    Rafael J. Wysocki, Sourabh Jain, linux-kernel, Gavin Shan

Hi Aditya,

On 4/10/25 4:03 AM, Aditya Gupta wrote:
> While booting current upstream kernel, I consistently get "softlockups", on IBM PowerNV system.
>
> I have tested it only on PowerNV systems. But some architectures/platforms also
> might have it. PSeries systems don't have this issue though.
>
> Bisect points to the following commit:
>
> commit 61659efdb35ce6c6ac7639342098f3c4548b794b
> Author: Gavin Shan <gshan@redhat.com>
> Date:   Wed Mar 12 09:30:43 2025 +1000
>
>     drivers/base/memory: improve add_boot_memory_block()
>
> [...]
>
> Pasted the console log, bisect log, and the kernel config, below.

I don't see how 61659efdb35ce ("drivers/base/memory: improve
add_boot_memory_block()") causes any logical changes. Could you help
revert it on top of v6.15-rc1 to confirm whether the RCU stall and
softlockup issue still exists? At present, I don't have access to a
Power10 machine, but I will check around.

> [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> [    2.783475] numa: Node 0 CPUs: 0-63
> [    2.783537] numa: Node 2 CPUs: 64-127
> [    2.783591] numa: Node 4 CPUs: 128-191
> [    2.783653] numa: Node 6 CPUs: 192-255
> [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)

The NUMA node number leaps by one. It seems the machine has 800GB memory
if I'm correct.

> [... remainder of the console log, bisect log, reproduction steps and
>  kernel config quoted unchanged from the report above ...]

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  1:35 ` Gavin Shan
@ 2025-04-10 11:38   ` Aditya Gupta
  0 siblings, 0 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 11:38 UTC (permalink / raw)
To: Gavin Shan
Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
    Greg Kroah-Hartman, Mahesh J Salgaonkar, Oscar Salvador,
    Rafael J. Wysocki, Sourabh Jain, linux-kernel, Gavin Shan

Hi Gavin,

Sorry for the late reply.

On 25/04/10 11:35AM, Gavin Shan wrote:
> > [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> > [    2.783475] numa: Node 0 CPUs: 0-63
> > [    2.783537] numa: Node 2 CPUs: 64-127
> > [    2.783591] numa: Node 4 CPUs: 128-191
> > [    2.783653] numa: Node 6 CPUs: 192-255
> > [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
>
> The NUMA node number leaps by one. It seems the machine has 800GB memory if I'm correct.

Yes Gavin, almost 700G:

    # lsmem
    RANGE                                   SIZE  STATE REMOVABLE         BLOCK
    0x0000000000000000-0x0000001fffffffff   128G online       yes         0-127
    0x0000400000000000-0x0000400fffffffff    64G online       yes   65536-65599
    0x0000800000000000-0x0000803fffffffff   256G online       yes 131072-131327
    0x0000c00000000000-0x0000c03fffffffff   256G online       yes 196608-196863

    Memory block size:         1G
    Total online memory:     704G
    Total offline memory:      0B

Thanks,
- Aditya G

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-09 18:03 [REPORT] Softlockups on PowerNV with upstream Aditya Gupta
  2025-04-10  1:35 ` Gavin Shan
@ 2025-04-10  5:25 ` Oscar Salvador
  2025-04-10  5:35   ` Gavin Shan
  2025-04-10 11:44   ` Aditya Gupta
  1 sibling, 2 replies; 12+ messages in thread
From: Oscar Salvador @ 2025-04-10 5:25 UTC (permalink / raw)
To: Aditya Gupta
Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
    Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
    Sourabh Jain, linux-kernel

On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> While booting current upstream kernel, I consistently get "softlockups", on IBM PowerNV system.
>
> Bisect points to the following commit:
>
> commit 61659efdb35ce6c6ac7639342098f3c4548b794b
>
>     drivers/base/memory: improve add_boot_memory_block()
> ...
> [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> [    2.783475] numa: Node 0 CPUs: 0-63
> [    2.783537] numa: Node 2 CPUs: 64-127
> [    2.783591] numa: Node 4 CPUs: 128-191
> [    2.783653] numa: Node 6 CPUs: 192-255
> [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)

If I am not mistaken this is ~700GB, and PowerNV uses 16MB as the section
size, with sections_per_block == 1 (I think).

The code before the mentioned commit was something like:

    for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
        if (present_section_nr(nr))
            section_count++;

    if (section_count == 0)
        return 0;
    return add_memory_block();

So, in the case of PowerNV, we just check one section at a time and either
return or call add_memory_block() depending on whether it is present.

Now, the current code is something different. We now have:

    memory_dev_init:
        for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
            ret = add_boot_memory_block(nr);

    add_boot_memory_block:
        for_each_present_section_nr(base_section_nr, nr) {
            if (nr >= (base_section_nr + sections_per_block))
                break;

            return add_memory_block();
        }
        return 0;

The thing is that next_present_section_nr() (which is called by
for_each_present_section_nr()) loops until we find a present section.
Then we check whether the found section is beyond base_section_nr +
sections_per_block (where sections_per_block == 1); if so, we skip
add_memory_block().

So I think the issue comes from for_each_present_section_nr() having to
loop a lot until it finds a present section. And since the loop in
memory_dev_init() increments only by 1, the next iteration might have to
loop a lot again to find another present section, and so on and so forth.

Maybe we can fix this by making memory_dev_init() remember at which
section add_boot_memory_block() returned.
Something like the following (only compile-tested):

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8f3a41d9bfaa..d97635cbfd1d 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
 	return 0;
 }
 
-static int __init add_boot_memory_block(unsigned long base_section_nr)
+static int __init add_boot_memory_block(unsigned long *base_section_nr)
 {
+	int ret;
 	unsigned long nr;
 
-	for_each_present_section_nr(base_section_nr, nr) {
-		if (nr >= (base_section_nr + sections_per_block))
+	for_each_present_section_nr(*base_section_nr, nr) {
+		if (nr >= (*base_section_nr + sections_per_block))
 			break;
 
-		return add_memory_block(memory_block_id(base_section_nr),
-					MEM_ONLINE, NULL, NULL);
+		ret = add_memory_block(memory_block_id(*base_section_nr),
+				       MEM_ONLINE, NULL, NULL);
+		*base_section_nr = nr;
+		return ret;
 	}
 
+	if (nr == -1)
+		*base_section_nr = __highest_present_section_nr + 1;
+	else
+		*base_section_nr = nr;
 	return 0;
 }
 
@@ -973,9 +980,9 @@ void __init memory_dev_init(void)
 	 * Create entries for memory sections that were found
 	 * during boot and have been initialized
 	 */
-	for (nr = 0; nr <= __highest_present_section_nr;
-	     nr += sections_per_block) {
-		ret = add_boot_memory_block(nr);
+	nr = first_present_section_nr();
+	for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
+		ret = add_boot_memory_block(&nr);
 		if (ret)
 			panic("%s() failed to add memory block: %d\n", __func__,
 			      ret);

@Aditya: can you please give it a try?

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: [REPORT] Softlockups on PowerNV with upstream 2025-04-10 5:25 ` Oscar Salvador @ 2025-04-10 5:35 ` Gavin Shan 2025-04-10 8:23 ` Oscar Salvador 2025-04-10 11:44 ` Aditya Gupta 1 sibling, 1 reply; 12+ messages in thread From: Gavin Shan @ 2025-04-10 5:35 UTC (permalink / raw) To: Oscar Salvador, Aditya Gupta Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki, Sourabh Jain, linux-kernel [-- Attachment #1: Type: text/plain, Size: 4874 bytes --] On 4/10/25 3:25 PM, Oscar Salvador wrote: > On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote: >> Hi, >> >> While booting current upstream kernel, I consistently get "softlockups", on IBM PowerNV system. >> >> I have tested it only on PowerNV systems. But some architectures/platforms also >> might have it. PSeries systems don't have this issue though. >> >> Bisect points to the following commit: >> >> commit 61659efdb35ce6c6ac7639342098f3c4548b794b >> Author: Gavin Shan <gshan@redhat.com> >> Date: Wed Mar 12 09:30:43 2025 +1000 >> >> drivers/base/memory: improve add_boot_memory_block() >> > ... >> Console log >> ----------- >> >> [ 2.783371] smp: Brought up 4 nodes, 256 CPUs >> [ 2.783475] numa: Node 0 CPUs: 0-63 >> [ 2.783537] numa: Node 2 CPUs: 64-127 >> [ 2.783591] numa: Node 4 CPUs: 128-191 >> [ 2.783653] numa: Node 6 CPUs: 192-255 >> [ 2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved) > > If I am not mistaken this is ~700GB, and PowerNV uses 16MB as section size, > and sections_per_block == 1 (I think). 
> > The code before the mentioned commit, was something like: > > for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++) > if (present_section_nr(nr)) > section_count++; > > if (section_count == 0) > return 0; > return add_memory_block() > > So, in case of PowerNV , we will just check one section at a time and > either return or call add_memory_block depending whether it is present. > > Now, with the current code that is something different. > We now have > > memory_dev_init: > for(nr = 0, nr <= __highest_present_section_nr; nr += 1) > ret = add_boot_memory_block > > add_boot_memory_block: > for_each_present_section_nr(base_section_nr, nr) { > if (nr >= (base_section_nr + sections_per_block)) > break; > > return add_memory_block(); > } > return 0; > > The thing is that next_present_section_nr() (which is called in > for_each_present_section_nr()) will loop until we find a present > section. > And then we will check whether the found section is beyond > base_section_nr + sections_per_block (where sections_per_block = 1). > If so, we skip add_memory_block. > > Now, I think that the issue comes from for_each_present_section_nr > having to loop a lot until we find a present section. > And then the loop in memory_dev_init increments only by 1, which means > that the next iteration we might have to loop a lot again to find the > another present section. And so on and so forth. > > Maybe we can fix this by making memory_dev_init() remember in which > section add_boot_memory_block returns. > Something like the following (only compile-tested) > Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr(). I already had the fix, working on IBM's Power9 machine, where the issue can be reproduced. Please see the attached patch. I'm having most tests on ARM64 machine for the fix. 
Thanks, Gavin > diff --git a/drivers/base/memory.c b/drivers/base/memory.c > index 8f3a41d9bfaa..d97635cbfd1d 100644 > --- a/drivers/base/memory.c > +++ b/drivers/base/memory.c > @@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state, > return 0; > } > > -static int __init add_boot_memory_block(unsigned long base_section_nr) > +static int __init add_boot_memory_block(unsigned long *base_section_nr) > { > + int ret; > unsigned long nr; > > - for_each_present_section_nr(base_section_nr, nr) { > - if (nr >= (base_section_nr + sections_per_block)) > + for_each_present_section_nr(*base_section_nr, nr) { > + if (nr >= (*base_section_nr + sections_per_block)) > break; > > - return add_memory_block(memory_block_id(base_section_nr), > - MEM_ONLINE, NULL, NULL); > + ret = add_memory_block(memory_block_id(*base_section_nr), > + MEM_ONLINE, NULL, NULL); > + *base_section = nr; > + return ret; > } > > + if (nr == -1) > + *base_section = __highest_present_section_nr + 1; > + else > + *base_section = nr; > return 0; > } > > @@ -973,9 +980,9 @@ void __init memory_dev_init(void) > * Create entries for memory sections that were found > * during boot and have been initialized > */ > - for (nr = 0; nr <= __highest_present_section_nr; > - nr += sections_per_block) { > - ret = add_boot_memory_block(nr); > + nr = first_present_section_nr(); > + for (; nr <= __highest_present_section_nr; nr += sections_per_block) { > + ret = add_boot_memory_block(&nr); > if (ret) > panic("%s() failed to add memory block: %d\n", __func__, > ret); > > > @Aditya: can you please give it a try? 
[-- Attachment #2: 0001-drivers-base-memory-Avoid-overhead-for_each_present_.patch --]
[-- Type: text/x-patch, Size: 4736 bytes --]

From d4c43d5f6b962144c4f47d46a66284df92da285e Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan@redhat.com>
Date: Thu, 10 Apr 2025 14:43:46 +1000
Subject: [PATCH] drivers/base/memory: Avoid overhead from
 for_each_present_section_nr()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

for_each_present_section_nr() was introduced to add_boot_memory_block()
by commit 61659efdb35c ("drivers/base/memory: improve
add_boot_memory_block()"). It causes unnecessary overhead when the
present sections are really sparse: next_present_section_nr() called by
the macro finds the next present section, which can be far away from the
sections spanned by the specified block. Too much time is consumed by
next_present_section_nr() in this case, which can lead to the softlockup
observed by Aditya Gupta on an IBM Power10 machine.

  watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1]
  Modules linked in:
  CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY
  Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV
  NIP:  c00000000209218c LR: c000000002092204 CTR: 0000000000000000
  REGS: c00040000418fa30 TRAP: 0900   Not tainted  (6.15.0-rc1-next-20250408)
  MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000428  XER: 00000000
  CFAR: 0000000000000000 IRQMASK: 0
  GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040
  GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80
  GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428
  GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000
  NIP [c00000000209218c] memory_dev_init+0x114/0x1e0
  LR [c000000002092204] memory_dev_init+0x18c/0x1e0
  Call Trace:
  [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable)
  [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4
  [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370
  [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c
  [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c

Avoid the overhead by folding for_each_present_section_nr() into the
outer loop in memory_dev_init(). add_boot_memory_block() is dropped
after that.
Fixes: 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()")
Closes: https://lore.kernel.org/linux-mm/20250409180344.477916-1-adityag@linux.ibm.com
Reported-by: Aditya Gupta <adityag@linux.ibm.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 drivers/base/memory.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8f3a41d9bfaa..433a5fe96304 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -816,21 +816,6 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
        return 0;
 }

-static int __init add_boot_memory_block(unsigned long base_section_nr)
-{
-       unsigned long nr;
-
-       for_each_present_section_nr(base_section_nr, nr) {
-               if (nr >= (base_section_nr + sections_per_block))
-                       break;
-
-               return add_memory_block(memory_block_id(base_section_nr),
-                                       MEM_ONLINE, NULL, NULL);
-       }
-
-       return 0;
-}
-
 static int add_hotplug_memory_block(unsigned long block_id,
                                    struct vmem_altmap *altmap,
                                    struct memory_group *group)
@@ -957,7 +942,7 @@ static const struct attribute_group *memory_root_attr_groups[] = {
 void __init memory_dev_init(void)
 {
        int ret;
-       unsigned long block_sz, nr;
+       unsigned long block_sz, block_id, nr;

        /* Validate the configured memory block size */
        block_sz = memory_block_size_bytes();
@@ -973,12 +958,17 @@ void __init memory_dev_init(void)
         * Create entries for memory sections that were found
         * during boot and have been initialized
         */
-       for (nr = 0; nr <= __highest_present_section_nr;
-            nr += sections_per_block) {
-               ret = add_boot_memory_block(nr);
-               if (ret)
-                       panic("%s() failed to add memory block: %d\n", __func__,
-                             ret);
+       block_id = ULONG_MAX;
+       for_each_present_section_nr(0, nr) {
+               if (block_id != ULONG_MAX && memory_block_id(nr) == block_id)
+                       continue;
+
+               block_id = memory_block_id(nr);
+               ret = add_memory_block(block_id, MEM_ONLINE, NULL, NULL);
+               if (ret) {
+                       panic("%s() failed to add memory block: %d\n",
+                             __func__, ret);
+               }
        }
 }
-- 
2.48.1

^ permalink raw reply	[flat|nested] 12+ messages in thread
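[Editorial aside: the registration loop in the patch above can be sketched outside the kernel. The following is a hypothetical Python model, not kernel code — `memory_block_id()` is reduced to integer division, the present sections to a plain sorted list, and `add_memory_block()` to appending the block id.]

```python
SECTIONS_PER_BLOCK = 4            # hypothetical; it is 1 on the PowerNV box above
ULONG_MAX = (1 << 64) - 1

def memory_block_id(section_nr):
    # models the kernel helper: which memory block a section belongs to
    return section_nr // SECTIONS_PER_BLOCK

def memory_dev_init(present_sections):
    """One pass over the present sections, one add_memory_block() per block."""
    added = []
    block_id = ULONG_MAX                   # sentinel: no block registered yet
    for nr in sorted(present_sections):    # for_each_present_section_nr(0, nr)
        if block_id != ULONG_MAX and memory_block_id(nr) == block_id:
            continue                       # section is in the block just added
        block_id = memory_block_id(nr)
        added.append(block_id)             # stands in for add_memory_block()
    return added

print(memory_dev_init([0, 1, 2, 3, 9, 10, 11, 400, 401]))  # [0, 2, 100]
```

The cost is now one iteration per present section, independent of how sparse the section numbers are, which is what removes the softlockup.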
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  5:35 ` Gavin Shan
@ 2025-04-10  8:23 ` Oscar Salvador
  2025-04-10  9:44 ` Gavin Shan
  0 siblings, 1 reply; 12+ messages in thread
From: Oscar Salvador @ 2025-04-10 8:23 UTC (permalink / raw)
To: Gavin Shan
Cc: Aditya Gupta, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
> Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
> I already had the fix, working on IBM's Power9 machine, where the issue can be
> reproduced. Please see the attached patch.
>
> I'm having most tests on ARM64 machine for the fix.

Looks good to me.
But we need a comment explaining why block_id is set to ULONG_MAX
at the beginning, as this might not be obvious.

Also, do we need

 if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?

Can it not just be

 if (memory_block_id(nr) == block_id) ?

AFAICS, the first time we loop through, 'memory_block_id(nr) == ULONG_MAX'
will evaluate false and we will set block_id afterwards.

Either way looks fine to me.
Another way I guess would be:

-- 
Oscar Salvador
SUSE Labs
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  8:23 ` Oscar Salvador
@ 2025-04-10  9:44 ` Gavin Shan
  2025-04-10 11:49 ` Aditya Gupta
  2025-04-10 12:22 ` Aditya Gupta
  0 siblings, 2 replies; 12+ messages in thread
From: Gavin Shan @ 2025-04-10 9:44 UTC (permalink / raw)
To: Oscar Salvador
Cc: Aditya Gupta, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

On 4/10/25 6:23 PM, Oscar Salvador wrote:
> On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
>> Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
>> I already had the fix, working on IBM's Power9 machine, where the issue can be
>> reproduced. Please see the attached patch.
>>
>> I'm having most tests on ARM64 machine for the fix.
>
> Looks good to me.
> But we need a comment explaining why block_id is set to ULONG_MAX
> at the beginning as this might not be obvious.
>
> Also, do we need
> if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?
>
> Cannot just be
>
> if (memory_block_id(nr) == block_id) ?
>
> AFAICS, the first time we loop through 'memory_block_id(nr) == ULONG_MAX'
> will evaluate false and we will set block_id afterwards.
>
> Either way looks fine to me.
> Another way I guess would be:
>

Yeah, we need to record the last handled block ID in @block_id. The
first time a memory block device is registered in the loop, @block_id
needs to be invalid (ULONG_MAX), bypassing the check of
'memory_block_id(nr) == block_id'. I will post the fix for review after
Aditya confirms it works for him, with an extra comment to explain why
@block_id is initialized to ULONG_MAX.

Aditya, please have a try when you get a chance, thanks! I verified it
on a Power9 machine where the issue exists and on one of my ARM64
machines.

Thanks,
Gavin
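[Editorial aside: the point about the sentinel can be made concrete with a small model. This is hypothetical Python, not kernel code; `init_blocks()` mimics the patch's loop, with and without the `block_id != ULONG_MAX` half of the check. Because `memory_block_id()` can never return ULONG_MAX, both variants agree — but any initial value that a real section could map to, such as 0, would silently skip that block.]

```python
ULONG_MAX = (1 << 64) - 1
SECTIONS_PER_BLOCK = 4

def memory_block_id(nr):
    # simplified stand-in for the kernel helper
    return nr // SECTIONS_PER_BLOCK

def init_blocks(present, initial_block_id, with_guard):
    added, block_id = [], initial_block_id
    for nr in sorted(present):
        dup = memory_block_id(nr) == block_id
        if with_guard:                       # Gavin: block_id != ULONG_MAX && ...
            dup = block_id != ULONG_MAX and dup
        if dup:
            continue                         # block already registered
        block_id = memory_block_id(nr)
        added.append(block_id)
    return added

present = [0, 1, 5, 6]                       # sections in blocks 0 and 1

# With the ULONG_MAX sentinel, the guarded and unguarded checks agree:
print(init_blocks(present, ULONG_MAX, True))    # [0, 1]
print(init_blocks(present, ULONG_MAX, False))   # [0, 1]

# A naive initial block_id of 0 would silently drop block 0:
print(init_blocks(present, 0, False))           # [1]
```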
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  9:44 ` Gavin Shan
@ 2025-04-10 11:49 ` Aditya Gupta
  2025-04-10 12:22 ` Aditya Gupta
  1 sibling, 0 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 11:49 UTC (permalink / raw)
To: Gavin Shan
Cc: Oscar Salvador, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel

Hi Gavin,

On 25/04/10 07:44PM, Gavin Shan wrote:
>
> <...snip...>
>
> Aditya, please have a try when you get a chance, thanks! I verified it on Power9
> machine where the issue exists and on one of my ARM64 machine.

Yes Gavin, will try the patch and then reply.

Thanks,
- Aditya G

>
> Thanks,
> Gavin
>
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  9:44 ` Gavin Shan
@ 2025-04-10 12:22 ` Aditya Gupta
  2025-04-10 12:32 ` Gavin Shan
  1 sibling, 1 reply; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 12:22 UTC (permalink / raw)
To: Gavin Shan
Cc: Oscar Salvador, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel, Donet Tom

Cc +donet

On 25/04/10 07:44PM, Gavin Shan wrote:
> On 4/10/25 6:23 PM, Oscar Salvador wrote:
> > On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
> > > Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
> > > I already had the fix, working on IBM's Power9 machine, where the issue can be
> > > reproduced. Please see the attached patch.
> > >
> > > I'm having most tests on ARM64 machine for the fix.
> >
> > Looks good to me.
> > But we need a comment explaining why block_id is set to ULONG_MAX
> > at the beginning as this might not be obvious.
> >
> > Also, do we need
> > if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?
> >
> > Cannot just be
> >
> > if (memory_block_id(nr) == block_id) ?
> >
> > AFAICS, the first time we loop through 'memory_block_id(nr) == ULONG_MAX'
> > will evaluate false and we will set block_id afterwards.
> >
> > Either way looks fine to me.
> > Another way I guess would be:
> >
>
> Yeah, we need to record the last handled block ID by @block_id. For the
> first time to register the block memory device in the loop, @block_id needs
> to be invalid (ULONG_MAX), bypassing the check of 'memory_block_id(nr) == block_id'.
> I will post the fix for review after Aditya confirms it works for him, with extra
> comment to explain why @block_id is initialized to ULONG_MAX.
>
> Aditya, please have a try when you get a chance, thanks! I verified it on Power9
> machine where the issue exists and on one of my ARM64 machine.

I don't see any softlockups now with your patch as well as Oscar's patch.

Tested on PowerNV Power10.

Thanks for the quick replies Gavin.

- Aditya G

>
> Thanks,
> Gavin
>
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10 12:22 ` Aditya Gupta
@ 2025-04-10 12:32 ` Gavin Shan
  0 siblings, 0 replies; 12+ messages in thread
From: Gavin Shan @ 2025-04-10 12:32 UTC (permalink / raw)
To: Aditya Gupta
Cc: Oscar Salvador, linux-mm, Andrew Morton, Danilo Krummrich,
	David Hildenbrand, Greg Kroah-Hartman, Mahesh J Salgaonkar,
	Rafael J. Wysocki, Sourabh Jain, linux-kernel, Donet Tom,
	Gavin Shan

On 4/10/25 10:22 PM, Aditya Gupta wrote:
> Cc +donet
>
> On 25/04/10 07:44PM, Gavin Shan wrote:
>> On 4/10/25 6:23 PM, Oscar Salvador wrote:
>>> On Thu, Apr 10, 2025 at 03:35:19PM +1000, Gavin Shan wrote:
>>>> Thanks, Oscar. You're correct that the overhead is introduced by for_each_present_section_nr().
>>>> I already had the fix, working on IBM's Power9 machine, where the issue can be
>>>> reproduced. Please see the attached patch.
>>>>
>>>> I'm having most tests on ARM64 machine for the fix.
>>>
>>> Looks good to me.
>>> But we need a comment explaining why block_id is set to ULONG_MAX
>>> at the beginning as this might not be obvious.
>>>
>>> Also, do we need
>>> if (block_id != ULONG_MAX && memory_block_id(nr) == block_id) ?
>>>
>>> Cannot just be
>>>
>>> if (memory_block_id(nr) == block_id) ?
>>>
>>> AFAICS, the first time we loop through 'memory_block_id(nr) == ULONG_MAX'
>>> will evaluate false and we will set block_id afterwards.
>>>
>>> Either way looks fine to me.
>>> Another way I guess would be:
>>>
>>
>> Yeah, we need to record the last handled block ID by @block_id. For the
>> first time to register the block memory device in the loop, @block_id needs
>> to be invalid (ULONG_MAX), bypassing the check of 'memory_block_id(nr) == block_id'.
>> I will post the fix for review after Aditya confirms it works for him, with extra
>> comment to explain why @block_id is initialized to ULONG_MAX.
>>
>> Aditya, please have a try when you get a chance, thanks! I verified it on Power9
>> machine where the issue exists and on one of my ARM64 machine.
>
> I don't see any softlockups now with your patch as well as Oscar's patch.
>
> Tested on PowerNV Power10.
>
> Thanks for the quick replies Gavin.

Nice, thanks for the quick test, Aditya. I will send the fix for review,
with you copied.

Thanks,
Gavin
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10  5:25 ` Oscar Salvador
  2025-04-10  5:35 ` Gavin Shan
@ 2025-04-10 11:44 ` Aditya Gupta
  2025-04-10 12:26 ` Aditya Gupta
  1 sibling, 1 reply; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 11:44 UTC (permalink / raw)
To: Oscar Salvador
Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
	Sourabh Jain, linux-kernel

Hi,

On 25/04/10 07:25AM, Oscar Salvador wrote:
> On Wed, Apr 09, 2025 at 11:33:44PM +0530, Aditya Gupta wrote:
> > Hi,
> >
> > While booting current upstream kernel, I consistently get "softlockups", on IBM PowerNV system.
> >
> > I have tested it only on PowerNV systems. But some architectures/platforms also
> > might have it. PSeries systems don't have this issue though.
> >
> > Bisect points to the following commit:
> >
> > commit 61659efdb35ce6c6ac7639342098f3c4548b794b
> > Author: Gavin Shan <gshan@redhat.com>
> > Date: Wed Mar 12 09:30:43 2025 +1000
> >
> >     drivers/base/memory: improve add_boot_memory_block()
> >
> ...
> > Console log
> > -----------
> >
> > [    2.783371] smp: Brought up 4 nodes, 256 CPUs
> > [    2.783475] numa: Node 0 CPUs: 0-63
> > [    2.783537] numa: Node 2 CPUs: 64-127
> > [    2.783591] numa: Node 4 CPUs: 128-191
> > [    2.783653] numa: Node 6 CPUs: 192-255
> > [    2.804945] Memory: 735777792K/738197504K available (17536K kernel code, 5760K rwdata, 15232K rodata, 6528K init, 2517K bss, 1369664K reserved, 0K cma-reserved)
>
> If I am not mistaken this is ~700GB, and PowerNV uses 16MB as section size,
> and sections_per_block == 1 (I think).
Yes, the memory is around 700G:

    # lsmem
    RANGE                                 SIZE  STATE REMOVABLE          BLOCK
    0x0000000000000000-0x0000001fffffffff 128G online       yes          0-127
    0x0000400000000000-0x0000400fffffffff  64G online       yes    65536-65599
    0x0000800000000000-0x0000803fffffffff 256G online       yes  131072-131327
    0x0000c00000000000-0x0000c03fffffffff 256G online       yes  196608-196863

    Memory block size:       1G
    Total online memory:   704G
    Total offline memory:    0B

I don't know about the sections_per_block.

> The code before the mentioned commit, was something like:
>
> for (nr = base_section_nr; nr < base_section_nr + sections_per_block; nr++)
>         if (present_section_nr(nr))
>                 section_count++;
>
> if (section_count == 0)
>         return 0;
> return add_memory_block()
>
> So, in case of PowerNV, we will just check one section at a time and
> either return or call add_memory_block depending whether it is present.
>
> Now, with the current code that is something different.
> We now have
>
> memory_dev_init:
>         for (nr = 0; nr <= __highest_present_section_nr; nr += 1)
>                 ret = add_boot_memory_block
>
> add_boot_memory_block:
>         for_each_present_section_nr(base_section_nr, nr) {
>                 if (nr >= (base_section_nr + sections_per_block))
>                         break;
>
>                 return add_memory_block();
>         }
>         return 0;
>
> The thing is that next_present_section_nr() (which is called in
> for_each_present_section_nr()) will loop until we find a present
> section.
> And then we will check whether the found section is beyond
> base_section_nr + sections_per_block (where sections_per_block = 1).
> If so, we skip add_memory_block.
>
> Now, I think that the issue comes from for_each_present_section_nr
> having to loop a lot until we find a present section.
> And then the loop in memory_dev_init increments only by 1, which means
> that the next iteration we might have to loop a lot again to find the
> next present section. And so on and so forth.
>
> Maybe we can fix this by making memory_dev_init() remember in which
> section add_boot_memory_block returns.
> Something like the following (only compile-tested)
>
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 8f3a41d9bfaa..d97635cbfd1d 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -816,18 +816,25 @@ static int add_memory_block(unsigned long block_id, unsigned long state,
>         return 0;
>  }
>
> -static int __init add_boot_memory_block(unsigned long base_section_nr)
> +static int __init add_boot_memory_block(unsigned long *base_section_nr)
>  {
> +       int ret;
>         unsigned long nr;
>
> -       for_each_present_section_nr(base_section_nr, nr) {
> -               if (nr >= (base_section_nr + sections_per_block))
> +       for_each_present_section_nr(*base_section_nr, nr) {
> +               if (nr >= (*base_section_nr + sections_per_block))
>                         break;
>
> -               return add_memory_block(memory_block_id(base_section_nr),
> -                                       MEM_ONLINE, NULL, NULL);
> +               ret = add_memory_block(memory_block_id(*base_section_nr),
> +                                      MEM_ONLINE, NULL, NULL);
> +               *base_section_nr = nr;
> +               return ret;
>         }
>
> +       if (nr == -1)
> +               *base_section_nr = __highest_present_section_nr + 1;
> +       else
> +               *base_section_nr = nr;
>         return 0;
>  }
>
> @@ -973,9 +980,9 @@ void __init memory_dev_init(void)
>          * Create entries for memory sections that were found
>          * during boot and have been initialized
>          */
> -       for (nr = 0; nr <= __highest_present_section_nr;
> -            nr += sections_per_block) {
> -               ret = add_boot_memory_block(nr);
> +       nr = first_present_section_nr();
> +       for (; nr <= __highest_present_section_nr; nr += sections_per_block) {
> +               ret = add_boot_memory_block(&nr);
>                 if (ret)
>                         panic("%s() failed to add memory block: %d\n", __func__,
>                               ret);
>

Makes sense, thanks for the nice explanation.

> @Aditya: can you please give it a try?
>

Yes, will try it now.

Thanks,
- Aditya G

> --
> Oscar Salvador
> SUSE Labs
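[Editorial aside: the asymmetry Oscar describes can be made concrete with a toy model. This is hypothetical Python, not kernel code; membership tests on a set stand in for present_section_nr()/next_present_section_nr(), with sections_per_block = 1 as on this machine. The layout below uses a deliberately small hole so the counts stay quick to compute.]

```python
SECTIONS_PER_BLOCK = 1

# sparse layout: two small present ranges separated by a large hole
present = set(range(0, 10)) | set(range(2000, 2010))
highest = max(present)

def old_cost():
    # pre-61659efdb35c: one present_section_nr() check per section per block
    checks = 0
    for base in range(0, highest + 1, SECTIONS_PER_BLOCK):
        for nr in range(base, base + SECTIONS_PER_BLOCK):
            checks += 1
            if nr in present:
                break
    return checks

def new_cost():
    # post-61659efdb35c: for_each_present_section_nr() restarts the scan
    # for every block, walking the same hole over and over again
    checks = 0
    for base in range(0, highest + 1, SECTIONS_PER_BLOCK):
        nr = base
        while nr <= highest:          # models next_present_section_nr()
            checks += 1
            if nr in present:
                break
            nr += 1
    return checks

print(old_cost())   # 2010: linear in the number of sections
print(new_cost())   # ~2 million: quadratic in the size of the hole
```

On the real machine the holes span tens of thousands of 16MB sections, so the same quadratic blow-up easily exceeds the 22-second soft-lockup threshold.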
* Re: [REPORT] Softlockups on PowerNV with upstream
  2025-04-10 11:44 ` Aditya Gupta
@ 2025-04-10 12:26 ` Aditya Gupta
  0 siblings, 0 replies; 12+ messages in thread
From: Aditya Gupta @ 2025-04-10 12:26 UTC (permalink / raw)
To: Oscar Salvador
Cc: linux-mm, Andrew Morton, Danilo Krummrich, David Hildenbrand,
	Greg Kroah-Hartman, Mahesh J Salgaonkar, Rafael J. Wysocki,
	Sourabh Jain, linux-kernel, Gavin Shan

On 25/04/10 05:14PM, Aditya Gupta wrote:
> Hi,
>
> On 25/04/10 07:25AM, Oscar Salvador wrote:
>
> <...snip...>
>
> > @Aditya: can you please give it a try?
>
> Yes, will try it now.

I don't see the softlockups now with your patch, Oscar. Gavin's patch
also fixes the issue for me.

Tested it on a Power10 PowerNV system.

Thank you for the quick replies!

Thanks,
- Aditya G

> Thanks,
> - Aditya G
>
> >
> > --
> > Oscar Salvador
> > SUSE Labs
end of thread, other threads: [~2025-04-10 12:32 UTC | newest]

Thread overview: 12+ messages
2025-04-09 18:03 [REPORT] Softlockups on PowerNV with upstream Aditya Gupta
2025-04-10  1:35 ` Gavin Shan
2025-04-10 11:38 ` Aditya Gupta
2025-04-10  5:25 ` Oscar Salvador
2025-04-10  5:35 ` Gavin Shan
2025-04-10  8:23 ` Oscar Salvador
2025-04-10  9:44 ` Gavin Shan
2025-04-10 11:49 ` Aditya Gupta
2025-04-10 12:22 ` Aditya Gupta
2025-04-10 12:32 ` Gavin Shan
2025-04-10 11:44 ` Aditya Gupta
2025-04-10 12:26 ` Aditya Gupta