* 6.16-pre-rc1: BUG: Bad page state in process swapper on parisc
@ 2025-06-03 17:31 Helge Deller
2025-08-27 21:31 ` Christoph Biedl
0 siblings, 1 reply; 5+ messages in thread
From: Helge Deller @ 2025-06-03 17:31 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, Linux Kernel Development,
Linux Memory Management List, linux-parisc
I'm facing a kernel crash on the 32-bit parisc platform with git head.
git bisecting leads to this patch which triggers the crash:
commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
Syslog:...
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 131072
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] stackdepot: allocating hash table via alloc_large_system_hash
[ 0.000000] stackdepot hash table entries: 32768 (order: 6, 262144 bytes, linear)
..
[ 0.000000] MEMBLOCK configuration: (I added this output during debugging:)
[ 0.000000] memory size = 0x20000000 reserved size = 0x01f0ed2a
[ 0.000000] memory.cnt = 0x1
[ 0.000000] memory[0x0] [0x00000000-0x1fffffff], 0x20000000 bytes flags: 0x0
[ 0.000000] reserved.cnt = 0xa
[ 0.000000] reserved[0x0] [0x00000000-0x0008a0b0], 0x0008a0b1 bytes flags: 0x0
[ 0.000000] reserved[0x1] [0x0008a0c0-0x0008a130], 0x00000071 bytes flags: 0x0
[ 0.000000] reserved[0x2] [0x0008a140-0x0008a143], 0x00000004 bytes flags: 0x0
[ 0.000000] reserved[0x3] [0x0008a150-0x0008a153], 0x00000004 bytes flags: 0x0
[ 0.000000] reserved[0x4] [0x0008a160-0x0008a2d3], 0x00000174 bytes flags: 0x0
[ 0.000000] reserved[0x5] [0x0008a2e0-0x0008a5e3], 0x00000304 bytes flags: 0x0
[ 0.000000] reserved[0x6] [0x0008a5f0-0x0008a6b3], 0x000000c4 bytes flags: 0x0
[ 0.000000] reserved[0x7] [0x0008a6c0-0x0008acc3], 0x00000604 bytes flags: 0x0
[ 0.000000] reserved[0x8] [0x0008acd0-0x000f6d8f], 0x0006c0c0 bytes flags: 0x0
[ 0.000000] reserved[0x9] [0x00100000-0x01f17fff], 0x01e18000 bytes flags: 0x0
[ 0.000000] BUG: Bad page state in process swapper pfn:000f7
[ 0.000000] page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
[ 0.000000] flags: 0x0(zone=0)
[ 0.000000] raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
[ 0.000000] raw: 00000000
[ 0.000000] page dumped because: page_pool leak
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
[ 0.000000] Hardware name: 9000/778/B160L
[ 0.000000] Backtrace:
[ 0.000000] [<1041d1f4>] show_stack+0x34/0x48
[ 0.000000] [<10412dd8>] dump_stack_lvl+0x80/0xc8
[ 0.000000] [<10412e3c>] dump_stack+0x1c/0x2c
[ 0.000000] [<106ece88>] bad_page+0x14c/0x17c
[ 0.000000] [<10406c50>] free_page_is_bad.part.0+0xd4/0xec
[ 0.000000] [<106ed180>] free_page_is_bad+0x80/0x88
[ 0.000000] [<106ef05c>] __free_pages_ok+0x374/0x508
[ 0.000000] [<1011d34c>] __free_pages_core+0x1f0/0x218
[ 0.000000] [<1011a2f0>] memblock_free_pages+0x68/0x94
[ 0.000000] [<10120324>] memblock_free_all+0x26c/0x310
[ 0.000000] [<1011a4d8>] mm_core_init+0x18c/0x208
[ 0.000000] [<10100e88>] start_kernel+0x4ec/0x7a0
[ 0.000000] [<101054d0>] start_parisc+0xb4/0xc4
When it crashes, __free_pages_ok is called with page=0x118022bc pfn=f7 order=0.
Other possibly relevant values:
page->pp_magic 0x118022c0, PP_MAGIC_MASK = 0xc000007c,
PP_SIGNATURE=0x40, PP_DMA_INDEX_BITS=23, PP_DMA_INDEX_MASK=0x3fffff80
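As a sanity check on the values above, here is a minimal user-space sketch (not kernel code). It assumes the page-pool test amounts to masking pp_magic with PP_MAGIC_MASK and comparing the result against PP_SIGNATURE. uint32_t models a 32-bit unsigned long, and the helper name looks_like_pp() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values as reported above for the 32-bit parisc build. */
#define PP_SIGNATURE   UINT32_C(0x40)
#define PP_MAGIC_MASK  UINT32_C(0xc000007c)

/* Hypothetical helper: the masked comparison on one 32-bit word. */
static int looks_like_pp(uint32_t pp_magic)
{
	return (pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
}

/* Compile-time check: the reported pp_magic word passes the masked test. */
static_assert((UINT32_C(0x118022c0) & PP_MAGIC_MASK) == PP_SIGNATURE,
	      "reported pp_magic matches the signature");
```

Under that assumption, 0x118022c0 & 0xc000007c evaluates to 0x40, which equals PP_SIGNATURE, so the word would be classified as belonging to a page pool.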
I'm not convinced that Toke's patch is actually the culprit, but
it somehow changes the behaviour.
Any idea what I could test or check to narrow down the issue?
Helge
* Re: 6.16-pre-rc1: BUG: Bad page state in process swapper on parisc
2025-06-03 17:31 6.16-pre-rc1: BUG: Bad page state in process swapper on parisc Helge Deller
@ 2025-08-27 21:31 ` Christoph Biedl
2025-09-11 22:12 ` boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels Helge Deller
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Biedl @ 2025-08-27 21:31 UTC (permalink / raw)
To: Helge Deller
Cc: Toke Høiland-Jørgensen, Linux Kernel Development,
Linux Memory Management List, linux-parisc
Sorry for being somewhat late to the party ...
Helge Deller wrote a few weeks ago ...
> I'm facing a kernel crash on the 32-bit parisc platform with git head.
>
> git bisecting leads to this patch which triggers the crash:
> commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool")
>
> Syslog:...
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 131072
> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [ 0.000000] stackdepot: allocating hash table via alloc_large_system_hash
> [ 0.000000] stackdepot hash table entries: 32768 (order: 6, 262144 bytes, linear)
> ..
> [ 0.000000] MEMBLOCK configuration: (I added this output during debugging:)
> [ 0.000000] memory size = 0x20000000 reserved size = 0x01f0ed2a
> [ 0.000000] memory.cnt = 0x1
> [ 0.000000] memory[0x0] [0x00000000-0x1fffffff], 0x20000000 bytes flags: 0x0
> [ 0.000000] reserved.cnt = 0xa
> [ 0.000000] reserved[0x0] [0x00000000-0x0008a0b0], 0x0008a0b1 bytes flags: 0x0
> [ 0.000000] reserved[0x1] [0x0008a0c0-0x0008a130], 0x00000071 bytes flags: 0x0
> [ 0.000000] reserved[0x2] [0x0008a140-0x0008a143], 0x00000004 bytes flags: 0x0
> [ 0.000000] reserved[0x3] [0x0008a150-0x0008a153], 0x00000004 bytes flags: 0x0
> [ 0.000000] reserved[0x4] [0x0008a160-0x0008a2d3], 0x00000174 bytes flags: 0x0
> [ 0.000000] reserved[0x5] [0x0008a2e0-0x0008a5e3], 0x00000304 bytes flags: 0x0
> [ 0.000000] reserved[0x6] [0x0008a5f0-0x0008a6b3], 0x000000c4 bytes flags: 0x0
> [ 0.000000] reserved[0x7] [0x0008a6c0-0x0008acc3], 0x00000604 bytes flags: 0x0
> [ 0.000000] reserved[0x8] [0x0008acd0-0x000f6d8f], 0x0006c0c0 bytes flags: 0x0
> [ 0.000000] reserved[0x9] [0x00100000-0x01f17fff], 0x01e18000 bytes flags: 0x0
> [ 0.000000] BUG: Bad page state in process swapper pfn:000f7
> [ 0.000000] page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
> [ 0.000000] flags: 0x0(zone=0)
> [ 0.000000] raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
> [ 0.000000] raw: 00000000
> [ 0.000000] page dumped because: page_pool leak
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
> [ 0.000000] Hardware name: 9000/778/B160L
> [ 0.000000] Backtrace:
> [ 0.000000] [<1041d1f4>] show_stack+0x34/0x48
> [ 0.000000] [<10412dd8>] dump_stack_lvl+0x80/0xc8
> [ 0.000000] [<10412e3c>] dump_stack+0x1c/0x2c
> [ 0.000000] [<106ece88>] bad_page+0x14c/0x17c
> [ 0.000000] [<10406c50>] free_page_is_bad.part.0+0xd4/0xec
> [ 0.000000] [<106ed180>] free_page_is_bad+0x80/0x88
> [ 0.000000] [<106ef05c>] __free_pages_ok+0x374/0x508
> [ 0.000000] [<1011d34c>] __free_pages_core+0x1f0/0x218
> [ 0.000000] [<1011a2f0>] memblock_free_pages+0x68/0x94
> [ 0.000000] [<10120324>] memblock_free_all+0x26c/0x310
> [ 0.000000] [<1011a4d8>] mm_core_init+0x18c/0x208
> [ 0.000000] [<10100e88>] start_kernel+0x4ec/0x7a0
> [ 0.000000] [<101054d0>] start_parisc+0xb4/0xc4
The same occurred here, but due to time constraints and hardware issues I
couldn't dig into this earlier.
Bisecting in the 6.15.y stable series led to commit c30ae60f41f9 which
was cherry-picked from ee62ce7a1d90 ("page_pool: Track DMA-mapped pages
and unmap them when destroying the pool").
The problem still exists in 6.17-rc2.
| HP-UX model name: 9000/785/C3600
if that matters.
Christoph
* boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels
2025-08-27 21:31 ` Christoph Biedl
@ 2025-09-11 22:12 ` Helge Deller
2025-09-12 7:57 ` David Hildenbrand
0 siblings, 1 reply; 5+ messages in thread
From: Helge Deller @ 2025-09-11 22:12 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, David Hildenbrand,
Linux Kernel Development, Linux Memory Management List,
linux-parisc
Cc: Christoph Biedl, Helge Deller
As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
fail to boot on the parisc architecture like this:
BUG: Bad page state in process swapper pfn:000f7
page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
flags: 0x0(zone=0)
raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
raw: 00000000
page dumped because: page_pool leak
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
Hardware name: 9000/778/B160L
Backtrace:
[<106ece88>] bad_page+0x14c/0x17c
[<10406c50>] free_page_is_bad.part.0+0xd4/0xec
[<106ed180>] free_page_is_bad+0x80/0x88
[<106ef05c>] __free_pages_ok+0x374/0x508
[<1011d34c>] __free_pages_core+0x1f0/0x218
[<1011a2f0>] memblock_free_pages+0x68/0x94
[<10120324>] memblock_free_all+0x26c/0x310
[<1011a4d8>] mm_core_init+0x18c/0x208
[<10100e88>] start_kernel+0x4ec/0x7a0
[<101054d0>] start_parisc+0xb4/0xc4
git bisecting leads to this patch which triggers the crash:
commit ee62ce7a1d909ccba0399680a03c2dee83bcae95
Author: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Wed Apr 9 12:41:37 2025 +0200
page_pool: Track DMA-mapped pages and unmap them when destroying the pool
It turns out that the patch itself isn't wrong.
But it's the culprit which leads to the kernel bug since it modifies
PP_MAGIC_MASK for 32-bit kernels from:
-#define PP_MAGIC_MASK ~0x3UL
+#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
Function page_pool_page_is_pp() needs to unambiguously identify page pool
pages (using PP_MAGIC_MASK), but since the patch now reduced the valid bits to
check in PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c, the remaining bits are
not sufficient to unambiguously identify such pages any longer.
Because of that, page_pool_page_is_pp() sometimes wrongly reports pages as
page pool pages and as such triggers the kernel BUG as it believes it found a
page pool leak.
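To make the loss of precision concrete, the following user-space sketch recomputes both masks and counts how many bits each one compares. It is an illustration only: uint32_t stands in for unsigned long on a 32-bit kernel, PP_DMA_INDEX_MASK uses the value 0x3fffff80 reported earlier in this thread, and OLD_PP_MAGIC_MASK, NEW_PP_MAGIC_MASK and checked_bits() are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t stands in for unsigned long on a 32-bit kernel. */
#define PP_DMA_INDEX_MASK  UINT32_C(0x3fffff80)	/* value reported in this thread */
#define OLD_PP_MAGIC_MASK  ((uint32_t)~UINT32_C(0x3))
#define NEW_PP_MAGIC_MASK  ((uint32_t)~(PP_DMA_INDEX_MASK | UINT32_C(0x3)))

static_assert(OLD_PP_MAGIC_MASK == UINT32_C(0xfffffffc), "old mask");
static_assert(NEW_PP_MAGIC_MASK == UINT32_C(0xc000007c), "new mask");

/* Count how many bits of pp_magic a given mask actually compares. */
static unsigned int checked_bits(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)	/* clear the lowest set bit */
		n++;
	return n;
}
```

checked_bits() gives 30 for the old mask and 7 for the new one. If the word that overlays pp_magic were uniformly random (an assumption, since real page contents are not), about 1 in 128 unrelated words would pass a 7-bit comparison.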
IMHO this is a generic 32-bit kernel issue, not just affecting parisc.
Do you see any options other than:
a) revert the patch (ee62ce7a1d90), or:
b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
which means to effectively disable the page pool page test on 32bit
machines
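For illustration, option b) could look roughly like the user-space model below. This is a sketch under stated assumptions, not the actual patch: MODEL_64BIT is a made-up stand-in for CONFIG_64BIT, model_page_is_pp() is a hypothetical name, and the masked branch reuses the 32-bit constants only so that the example compiles either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PP_SIGNATURE   UINT32_C(0x40)
#define PP_MAGIC_MASK  UINT32_C(0xc000007c)

/* Why the test is unreliable here: the pp_magic word from the report passes it. */
static_assert((UINT32_C(0x118022c0) & PP_MAGIC_MASK) == PP_SIGNATURE,
	      "false positive from the boot log");

/*
 * Hypothetical model of option b). MODEL_64BIT stands in for CONFIG_64BIT;
 * leave it undefined to model a 32-bit kernel.
 */
static bool model_page_is_pp(uint32_t pp_magic)
{
#ifdef MODEL_64BIT
	return (pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
#else
	(void)pp_magic;	/* test disabled: never report a page-pool page */
	return false;
#endif
}
```

With MODEL_64BIT undefined, model_page_is_pp(0x118022c0) returns false, so the free path would no longer flag this page. The trade-off is that genuine page-pool leaks would also go undetected on 32-bit builds.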
Helge
* Re: boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels
2025-09-11 22:12 ` boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels Helge Deller
@ 2025-09-12 7:57 ` David Hildenbrand
2025-09-12 14:04 ` Helge Deller
0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand @ 2025-09-12 7:57 UTC (permalink / raw)
To: Helge Deller, Toke Høiland-Jørgensen,
Linux Kernel Development, Linux Memory Management List,
linux-parisc
Cc: Christoph Biedl, Helge Deller, Byungchul Park
On 12.09.25 00:12, Helge Deller wrote:
> As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
> fail to boot on the parisc architecture like this:
>
> BUG: Bad page state in process swapper pfn:000f7
> page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
> flags: 0x0(zone=0)
> raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
> raw: 00000000
> page dumped because: page_pool leak
> Modules linked in:
> CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
> Hardware name: 9000/778/B160L
> Backtrace:
> [<106ece88>] bad_page+0x14c/0x17c
> [<10406c50>] free_page_is_bad.part.0+0xd4/0xec
> [<106ed180>] free_page_is_bad+0x80/0x88
> [<106ef05c>] __free_pages_ok+0x374/0x508
> [<1011d34c>] __free_pages_core+0x1f0/0x218
> [<1011a2f0>] memblock_free_pages+0x68/0x94
> [<10120324>] memblock_free_all+0x26c/0x310
> [<1011a4d8>] mm_core_init+0x18c/0x208
> [<10100e88>] start_kernel+0x4ec/0x7a0
> [<101054d0>] start_parisc+0xb4/0xc4
>
> git bisecting leads to this patch which triggers the crash:
>
> commit ee62ce7a1d909ccba0399680a03c2dee83bcae95
> Author: Toke Høiland-Jørgensen <toke@redhat.com>
> Date: Wed Apr 9 12:41:37 2025 +0200
> page_pool: Track DMA-mapped pages and unmap them when destroying the pool
>
> It turns out that the patch itself isn't wrong.
>
> But it's the culprit which leads to the kernel bug since it modifies
> PP_MAGIC_MASK for 32-bit kernels from:
>
> -#define PP_MAGIC_MASK ~0x3UL
> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>
> Function page_pool_page_is_pp() needs to unambiguously identify page pool
> pages (using PP_MAGIC_MASK), but since the patch now reduced the valid bits to
> check in PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c, the remaining bits are
> not sufficient to unambiguously identify such pages any longer.
>
> Because of that, page_pool_page_is_pp() sometimes wrongly reports pages as
> page pool pages and as such triggers the kernel BUG as it believes it found a
> page pool leak.
>
> IMHO this is a generic 32-bit kernel issue, not just affecting parisc.
>
> Do you see any options other than:
> a) revert the patch (ee62ce7a1d90), or:
> b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
> which means to effectively disable the page pool page test on 32bit
> machines
We should have a change coming soon that would use a page type and fix
it as well I think.
https://lkml.kernel.org/r/20250728052742.81294-1-byungchul@sk.com
Until then, the easiest fix would be indeed to go with b).
But maybe the page type thing could be backported?
--
Cheers
David / dhildenb
* Re: boot failure because of inaccurate page_pool_page_is_pp() on 32-bit kernels
2025-09-12 7:57 ` David Hildenbrand
@ 2025-09-12 14:04 ` Helge Deller
0 siblings, 0 replies; 5+ messages in thread
From: Helge Deller @ 2025-09-12 14:04 UTC (permalink / raw)
To: David Hildenbrand, Helge Deller,
Toke Høiland-Jørgensen, Linux Kernel Development,
Linux Memory Management List, linux-parisc
Cc: Christoph Biedl, Byungchul Park
On 9/12/25 09:57, David Hildenbrand wrote:
> On 12.09.25 00:12, Helge Deller wrote:
>> As reported earlier in this mail thread, all 32-bit Linux kernels since v6.16
>> fail to boot on the parisc architecture like this:
>>
>> BUG: Bad page state in process swapper pfn:000f7
>> page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0xf7
>> flags: 0x0(zone=0)
>> raw: 00000000 118022c0 118022c0 00000000 00000000 00000000 ffffffff 00000000
>> raw: 00000000
>> page dumped because: page_pool leak
>> Modules linked in:
>> CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-32bit+ #2730 NONE
>> Hardware name: 9000/778/B160L
>> Backtrace:
>> [<106ece88>] bad_page+0x14c/0x17c
>> [<10406c50>] free_page_is_bad.part.0+0xd4/0xec
>> [<106ed180>] free_page_is_bad+0x80/0x88
>> [<106ef05c>] __free_pages_ok+0x374/0x508
>> [<1011d34c>] __free_pages_core+0x1f0/0x218
>> [<1011a2f0>] memblock_free_pages+0x68/0x94
>> [<10120324>] memblock_free_all+0x26c/0x310
>> [<1011a4d8>] mm_core_init+0x18c/0x208
>> [<10100e88>] start_kernel+0x4ec/0x7a0
>> [<101054d0>] start_parisc+0xb4/0xc4
>>
>> git bisecting leads to this patch which triggers the crash:
>>
>> commit ee62ce7a1d909ccba0399680a03c2dee83bcae95
>> Author: Toke Høiland-Jørgensen <toke@redhat.com>
>> Date: Wed Apr 9 12:41:37 2025 +0200
>> page_pool: Track DMA-mapped pages and unmap them when destroying the pool
>>
>> It turns out that the patch itself isn't wrong.
>>
>> But it's the culprit which leads to the kernel bug since it modifies
>> PP_MAGIC_MASK for 32-bit kernels from:
>>
>> -#define PP_MAGIC_MASK ~0x3UL
>> +#define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>
>> Function page_pool_page_is_pp() needs to unambiguously identify page pool
>> pages (using PP_MAGIC_MASK), but since the patch now reduced the valid bits to
>> check in PP_MAGIC_MASK from 0xFFFFFFFC to 0xc000007c, the remaining bits are
>> not sufficient to unambiguously identify such pages any longer.
>>
>> Because of that, page_pool_page_is_pp() sometimes wrongly reports pages as
>> page pool pages and as such triggers the kernel BUG as it believes it found a
>> page pool leak.
>>
>> IMHO this is a generic 32-bit kernel issue, not just affecting parisc.
>>
>> Do you see any options other than:
>> a) revert the patch (ee62ce7a1d90), or:
>> b) return false in page_pool_page_is_pp() when !defined(CONFIG_64BIT),
>> which means to effectively disable the page pool page test on 32bit
>> machines
>
> We should have a change coming soon that would use a page type and fix it as well I think.
>
> https://lkml.kernel.org/r/20250728052742.81294-1-byungchul@sk.com
>
> Until then, the easiest fix would be indeed to go with b).
Ok, I'll send a patch for b).
Thanks!
Helge