* BUG: Bad page state in process kworker/u32:1
From: Tj @ 2026-03-11 16:12 UTC
To: david; +Cc: linux-mm
On arm64 (Qualcomm sdm845), allocating and then releasing a CMA buffer for
DMA fails. It seems to be caused by the recent commit 9bda131c6093e9c4
("mm: cma: add cma_alloc_frozen{_compound}()"): cma_alloc() now calls
set_page_refcounted(), but neither cma_release() nor its callees undo it,
resulting in:
kernel: BUG: Bad page state in process kworker/u32:1 pfn:f4b00
kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf4b00
kernel: flags: 0x1ffe00000000000(node=0|zone=0|lastcpupid=0xfff) CMA
kernel: raw: 01ffe00000000000 fffffdffc1d2c048 ffff800080353608 0000000000000000
kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
kernel: page dumped because: nonzero _refcount
I've enabled pr_debug and added my own pr_info() calls to track the
callers. The log below shows first the output of a dump_stack() I added to
__cma_alloc_frozen() to identify the callers, followed immediately by the BUG.
The high-level activity is Qualcomm coprocessor firmware loading, which
sets up a DMA buffer to pass data to the coprocessor.
kernel: ipa 1e40000.ipa: ipa_probe()
kernel: ipa 1e40000.ipa: ipa_firmware_loader()
kernel: ipa 1e40000.ipa: channel 4 limited to 256 TREs
kernel: ipa 1e40000.ipa: IPA driver initialized
kernel: ipa 1e40000.ipa: ipa_firmware_load()
kernel: ipa 1e40000.ipa: request_firmware()
kernel: ipa 1e40000.ipa: fw_get_filesystem_firmware()
kernel: ipa 1e40000.ipa: Firmware loaded: qcom/sdm850/samsung/w737/ipa_fws.elf
kernel: ipa 1e40000.ipa: ipa_firmware_load() = 0
kernel: ipa 1e40000.ipa: ipa_firmware_load() calling qcom_mdt_load()
kernel: ipa 1e40000.ipa: qcom_mdt_load()
kernel: ipa 1e40000.ipa: __qcom_mdt_pas_init()
kernel: qcom_scm firmware:scm: qcom_scmp_pas_init_image( id=15, metadata=00000000239bef84, size=6812, ctx=0000000000000000 )
kernel: cma: __cma_alloc_frozen(cma 000000003df15a7c, name: reserved, count 2, align 1)
kernel: CPU: 1 UID: 0 PID: 56 Comm: kworker/u32:1 Not tainted 7.0.0-rc2-sdm845 #78 PREEMPTLAZY
kernel: Hardware name: SAMSUNG ELECTRONICS CO., LTD. Galaxy Book2/SM-W737YZSBTEL, BIOS P02AHG.005.190624.WY.1359 06/24/2019
kernel: Workqueue: events_unbound deferred_probe_work_func
kernel: Call trace:
kernel: show_stack+0x20/0x38 (C)
kernel: dump_stack_lvl+0x78/0x90
kernel: dump_stack+0x18/0x28
kernel: __cma_alloc_frozen+0x4c/0xa98
kernel: cma_alloc+0x30/0x98
kernel: cma_alloc_aligned+0x48/0x78
kernel: dma_alloc_contiguous+0x38/0x58
kernel: __dma_direct_alloc_pages.constprop.0+0xd4/0x430
kernel: dma_direct_alloc+0xdc/0x3d0
kernel: dma_alloc_attrs+0x98/0x488
kernel: qcom_scm_pas_init_image+0x148/0x228
kernel: __qcom_mdt_pas_init+0x138/0x240
kernel: qcom_mdt_load+0x6c/0xb8
kernel: ipa_probe+0xe80/0x13c0
kernel: platform_probe+0x64/0xa8
kernel: really_probe+0xc8/0x3f0
kernel: __driver_probe_device+0x88/0x190
kernel: driver_probe_device+0x44/0x120
kernel: __device_attach_driver+0xc4/0x178
kernel: bus_for_each_drv+0x8c/0xf0
kernel: __device_attach+0xa4/0x1d0
kernel: device_initial_probe+0x58/0x68
kernel: bus_probe_device+0x40/0xb8
kernel: deferred_probe_work_func+0xc0/0x128
kernel: process_one_work+0x17c/0x4e8
kernel: worker_thread+0x198/0x330
kernel: kthread+0x13c/0x150
kernel: ret_from_fork+0x10/0x20
kernel: cma: __cma_alloc_frozen(): returned 00000000585b858d
kernel: qcom_scm firmware:scm: __qcom_scmp_pas_init_image()
kernel: qcom_scm firmware:scm: qcom_scm_call) = 0
kernel: qcom_scm firmware:scm: called qcom_scm_bw_disable()
kernel: qcom_scm firmware:scm: called qcom_scm_clk_disable()
kernel: qcom_scm firmware:scm: __qcom_scmp_pas_init_image() = 0
kernel: cma: find_cma_memrange(page 00000000585b858d, count 2)
kernel: cma: __cma_release_frozen(page 00000000585b858d, count 2)
kernel: BUG: Bad page state in process kworker/u32:1 pfn:f4b00
kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf4b00
kernel: flags: 0x1ffe00000000000(node=0|zone=0|lastcpupid=0xfff) CMA
kernel: raw: 01ffe00000000000 fffffdffc1d2c048 ffff800080353608 0000000000000000
kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
kernel: page dumped because: nonzero _refcount
kernel: Modules linked in:
kernel: CPU: 4 UID: 0 PID: 56 Comm: kworker/u32:1 Not tainted 7.0.0-rc2-sdm845 #78 PREEMPTLAZY
kernel: Hardware name: SAMSUNG ELECTRONICS CO., LTD. Galaxy Book2/SM-W737YZSBTEL, BIOS P02AHG.005.190624.WY.1359 06/24/2019
kernel: Workqueue: events_unbound deferred_probe_work_func
kernel: Call trace:
kernel: show_stack+0x20/0x38 (C)
kernel: dump_stack_lvl+0x78/0x90
kernel: dump_stack+0x18/0x28
kernel: bad_page+0x8c/0x138
kernel: __free_frozen_pages+0x4dc/0x778
kernel: free_contig_frozen_range+0xd8/0x128
kernel: cma_release+0xf8/0x378
kernel: dma_free_contiguous+0x34/0x88
kernel: dma_direct_free+0x100/0x188
kernel: dma_free_attrs+0x90/0x248
kernel: qcom_scm_pas_init_image+0x1a4/0x228
kernel: __qcom_mdt_pas_init+0x138/0x240
kernel: qcom_mdt_load+0x6c/0xb8
kernel: ipa_probe+0xe80/0x13c0
kernel: platform_probe+0x64/0xa8
kernel: really_probe+0xc8/0x3f0
kernel: __driver_probe_device+0x88/0x190
kernel: driver_probe_device+0x44/0x120
kernel: __device_attach_driver+0xc4/0x178
kernel: bus_for_each_drv+0x8c/0xf0
kernel: __device_attach+0xa4/0x1d0
kernel: device_initial_probe+0x58/0x68
kernel: bus_probe_device+0x40/0xb8
kernel: deferred_probe_work_func+0xc0/0x128
kernel: process_one_work+0x17c/0x4e8
kernel: worker_thread+0x198/0x330
kernel: kthread+0x13c/0x150
kernel: ret_from_fork+0x10/0x20
kernel: Disabling lock debugging due to kernel taint
* Re: BUG: Bad page state in process kworker/u32:1
From: David Hildenbrand (Arm) @ 2026-03-11 16:17 UTC
To: Tj; +Cc: linux-mm
On 3/11/26 17:12, Tj wrote:
> On arm64, Qualcomm sdm845, an attempt to allocate and release a CMA for
> DMA fails. It seems to be caused by the recent commit 9bda131c6093e9c4
> "mm: cma: add cma_alloc_frozen{_compound}()" where cma_alloc() now calls
> set_page_refcounted() but cma_release() or its callees do not undo it,
> resulting in:
>
> kernel: BUG: Bad page state in process kworker/u32:1 pfn:f4b00
> kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf4b00
> kernel: flags: 0x1ffe00000000000(node=0|zone=0|lastcpupid=0xfff) CMA
> kernel: raw: 01ffe00000000000 fffffdffc1d2c048 ffff800080353608 0000000000000000
> kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> kernel: page dumped because: nonzero _refcount
>
> I've enabled pr_debug plus added in my own pr_info()s to track the
> callers. The following shows, first, my manual dump_stack() in
> __cma_alloc_frozen() in order to understand the callers, and immediately
> after the BUG.
This might be fixed by

commit f4355d6bb39fc8e53d772fa0654c8441b214e349
Author: Zi Yan <ziy@nvidia.com>
Date:   Tue Feb 24 22:12:31 2026 -0500

    mm/cma: move put_page_testzero() out of VM_WARN_ON in cma_release()

    When CONFIG_DEBUG_VM is not set, VM_WARN_ON is a NOP. Putting any
    statement with side effect inside it is incorrect. Collect all
    !put_page_testzero() results and check the sum using WARN instead after
    the loop. It restores the same check in free_contig_range() before commit
    e0c1326779cc ("mm: page_alloc: add alloc_contig_frozen_{range,pages}()"),
    the commit prior to the Fixes one.

    Link: https://lkml.kernel.org/r/20260225031231.2352011-1-ziy@nvidia.com

Can you double-check?
--
Cheers,
David
* Re: BUG: Bad page state in process kworker/u32:1
From: Tj @ 2026-03-11 18:27 UTC
To: David Hildenbrand (Arm), akpm; +Cc: linux-mm
That fixed it, thank you. I lost a week to this, thinking I'd done
something wrong, since I'm working on getting the kernel running on the
Samsung Galaxy Book2 W737 and many pieces, from the device tree to the
drivers, are problematic.