linux-mm.kvack.org archive mirror
* BUG: Bad page state in process kworker/u32:1
@ 2026-03-11 16:12 Tj
  2026-03-11 16:17 ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 3+ messages in thread
From: Tj @ 2026-03-11 16:12 UTC (permalink / raw)
  To: david; +Cc: linux-mm

On arm64 (Qualcomm SDM845), allocating and then releasing a CMA buffer for
DMA fails. It seems to be caused by the recent commit 9bda131c6093e9c4
("mm: cma: add cma_alloc_frozen{_compound}()"), after which cma_alloc()
calls set_page_refcounted() but neither cma_release() nor its callees
undo it, resulting in:

kernel: BUG: Bad page state in process kworker/u32:1  pfn:f4b00
kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf4b00
kernel: flags: 0x1ffe00000000000(node=0|zone=0|lastcpupid=0xfff) CMA
kernel: raw: 01ffe00000000000 fffffdffc1d2c048 ffff800080353608 0000000000000000
kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
kernel: page dumped because: nonzero _refcount
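
The asymmetry described above can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not kernel code: struct model_page and the model_* functions are hypothetical stand-ins that track only the refcount. model_set_refcounted() plays the role the report attributes to set_page_refcounted() in cma_alloc(), and model_free_frozen_range() mimics the "nonzero _refcount" check that the log shows __free_frozen_pages() reporting through bad_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a page: only the refcount is tracked. */
struct model_page {
	int refcount;
};

/* Stand-in for the allocation side: a frozen page (refcount 0) is handed
 * out with its refcount raised to 1. */
static void model_set_refcounted(struct model_page *p)
{
	assert(p->refcount == 0); /* frozen pages start at zero */
	p->refcount = 1;
}

/* Stand-in for put_page_testzero(): drop one reference and report whether
 * the count reached zero. */
static bool model_put_testzero(struct model_page *p)
{
	return --p->refcount == 0;
}

/* Stand-in for the free-time sanity check: a page handed to the frozen free
 * path with a nonzero refcount is reported instead of being freed silently.
 * Returns the number of pages reported. */
static int model_free_frozen_range(const struct model_page *pages, int count)
{
	int bad = 0;

	for (int i = 0; i < count; i++) {
		if (pages[i].refcount != 0) {
			fprintf(stderr, "BUG: Bad page state: page %d refcount:%d (nonzero _refcount)\n",
				i, pages[i].refcount);
			bad++;
		}
	}
	return bad;
}

/* Allocate and release `count` pages (at most 8). When drop_refs is false,
 * the release path skips undoing model_set_refcounted(), which is the
 * asymmetry the report describes. */
static int model_alloc_release(int count, bool drop_refs)
{
	struct model_page pages[8] = { { 0 } };

	assert(count > 0 && count <= 8);
	for (int i = 0; i < count; i++)
		model_set_refcounted(&pages[i]);

	if (drop_refs)
		for (int i = 0; i < count; i++)
			model_put_testzero(&pages[i]);

	return model_free_frozen_range(pages, count);
}
```

With the count of 2 seen in the log, model_alloc_release(2, true) returns 0, while model_alloc_release(2, false) reports both pages and returns 2.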

I enabled pr_debug and added my own pr_info() calls to track the
callers. The log below shows, first, the output of a dump_stack() I
added to __cma_alloc_frozen() to identify its callers, followed
immediately by the BUG.

The high-level activity is Qualcomm coprocessor firmware loading, which
sets up a DMA buffer to pass data to the coprocessor.

kernel: ipa 1e40000.ipa: ipa_probe()
kernel: ipa 1e40000.ipa: ipa_firmware_loader()
kernel: ipa 1e40000.ipa: channel 4 limited to 256 TREs
kernel: ipa 1e40000.ipa: IPA driver initialized
kernel: ipa 1e40000.ipa: ipa_firmware_load()
kernel: ipa 1e40000.ipa: request_firmware()
kernel: ipa 1e40000.ipa: fw_get_filesystem_firmware()
kernel: ipa 1e40000.ipa: Firmware loaded: qcom/sdm850/samsung/w737/ipa_fws.elf
kernel: ipa 1e40000.ipa: ipa_firmware_load() = 0
kernel: ipa 1e40000.ipa: ipa_firmware_load() calling qcom_mdt_load()
kernel: ipa 1e40000.ipa: qcom_mdt_load()
kernel: ipa 1e40000.ipa: __qcom_mdt_pas_init()
kernel: qcom_scm firmware:scm: qcom_scmp_pas_init_image( id=15, metadata=00000000239bef84, size=6812, ctx=0000000000000000 )
kernel: cma: __cma_alloc_frozen(cma 000000003df15a7c, name: reserved, count 2, align 1)
kernel: CPU: 1 UID: 0 PID: 56 Comm: kworker/u32:1 Not tainted 7.0.0-rc2-sdm845 #78 PREEMPTLAZY
kernel: Hardware name: SAMSUNG ELECTRONICS CO., LTD. Galaxy Book2/SM-W737YZSBTEL, BIOS P02AHG.005.190624.WY.1359 06/24/2019
kernel: Workqueue: events_unbound deferred_probe_work_func
kernel: Call trace:
kernel:  show_stack+0x20/0x38 (C)
kernel:  dump_stack_lvl+0x78/0x90
kernel:  dump_stack+0x18/0x28
kernel:  __cma_alloc_frozen+0x4c/0xa98
kernel:  cma_alloc+0x30/0x98
kernel:  cma_alloc_aligned+0x48/0x78
kernel:  dma_alloc_contiguous+0x38/0x58
kernel:  __dma_direct_alloc_pages.constprop.0+0xd4/0x430
kernel:  dma_direct_alloc+0xdc/0x3d0
kernel:  dma_alloc_attrs+0x98/0x488
kernel:  qcom_scm_pas_init_image+0x148/0x228
kernel:  __qcom_mdt_pas_init+0x138/0x240
kernel:  qcom_mdt_load+0x6c/0xb8
kernel:  ipa_probe+0xe80/0x13c0
kernel:  platform_probe+0x64/0xa8
kernel:  really_probe+0xc8/0x3f0
kernel:  __driver_probe_device+0x88/0x190
kernel:  driver_probe_device+0x44/0x120
kernel:  __device_attach_driver+0xc4/0x178
kernel:  bus_for_each_drv+0x8c/0xf0
kernel:  __device_attach+0xa4/0x1d0
kernel:  device_initial_probe+0x58/0x68
kernel:  bus_probe_device+0x40/0xb8
kernel:  deferred_probe_work_func+0xc0/0x128
kernel:  process_one_work+0x17c/0x4e8
kernel:  worker_thread+0x198/0x330
kernel:  kthread+0x13c/0x150
kernel:  ret_from_fork+0x10/0x20
kernel: cma: __cma_alloc_frozen(): returned 00000000585b858d
kernel: qcom_scm firmware:scm: __qcom_scmp_pas_init_image()
kernel: qcom_scm firmware:scm:   qcom_scm_call) = 0
kernel: qcom_scm firmware:scm:   called qcom_scm_bw_disable()
kernel: qcom_scm firmware:scm:   called qcom_scm_clk_disable()
kernel: qcom_scm firmware:scm: __qcom_scmp_pas_init_image() = 0
kernel: cma: find_cma_memrange(page 00000000585b858d, count 2)
kernel: cma: __cma_release_frozen(page 00000000585b858d, count 2)
kernel: BUG: Bad page state in process kworker/u32:1  pfn:f4b00
kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf4b00
kernel: flags: 0x1ffe00000000000(node=0|zone=0|lastcpupid=0xfff) CMA
kernel: raw: 01ffe00000000000 fffffdffc1d2c048 ffff800080353608 0000000000000000
kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
kernel: page dumped because: nonzero _refcount
kernel: Modules linked in:
kernel: CPU: 4 UID: 0 PID: 56 Comm: kworker/u32:1 Not tainted 7.0.0-rc2-sdm845 #78 PREEMPTLAZY
kernel: Hardware name: SAMSUNG ELECTRONICS CO., LTD. Galaxy Book2/SM-W737YZSBTEL, BIOS P02AHG.005.190624.WY.1359 06/24/2019
kernel: Workqueue: events_unbound deferred_probe_work_func
kernel: Call trace:
kernel:  show_stack+0x20/0x38 (C)
kernel:  dump_stack_lvl+0x78/0x90
kernel:  dump_stack+0x18/0x28
kernel:  bad_page+0x8c/0x138
kernel:  __free_frozen_pages+0x4dc/0x778
kernel:  free_contig_frozen_range+0xd8/0x128
kernel:  cma_release+0xf8/0x378
kernel:  dma_free_contiguous+0x34/0x88
kernel:  dma_direct_free+0x100/0x188
kernel:  dma_free_attrs+0x90/0x248
kernel:  qcom_scm_pas_init_image+0x1a4/0x228
kernel:  __qcom_mdt_pas_init+0x138/0x240
kernel:  qcom_mdt_load+0x6c/0xb8
kernel:  ipa_probe+0xe80/0x13c0
kernel:  platform_probe+0x64/0xa8
kernel:  really_probe+0xc8/0x3f0
kernel:  __driver_probe_device+0x88/0x190
kernel:  driver_probe_device+0x44/0x120
kernel:  __device_attach_driver+0xc4/0x178
kernel:  bus_for_each_drv+0x8c/0xf0
kernel:  __device_attach+0xa4/0x1d0
kernel:  device_initial_probe+0x58/0x68
kernel:  bus_probe_device+0x40/0xb8
kernel:  deferred_probe_work_func+0xc0/0x128
kernel:  process_one_work+0x17c/0x4e8
kernel:  worker_thread+0x198/0x330
kernel:  kthread+0x13c/0x150
kernel:  ret_from_fork+0x10/0x20
kernel: Disabling lock debugging due to kernel taint




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: BUG: Bad page state in process kworker/u32:1
  2026-03-11 16:12 BUG: Bad page state in process kworker/u32:1 Tj
@ 2026-03-11 16:17 ` David Hildenbrand (Arm)
  2026-03-11 18:27   ` Tj
  0 siblings, 1 reply; 3+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-11 16:17 UTC (permalink / raw)
  To: Tj; +Cc: linux-mm

On 3/11/26 17:12, Tj wrote:
> On arm64, Qualcomm sdm845, an attempt to allocate and release a CMA for 
> DMA fails. It seems to be caused by the recent commit 9bda131c6093e9c4 
> "mm: cma: add cma_alloc_frozen{_compound}()" where cma_alloc() now calls 
> set_page_refcounted() but cma_release() or its callees do not undo it, 
> resulting in:
> 
> kernel: BUG: Bad page state in process kworker/u32:1  pfn:f4b00
> kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf4b00
> kernel: flags: 0x1ffe00000000000(node=0|zone=0|lastcpupid=0xfff) CMA
> kernel: raw: 01ffe00000000000 fffffdffc1d2c048 ffff800080353608 0000000000000000
> kernel: raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> kernel: page dumped because: nonzero _refcount
> 
> I've enabled pr_debug plus added in my own pr_info()s to track the 
> callers. The following shows, first, my manual dump_stack() in 
> __cma_alloc_frozen() in order to understand the callers, and immediately 
> after the BUG.

This might be fixed by

commit f4355d6bb39fc8e53d772fa0654c8441b214e349
Author: Zi Yan <ziy@nvidia.com>
Date:   Tue Feb 24 22:12:31 2026 -0500

    mm/cma: move put_page_testzero() out of VM_WARN_ON in cma_release()
    
    When CONFIG_DEBUG_VM is not set, VM_WARN_ON is a NOP.  Putting any
    statement with side effect inside it is incorrect.  Collect all
    !put_page_testzero() results and check the sum using WARN instead after
    the loop.  It restores the same check in free_contig_range() before commit
    e0c1326779cc ("mm: page_alloc: add alloc_contig_frozen_{range,pages}()"),
    the commit prior to the Fixes one.
    
    Link: https://lkml.kernel.org/r/20260225031231.2352011-1-ziy@nvidia.com

Can you double-check?

-- 
Cheers,

David
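
The failure mode in the commit message above can be reproduced outside the kernel. The sketch below is a minimal userspace model, not the actual cma_release() code: struct model_page, model_put_testzero(), the two MODEL_WARN_ON_* macros and the release_* functions are hypothetical, fprintf() stands in for WARN(), and the non-debug macro models the no-op VM_WARN_ON() with sizeof, which type-checks its operand without evaluating it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a page: only the refcount is tracked. */
struct model_page {
	int refcount;
};

/* Stand-in for put_page_testzero(). */
static bool model_put_testzero(struct model_page *p)
{
	assert(p->refcount > 0); /* dropping a reference never taken is a bug */
	return --p->refcount == 0;
}

/* Debug-build warning macro: evaluates cond exactly once. */
#define MODEL_WARN_ON_DEBUG(cond)                                \
	do {                                                     \
		if (cond)                                        \
			fprintf(stderr, "warning: %s\n", #cond); \
	} while (0)

/* Non-debug warning macro: sizeof type-checks cond but never evaluates
 * it, so any side effect inside cond is silently dropped. */
#define MODEL_WARN_ON_NODEBUG(cond) ((void)sizeof(cond))

/* Buggy pattern, debug build: works only because the macro evaluates. */
static void release_buggy_debug(struct model_page *pages, int count)
{
	for (int i = 0; i < count; i++)
		MODEL_WARN_ON_DEBUG(!model_put_testzero(&pages[i]));
}

/* Buggy pattern, non-debug build: the reference is never dropped. */
static void release_buggy_nodebug(struct model_page *pages, int count)
{
	for (int i = 0; i < count; i++)
		MODEL_WARN_ON_NODEBUG(!model_put_testzero(&pages[i]));
}

/* Fixed pattern: drop every reference unconditionally, count the pages
 * whose refcount did not reach zero, and warn once after the loop. */
static int release_fixed(struct model_page *pages, int count)
{
	int not_zero = 0;

	for (int i = 0; i < count; i++)
		not_zero += !model_put_testzero(&pages[i]);

	if (not_zero)
		fprintf(stderr, "warning: %d page(s) still referenced\n", not_zero);
	return not_zero;
}

/* How many pages still hold a reference, i.e. would trip the free check. */
static int count_referenced(const struct model_page *pages, int count)
{
	int n = 0;

	for (int i = 0; i < count; i++)
		n += pages[i].refcount != 0;
	return n;
}
```

Calling release_buggy_nodebug() on two pages with refcount 1 leaves both at 1, which is the state the free path then reports as "nonzero _refcount"; release_buggy_debug() and release_fixed() both bring them to 0. The fixed shape keeps the side effect outside any macro whose evaluation depends on the build configuration.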



* Re: BUG: Bad page state in process kworker/u32:1
  2026-03-11 16:17 ` David Hildenbrand (Arm)
@ 2026-03-11 18:27   ` Tj
  0 siblings, 0 replies; 3+ messages in thread
From: Tj @ 2026-03-11 18:27 UTC (permalink / raw)
  To: David Hildenbrand (Arm), akpm; +Cc: linux-mm

That fixed it, thank you. I lost a week to this, thinking I'd done
something wrong, since I'm working on getting the kernel running on the
Samsung Galaxy Book2 W737 and many bits, from the device tree to the
drivers, are problematic.
