* mm: Regression with v7.0-rc1 on RISC-V
@ 2026-02-24 8:37 Ron Economos
2026-02-24 11:00 ` David Hildenbrand (Arm)
2026-02-24 12:58 ` Kefeng Wang
0 siblings, 2 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24 8:37 UTC (permalink / raw)
To: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw
I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
[ OK ] Reached target shutdown.target - System Shutdown.
[ OK ] Reached target final.target - Late Shutdown Services.
[ OK ] Finished systemd-reboot.service - System Reboot.
[ OK ] Reached target reboot.target - System Reboot.
[ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
[ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
[ 173.985336] flags: 0xffff80000000000(node=0|zone=0|lastcpupid=0x1ffff) CMA
[ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
[ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 173.985403] page dumped because: nonzero _refcount
[ 173.985418] Modules linked in: qrtr cfg80211 da9063_onkey binfmt_misc at24 lm90 nls_iso8859_1 pwm_fan sch_fq_codel dm_multipath
nvme_fabrics efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfnetlink autofs4 btrfs libblake2b raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear mmc_spi crc7 of_mmc_spi rtc_da9063
crc_itu_t da9063_regulator mscc nvme macsec phy_package nvme_core nvme_keyring nvme_auth hkdf macb phylink spi_sifive i2c_ocores
[ 173.985802] CPU: 3 UID: 0 PID: 1 Comm: shutdown Not tainted 7.0.0-rc1 #2 PREEMPT(full)
[ 173.985814] Hardware name: SiFive HiFive Unmatched A00 (DT)
[ 173.985820] Call Trace:
[ 173.985827] [<ffffffff8001bfd4>] dump_backtrace+0x1c/0x30
[ 173.985850] [<ffffffff80002522>] show_stack+0x2a/0x3c
[ 173.985861] [<ffffffff8001395e>] dump_stack_lvl+0x46/0x6c
[ 173.985877] [<ffffffff80013998>] dump_stack+0x14/0x1e
[ 173.985886] [<ffffffff80317fe4>] bad_page+0x12c/0x170
[ 173.985905] [<ffffffff8031bb14>] __free_frozen_pages+0x634/0x830
[ 173.985919] [<ffffffff8031bd88>] __free_contig_frozen_range+0x78/0xe8
[ 173.985929] [<ffffffff8031bec0>] free_contig_frozen_range+0xc8/0x1e8
[ 173.985940] [<ffffffff803a15a0>] __cma_release_frozen+0x60/0x1c8
[ 173.985959] [<ffffffff803a27c6>] cma_release+0x26/0x40
[ 173.985969] [<ffffffff800fcf96>] dma_free_contiguous+0xde/0x178
[ 173.985981] [<ffffffff800fbba4>] dma_direct_free+0x104/0x198
[ 173.985990] [<ffffffff800fa19e>] dma_free_attrs+0x66/0x1e8
[ 173.986002] [<ffffffff028780ec>] macb_free_consistent+0xbc/0x160 [macb]
[ 173.986132] [<ffffffff02878274>] macb_close+0xe4/0x120 [macb]
[ 173.986169] [<ffffffff80cae9fe>] __dev_close_many+0xa6/0x1d0
[ 173.986184] [<ffffffff80caeb94>] netif_close_many+0x6c/0x118
[ 173.986193] [<ffffffff80caec98>] netif_close+0x58/0x70
[ 173.986201] [<ffffffff80cbab84>] dev_close+0x6c/0x98
[ 173.986216] [<ffffffff02874518>] macb_shutdown+0x48/0x68 [macb]
[ 173.986254] [<ffffffff809bc9c6>] platform_shutdown+0x16/0x30
[ 173.986277] [<ffffffff809b7126>] device_shutdown+0x11e/0x208
[ 173.986287] [<ffffffff80070f26>] kernel_restart+0x36/0x98
[ 173.986304] [<ffffffff80071276>] __do_sys_reboot+0x14e/0x230
[ 173.986314] [<ffffffff8007136e>] __riscv_sys_reboot+0x16/0x28
[ 173.986324] [<ffffffff80f26d6a>] do_trap_ecall_u+0x1aa/0x578
[ 173.986339] [<ffffffff80f37c58>] handle_exception+0x168/0x174
[ 173.986359] Disabling lock debugging due to kernel taint
I've bisected it to commit "mm: cma: add cma_alloc_frozen{_compound}()" 9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f
Here's the bisect log.
git bisect start
# status: waiting for both good and bad commits
# bad: [cee73b1e840c154f64ace682cb477c1ae2e29cc4] Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
git bisect bad cee73b1e840c154f64ace682cb477c1ae2e29cc4
# status: waiting for good commit(s), bad commit known
# good: [8383522821c6fea6bbb4bc0317056c433a482a95] Merge branch 'net-mscc-ocelot-fix-missing-lock-in-ocelot_port_xmit'
git bisect good 8383522821c6fea6bbb4bc0317056c433a482a95
# good: [37a93dd5c49b5fda807fd204edf2547c3493319c] Merge tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 37a93dd5c49b5fda807fd204edf2547c3493319c
# bad: [4cff5c05e076d2ee4e34122aa956b84a2eaac587] Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 4cff5c05e076d2ee4e34122aa956b84a2eaac587
# bad: [fde8353121aa304ee88542f011dd5dc83ced47e4] selftests/mm: report SKIP in pfnmap if a check fails
git bisect bad fde8353121aa304ee88542f011dd5dc83ced47e4
# good: [743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6] Revert "mm/hugetlb: deal with multiple calls to hugetlb_bootmem_alloc"
git bisect good 743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6
# bad: [2f5e576598c915db18b7ccd0003be52458959ce7] powerpc/mm: implement *_user_accessible_page() for ptes
git bisect bad 2f5e576598c915db18b7ccd0003be52458959ce7
# good: [6c08cc64d194dc5cc3dfc785517098d3b161c05f] mm: cma: kill cma_pages_valid()
git bisect good 6c08cc64d194dc5cc3dfc785517098d3b161c05f
# bad: [4bdd692291275eaaabe993e1c4a7b5b01cd6dc37] mm/damon/lru_sort: add monitoring intervals auto-tuning parameter
git bisect bad 4bdd692291275eaaabe993e1c4a7b5b01cd6dc37
# bad: [fbec8a1e4fa4daf2611c9a3e3b29d03a73acbd0c] mm/damon/sysfs-schemes: support DAMOS_QUOTA_[IN]ACTIVE_MEM_BP
git bisect bad fbec8a1e4fa4daf2611c9a3e3b29d03a73acbd0c
# bad: [14f270761d3374db24c84630f2aa7a3c732fed4a] mm: hugetlb: allocate frozen pages for gigantic allocation
git bisect bad 14f270761d3374db24c84630f2aa7a3c732fed4a
# bad: [9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f] mm: cma: add cma_alloc_frozen{_compound}()
git bisect bad 9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f
# good: [e0c1326779cc1b8e3a9e30ae273b89202ed4c82c] mm: page_alloc: add alloc_contig_frozen_{range,pages}()
git bisect good e0c1326779cc1b8e3a9e30ae273b89202ed4c82c
# first bad commit: [9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f] mm: cma: add cma_alloc_frozen{_compound}()
It doesn't revert cleanly, so I couldn't try that.
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
@ 2026-02-24 11:00 ` David Hildenbrand (Arm)
[not found] ` <1966378802.577797.1771952827516@app.mailbox.org>
2026-02-24 12:58 ` Kefeng Wang
1 sibling, 1 reply; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-24 11:00 UTC (permalink / raw)
To: Ron Economos, wangkefeng.wang, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, jane.chu, hannes, willy, muchun.song, osalvador,
sidhartha.kumar, vbabka, claudiu.beznea.uj, Mark Brown, akpm,
pjw
On 2/24/26 09:37, Ron Economos wrote:
> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>
> [ OK ] Reached target shutdown.target - System Shutdown.
> [ OK ] Reached target final.target - Late Shutdown Services.
> [ OK ] Finished systemd-reboot.service - System Reboot.
> [ OK ] Reached target reboot.target - System Reboot.
> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0xf8850
> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|
> lastcpupid=0x1ffff) CMA
> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
> 0000000000000000
> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
> 0000000000000000
> [ 173.985403] page dumped because: nonzero _refcount
So, we're freeing something from CMA in cma_release().
In cma_release() we iterate all pages to decrement their refcount
VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
I would expect that this would fire already if there is still a page
referenced.
Are you running with CONFIG_DEBUG_VM=y ?
--
Cheers,
David
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
2026-02-24 11:00 ` David Hildenbrand (Arm)
@ 2026-02-24 12:58 ` Kefeng Wang
2026-02-24 13:25 ` Ron Economos
1 sibling, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2026-02-24 12:58 UTC (permalink / raw)
To: Ron Economos, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw
On 2026/2/24 16:37, Ron Economos wrote:
> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>
> [ OK ] Reached target shutdown.target - System Shutdown.
> [ OK ] Reached target final.target - Late Shutdown Services.
> [ OK ] Finished systemd-reboot.service - System Reboot.
> [ OK ] Reached target reboot.target - System Reboot.
> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0xf8850
> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|
> lastcpupid=0x1ffff) CMA
> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
> 0000000000000000
> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
> 0000000000000000
> [ 173.985403] page dumped because: nonzero _refcount
The refcount is set to 1 by set_page_refcounted() in cma_alloc(), and it
is decremented to 0 by put_page_testzero() in cma_release(), so maybe the
problem is somewhere else?
Could you enable CONFIG_DEBUG_VM and CONFIG_DEBUG_PAGE_REF, and try to
track down the page reference manipulation with the tracepoints? Or, for
CMA-related pages, add an explicit printk for both increments and
decrements of the page reference count to identify the root cause.
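For readers following along, the expected refcount lifecycle described in this message can be modelled in userspace. This is only a toy sketch, not kernel code: struct toy_page and the toy_*() helpers are hypothetical names, and the free-time check is reduced to "refcount must be zero".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int refcount;
};

/* Allocation hands out each page with one reference
 * (loosely models set_page_refcounted() on the cma_alloc() path). */
static void toy_alloc(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i].refcount = 1;
}

/* Loosely models put_page_testzero(): drop one reference and
 * report whether the count reached zero. */
static bool toy_put_testzero(struct toy_page *page)
{
	assert(page->refcount > 0);
	return --page->refcount == 0;
}

/* Release drops the allocation reference on every page in the range. */
static void toy_release(struct toy_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		(void)toy_put_testzero(&pages[i]);
}

/* Loosely models the free-time sanity check: a page that still has
 * references when it reaches the allocator is reported as bad. */
static bool toy_free_is_bad(const struct toy_page *page)
{
	return page->refcount != 0;
}
```

In this model, toy_alloc() followed by toy_release() leaves every page with refcount 0, so toy_free_is_bad() stays quiet. If toy_release() is skipped, or something else holds an extra reference, the page arrives at the free-time check with a nonzero refcount, which is the state the "nonzero _refcount" dump reports.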
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 12:58 ` Kefeng Wang
@ 2026-02-24 13:25 ` Ron Economos
0 siblings, 0 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24 13:25 UTC (permalink / raw)
To: Kefeng Wang, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw
On 2/24/26 04:58, Kefeng Wang wrote:
>
>
> On 2026/2/24 16:37, Ron Economos wrote:
>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>
>> [ OK ] Reached target shutdown.target - System Shutdown.
>> [ OK ] Reached target final.target - Late Shutdown Services.
>> [ OK ] Finished systemd-reboot.service - System Reboot.
>> [ OK ] Reached target reboot.target - System Reboot.
>> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
>> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
>> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0| lastcpupid=0x1ffff) CMA
>> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
>> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>> [ 173.985403] page dumped because: nonzero _refcount
>
> The refcount set to 1 when cma_alloc() by set_page_refcounted(), and it
> will be dec to 0 in cma_release() by put_page_testzero(), there may be a problem somewhere else?
>
> Could you enable CONFIG_DEBUG_VM and CONFIG_DEBUG_PAGE_REF, and try to track down page reference manipulation by tracepoint? or
> for CMA-related pages, introduce explicit printk both increments and decrements of the page reference count to identify the root cause of the issue.
>
Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.
* Re: mm: Regression with v7.0-rc1 on RISC-V
[not found] ` <1966378802.577797.1771952827516@app.mailbox.org>
@ 2026-02-24 17:14 ` Zi Yan
2026-02-24 17:17 ` Zi Yan
2026-02-24 17:21 ` Mark Brown
1 sibling, 1 reply; 11+ messages in thread
From: Zi Yan @ 2026-02-24 17:14 UTC (permalink / raw)
To: David Hildenbrand
Cc: David Hildenbrand (Arm),
Ron Economos, wangkefeng.wang, linux-mm, linux-kernel,
linux-riscv, jackmanb, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw
On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>
>>
>>
>>
>>
>> On 2/24/26 09:37, Ron Economos wrote:
>>
>>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>>
>>>
>>>
>>> [ OK ] Reached target shutdown.target - System Shutdown.
>>>
>>> [ OK ] Reached target final.target - Late Shutdown Services.
>>>
>>> [ OK ] Finished systemd-reboot.service - System Reboot.
>>>
>>> [ OK ] Reached target reboot.target - System Reboot.
>>>
>>> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
>>>
>>> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
>>>
>>> index:0x0 pfn:0xf8850
>>>
>>> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|
>>>
>>> lastcpupid=0x1ffff) CMA
>>>
>>> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
>>>
>>> 0000000000000000
>>>
>>> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
>>>
>>> 0000000000000000
>>>
>>> [ 173.985403] page dumped because: nonzero _refcount
>>
>> So, we're freeing something from CMA in cma_release().
>>
>>
>>
>> In cma_release() we iterate all pages to decrement their refcount
>>
>>
>>
>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>>
>>
>>
>> I would expect that this would fire already if there is still a page
>>
>> referenced.
>>
>>
>>
>> Are you running with CONFIG_DEBUG_VM=y ?
>>
>>
>>
>>
>>
>> --
>>
>> Cheers,
>>
>>
>>
>> David
>
> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>
>
>
> At least that’s what I remember.
Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
and is_check_pages_enabled(), which leads to free_page_is_bad()’s
“page dumped because: nonzero _refcount”, are disabled.
It seems to me that someone else bumps the page refcount between
VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
Best Regards,
Yan, Zi
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 17:14 ` Zi Yan
@ 2026-02-24 17:17 ` Zi Yan
2026-02-24 17:29 ` Zi Yan
0 siblings, 1 reply; 11+ messages in thread
From: Zi Yan @ 2026-02-24 17:17 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Ron Economos, wangkefeng.wang, linux-mm, linux-kernel,
linux-riscv, jackmanb, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw
On 24 Feb 2026, at 12:14, Zi Yan wrote:
> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>
>>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>>
>>>
>>>
>>>
>>>
>>> On 2/24/26 09:37, Ron Economos wrote:
>>>
>>>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>>>
>>>>
>>>>
>>>> [ OK ] Reached target shutdown.target - System Shutdown.
>>>>
>>>> [ OK ] Reached target final.target - Late Shutdown Services.
>>>>
>>>> [ OK ] Finished systemd-reboot.service - System Reboot.
>>>>
>>>> [ OK ] Reached target reboot.target - System Reboot.
>>>>
>>>> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
>>>>
>>>> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
>>>>
>>>> index:0x0 pfn:0xf8850
>>>>
>>>> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|
>>>>
>>>> lastcpupid=0x1ffff) CMA
>>>>
>>>> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
>>>>
>>>> 0000000000000000
>>>>
>>>> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
>>>>
>>>> 0000000000000000
>>>>
>>>> [ 173.985403] page dumped because: nonzero _refcount
>>>
>>> So, we're freeing something from CMA in cma_release().
>>>
>>>
>>>
>>> In cma_release() we iterate all pages to decrement their refcount
>>>
>>>
>>>
>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>>>
>>>
>>>
>>> I would expect that this would fire already if there is still a page
>>>
>>> referenced.
>>>
>>>
>>>
>>> Are you running with CONFIG_DEBUG_VM=y ?
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Cheers,
>>>
>>>
>>>
>>> David
>>
>> Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?
>>
>>
>>
>> At least that’s what I remember.
>
> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
> “page dumped because: nonzero _refcount”, are disabled.
>
> It seems to me that someone else bump the page refcount between
> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>
Merging Ron’s reply from another thread[1]:
“Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
Looks like something is racy, since it is reproducible reliably.
[1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
Best Regards,
Yan, Zi
* Re: mm: Regression with v7.0-rc1 on RISC-V
[not found] ` <1966378802.577797.1771952827516@app.mailbox.org>
2026-02-24 17:14 ` Zi Yan
@ 2026-02-24 17:21 ` Mark Brown
1 sibling, 0 replies; 11+ messages in thread
From: Mark Brown @ 2026-02-24 17:21 UTC (permalink / raw)
To: David Hildenbrand
Cc: David Hildenbrand (Arm),
Ron Economos, wangkefeng.wang, linux-mm, linux-kernel,
linux-riscv, ziy, jackmanb, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj, akpm, pjw
[-- Attachment #1: Type: text/plain, Size: 369 bytes --]
On Tue, Feb 24, 2026 at 06:07:07PM +0100, David Hildenbrand wrote:
> <div class="default-style">Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?</div>
Pinged David off list about the mail being HTML - this was the new
comment for anyone else whose mail client just showed them the raw
HTML...
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 17:17 ` Zi Yan
@ 2026-02-24 17:29 ` Zi Yan
2026-02-24 20:55 ` Ron Economos
2026-02-25 1:58 ` Kefeng Wang
0 siblings, 2 replies; 11+ messages in thread
From: Zi Yan @ 2026-02-24 17:29 UTC (permalink / raw)
To: Ron Economos, David Hildenbrand (Arm)
Cc: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv, jackmanb,
jane.chu, hannes, willy, muchun.song, osalvador, sidhartha.kumar,
vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw
On 24 Feb 2026, at 12:17, Zi Yan wrote:
> On 24 Feb 2026, at 12:14, Zi Yan wrote:
>
>> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>>
>>>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2/24/26 09:37, Ron Economos wrote:
>>>>
>>>>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>>>>
>>>>>
>>>>>
>>>>> [ OK ] Reached target shutdown.target - System Shutdown.
>>>>>
>>>>> [ OK ] Reached target final.target - Late Shutdown Services.
>>>>>
>>>>> [ OK ] Finished systemd-reboot.service - System Reboot.
>>>>>
>>>>> [ OK ] Reached target reboot.target - System Reboot.
>>>>>
>>>>> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
>>>>>
>>>>> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
>>>>>
>>>>> index:0x0 pfn:0xf8850
>>>>>
>>>>> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|
>>>>>
>>>>> lastcpupid=0x1ffff) CMA
>>>>>
>>>>> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
>>>>>
>>>>> 0000000000000000
>>>>>
>>>>> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
>>>>>
>>>>> 0000000000000000
>>>>>
>>>>> [ 173.985403] page dumped because: nonzero _refcount
>>>>
>>>> So, we're freeing something from CMA in cma_release().
>>>>
>>>>
>>>>
>>>> In cma_release() we iterate all pages to decrement their refcount
>>>>
>>>>
>>>>
>>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>>>>
>>>>
>>>>
>>>> I would expect that this would fire already if there is still a page
>>>>
>>>> referenced.
>>>>
>>>>
>>>>
>>>> Are you running with CONFIG_DEBUG_VM=y ?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Cheers,
>>>>
>>>>
>>>>
>>>> David
>>>
>>> Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?
>>>
>>>
>>>
>>> At least that’s what I remember.
>>
>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>> “page dumped because: nonzero _refcount”, are disabled.
>>
>> It seems to me that someone else bump the page refcount between
>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>
>
> Merging Ron’s reply from another thread[1]:
>
> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>
> Looks like something is racy, since it is reproducible reliably.
>
> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
the validity of the expression is checked and no code is generated.
So put_page_testzero() becomes a NOP.
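The effect can be reproduced in userspace. The sketch below is an illustration under stated assumptions, not the kernel headers: BUILD_BUG_ON_INVALID() is modelled on the kernel's sizeof-based definition with the sparse annotation dropped, VM_WARN_ON_NODEBUG() and VM_WARN_ON_DEBUG() are hypothetical names for the two CONFIG_DEBUG_VM variants of VM_WARN_ON(), and put_testzero() is a stand-in for put_page_testzero().

```c
#include <assert.h>
#include <stdio.h>

/* The operand of sizeof is type-checked but never evaluated, so no
 * code is generated for it and its side effects are lost. */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))

/* Hypothetical stand-in for VM_WARN_ON() with CONFIG_DEBUG_VM off. */
#define VM_WARN_ON_NODEBUG(cond) BUILD_BUG_ON_INVALID(cond)

/* Hypothetical, simplified stand-in for VM_WARN_ON() with
 * CONFIG_DEBUG_VM on: the condition is really evaluated. */
#define VM_WARN_ON_DEBUG(cond) \
	((void)((cond) ? fprintf(stderr, "WARNING: %s\n", #cond) : 0))

/* Stand-in for put_page_testzero(): decrement, report if zero. */
static int put_testzero(int *refcount)
{
	assert(*refcount > 0);
	return --(*refcount) == 0;
}

/* Side effect inside the macro, debug checks off: decrement is lost. */
static int release_nodebug(int refcount)
{
	VM_WARN_ON_NODEBUG(!put_testzero(&refcount));
	return refcount;
}

/* Same code with debug checks on: decrement happens. */
static int release_debug(int refcount)
{
	VM_WARN_ON_DEBUG(!put_testzero(&refcount));
	return refcount;
}

/* Side effect hoisted out of the macro: decrement always happens. */
static int release_fixed(int refcount)
{
	int ret = put_testzero(&refcount);

	VM_WARN_ON_NODEBUG(!ret);
	(void)ret;	/* keep compilers quiet when the check compiles away */
	return refcount;
}
```

Starting from a refcount of 1, release_nodebug() returns 1 while release_debug() and release_fixed() return 0. That matches the report: the problem disappears once CONFIG_DEBUG_VM is enabled, because only then is the decrement actually executed.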
Hi Ron,
Can you check if the patch below fixes the issue without CONFIG_DEBUG_VM?
diff --git a/mm/cma.c b/mm/cma.c
index 94b5da468a7d..96be62eb3713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
return false;
pfn = page_to_pfn(pages);
- for (i = 0; i < count; i++, pfn++)
- VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
+ for (i = 0; i < count; i++, pfn++) {
+ int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
+
+ VM_WARN_ON(!ret);
+ }
__cma_release_frozen(cma, cmr, pages, count);
Best Regards,
Yan, Zi
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 17:29 ` Zi Yan
@ 2026-02-24 20:55 ` Ron Economos
2026-02-25 1:58 ` Kefeng Wang
1 sibling, 0 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24 20:55 UTC (permalink / raw)
To: Zi Yan, David Hildenbrand (Arm)
Cc: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv, jackmanb,
jane.chu, hannes, willy, muchun.song, osalvador, sidhartha.kumar,
vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw
On 2/24/26 09:29, Zi Yan wrote:
> On 24 Feb 2026, at 12:17, Zi Yan wrote:
>
>> On 24 Feb 2026, at 12:14, Zi Yan wrote:
>>
>>> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>>>
>>>>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2/24/26 09:37, Ron Economos wrote:
>>>>>
>>>>>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>>>>>
>>>>>>
>>>>>>
>>>>>> [ OK ] Reached target shutdown.target - System Shutdown.
>>>>>>
>>>>>> [ OK ] Reached target final.target - Late Shutdown Services.
>>>>>>
>>>>>> [ OK ] Finished systemd-reboot.service - System Reboot.
>>>>>>
>>>>>> [ OK ] Reached target reboot.target - System Reboot.
>>>>>>
>>>>>> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
>>>>>>
>>>>>> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
>>>>>>
>>>>>> index:0x0 pfn:0xf8850
>>>>>>
>>>>>> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|
>>>>>>
>>>>>> lastcpupid=0x1ffff) CMA
>>>>>>
>>>>>> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
>>>>>>
>>>>>> 0000000000000000
>>>>>>
>>>>>> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
>>>>>>
>>>>>> 0000000000000000
>>>>>>
>>>>>> [ 173.985403] page dumped because: nonzero _refcount
>>>>> So, we're freeing something from CMA in cma_release().
>>>>>
>>>>>
>>>>>
>>>>> In cma_release() we iterate all pages to decrement their refcount
>>>>>
>>>>>
>>>>>
>>>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>>>>>
>>>>>
>>>>>
>>>>> I would expect that this would fire already if there is still a page
>>>>>
>>>>> referenced.
>>>>>
>>>>>
>>>>>
>>>>> Are you running with CONFIG_DEBUG_VM=y ?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>>
>>>>> David
>>>> Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?
>>>>
>>>>
>>>>
>>>> At least that’s what I remember.
>>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>>> “page dumped because: nonzero _refcount”, are disabled.
>>>
>>> It seems to me that someone else bump the page refcount between
>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>>
>> Merging Ron’s reply from another thread[1]:
>>
>> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
>> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>>
>> Looks like something is racy, since it is reproducible reliably.
>>
>> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
> the validity of the expression is checked and no code is generated.
> So that put_page_testzero() becomes a NOP.
>
> Hi Ron,
>
> Can you check if the patch below fix the issue without CONFIG_DEBUG_VM?
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..96be62eb3713 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
> return false;
>
> pfn = page_to_pfn(pages);
> - for (i = 0; i < count; i++, pfn++)
> - VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> + for (i = 0; i < count; i++, pfn++) {
> + int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
> +
> + VM_WARN_ON(!ret);
> + }
>
> __cma_release_frozen(cma, cmr, pages, count);
>
>
>
> Best Regards,
> Yan, Zi
Yes, that patch fixes the issue.
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-24 17:29 ` Zi Yan
2026-02-24 20:55 ` Ron Economos
@ 2026-02-25 1:58 ` Kefeng Wang
2026-02-25 2:15 ` Zi Yan
1 sibling, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2026-02-25 1:58 UTC (permalink / raw)
To: Zi Yan, Ron Economos, David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, linux-riscv, jackmanb, jane.chu, hannes,
willy, muchun.song, osalvador, sidhartha.kumar, vbabka,
claudiu.beznea.uj, Mark Brown, akpm, pjw
On 2026/2/25 1:29, Zi Yan wrote:
> On 24 Feb 2026, at 12:17, Zi Yan wrote:
...
>>>>
>>>> Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?
>>>>
>>>>
>>>>
>>>> At least that’s what I remember.
>>>
>>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>>> “page dumped because: nonzero _refcount”, are disabled.
>>>
>>> It seems to me that someone else bump the page refcount between
>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>>
>>
>> Merging Ron’s reply from another thread[1]:
>>
>> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
>> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>>
>> Looks like something is racy, since it is reproducible reliably.
>>
>> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
>
> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
> the validity of the expression is checked and no code is generated.
> So that put_page_testzero() becomes a NOP.
Indeed...
>
> Hi Ron,
>
> Can you check if the patch below fix the issue without CONFIG_DEBUG_VM?
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..96be62eb3713 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
> return false;
>
> pfn = page_to_pfn(pages);
> - for (i = 0; i < count; i++, pfn++)
> - VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> + for (i = 0; i < count; i++, pfn++) {
> + int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
> +
> + VM_WARN_ON(!ret);
> + }
Maybe we only warn once by adding back the original check?
diff --git a/mm/cma.c b/mm/cma.c
index 94b5da468a7d..a73a22d34232 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -1014,14 +1014,17 @@ bool cma_release(struct cma *cma, const struct page *pages,
{
struct cma_memrange *cmr;
unsigned long i, pfn;
+ int ret = 0;
cmr = find_cma_memrange(cma, pages, count);
if (!cmr)
return false;
pfn = page_to_pfn(pages);
- for (i = 0; i < count; i++, pfn++)
- VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
+ for (i = 0; i < count; i++, pfn++)
+ ret += !put_page_testzero(pfn_to_page(pfn));
+
+ WARN(ret != 0, "%d pages are still in use!\n", ret);
__cma_release_frozen(cma, cmr, pages, count);
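As a rough userspace illustration of the "count, then warn once" idea (a sketch only: put_testzero() and release_and_count() are hypothetical stand-ins, and fprintf() takes the place of WARN()), the loop always drops the reference and only the reporting is deferred until after the loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for put_page_testzero(): drop one reference and report
 * whether the count reached zero. */
static bool put_testzero(int *refcount)
{
	assert(*refcount > 0);
	return --(*refcount) == 0;
}

/* Drop one reference on each entry, count the entries that did NOT
 * reach zero, and emit a single warning for the whole range. */
static unsigned long release_and_count(int *refcounts, unsigned long count)
{
	unsigned long busy = 0;

	for (unsigned long i = 0; i < count; i++)
		busy += !put_testzero(&refcounts[i]);

	if (busy)
		fprintf(stderr, "%lu pages are still in use!\n", busy);

	return busy;
}
```

Note the negation: the helper returns true when the count reaches zero, so the pages still in use are the ones for which it returns false. Because the decrement sits outside any debug-only macro, it is executed regardless of the build configuration.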
>
> __cma_release_frozen(cma, cmr, pages, count);
>
>
>
> Best Regards,
> Yan, Zi
>
* Re: mm: Regression with v7.0-rc1 on RISC-V
2026-02-25 1:58 ` Kefeng Wang
@ 2026-02-25 2:15 ` Zi Yan
0 siblings, 0 replies; 11+ messages in thread
From: Zi Yan @ 2026-02-25 2:15 UTC (permalink / raw)
To: Kefeng Wang
Cc: Ron Economos, David Hildenbrand (Arm),
linux-mm, linux-kernel, linux-riscv, jackmanb, jane.chu, hannes,
willy, muchun.song, osalvador, sidhartha.kumar, vbabka,
claudiu.beznea.uj, Mark Brown, akpm, pjw
On 24 Feb 2026, at 20:58, Kefeng Wang wrote:
> On 2026/2/25 1:29, Zi Yan wrote:
>> On 24 Feb 2026, at 12:17, Zi Yan wrote:
>
> ...
>
>>>>>
>>>>> Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?
>>>>>
>>>>>
>>>>>
>>>>> At least that’s what I remember.
>>>>
>>>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>>>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>>>> “page dumped because: nonzero _refcount”, are disabled.
>>>>
>>>> It seems to me that someone else bump the page refcount between
>>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>>>
>>>
>>> Merging Ron’s reply from another thread[1]:
>>>
>>> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
>>> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>>>
>>> Looks like something is racy, since it is reproducible reliably.
>>>
>>> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
>>
>> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
>> the validity of the expression is checked and no code is generated.
>> So that put_page_testzero() becomes a NOP.
>
> Indeed...
>
>>
>> Hi Ron,
>>
>> Can you check if the patch below fix the issue without CONFIG_DEBUG_VM?
>>
>> diff --git a/mm/cma.c b/mm/cma.c
>> index 94b5da468a7d..96be62eb3713 100644
>> --- a/mm/cma.c
>> +++ b/mm/cma.c
>> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
>> return false;
>>
>> pfn = page_to_pfn(pages);
>> - for (i = 0; i < count; i++, pfn++)
>> - VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>> + for (i = 0; i < count; i++, pfn++) {
>> + int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
>> +
>> + VM_WARN_ON(!ret);
>> + }
>
> Maybe we only warn once by adding back the original check?
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..a73a22d34232 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1014,14 +1014,17 @@ bool cma_release(struct cma *cma, const struct page *pages,
> {
> struct cma_memrange *cmr;
> unsigned long i, pfn;
> + int ret = 0;
>
> cmr = find_cma_memrange(cma, pages, count);
> if (!cmr)
> return false;
>
> pfn = page_to_pfn(pages);
> - for (i = 0; i < count; i++, pfn++)
> - VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> + for (i = 0; i < count; i++, pfn++)
> + ret += !put_page_testzero(pfn_to_page(pfn));
> +
> + WARN(ret != 0, "%d pages are still in use!\n", ret);
>
> __cma_release_frozen(cma, cmr, pages, count);
Sounds like a better solution. Let me use this as v2 fix.
Thanks.
>
>
>
>>
>> __cma_release_frozen(cma, cmr, pages, count);
>>
>>
>>
>> Best Regards,
>> Yan, Zi
>>
Best Regards,
Yan, Zi
Thread overview: 11+ messages
2026-02-24 8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
2026-02-24 11:00 ` David Hildenbrand (Arm)
[not found] ` <1966378802.577797.1771952827516@app.mailbox.org>
2026-02-24 17:14 ` Zi Yan
2026-02-24 17:17 ` Zi Yan
2026-02-24 17:29 ` Zi Yan
2026-02-24 20:55 ` Ron Economos
2026-02-25 1:58 ` Kefeng Wang
2026-02-25 2:15 ` Zi Yan
2026-02-24 17:21 ` Mark Brown
2026-02-24 12:58 ` Kefeng Wang
2026-02-24 13:25 ` Ron Economos