* mm: Regression with v7.0-rc1 on RISC-V
@ 2026-02-24 8:37 Ron Economos
2026-02-24 11:00 ` David Hildenbrand (Arm)
2026-02-24 12:58 ` Kefeng Wang
0 siblings, 2 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24 8:37 UTC
To: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw
I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
[ OK ] Reached target shutdown.target - System Shutdown.
[ OK ] Reached target final.target - Late Shutdown Services.
[ OK ] Finished systemd-reboot.service - System Reboot.
[ OK ] Reached target reboot.target - System Reboot.
[ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
[ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
[ 173.985336] flags: 0xffff80000000000(node=0|zone=0|lastcpupid=0x1ffff) CMA
[ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
[ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 173.985403] page dumped because: nonzero _refcount
[ 173.985418] Modules linked in: qrtr cfg80211 da9063_onkey binfmt_misc at24 lm90 nls_iso8859_1 pwm_fan sch_fq_codel dm_multipath
nvme_fabrics efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfnetlink autofs4 btrfs libblake2b raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear mmc_spi crc7 of_mmc_spi rtc_da9063
crc_itu_t da9063_regulator mscc nvme macsec phy_package nvme_core nvme_keyring nvme_auth hkdf macb phylink spi_sifive i2c_ocores
[ 173.985802] CPU: 3 UID: 0 PID: 1 Comm: shutdown Not tainted 7.0.0-rc1 #2 PREEMPT(full)
[ 173.985814] Hardware name: SiFive HiFive Unmatched A00 (DT)
[ 173.985820] Call Trace:
[ 173.985827] [<ffffffff8001bfd4>] dump_backtrace+0x1c/0x30
[ 173.985850] [<ffffffff80002522>] show_stack+0x2a/0x3c
[ 173.985861] [<ffffffff8001395e>] dump_stack_lvl+0x46/0x6c
[ 173.985877] [<ffffffff80013998>] dump_stack+0x14/0x1e
[ 173.985886] [<ffffffff80317fe4>] bad_page+0x12c/0x170
[ 173.985905] [<ffffffff8031bb14>] __free_frozen_pages+0x634/0x830
[ 173.985919] [<ffffffff8031bd88>] __free_contig_frozen_range+0x78/0xe8
[ 173.985929] [<ffffffff8031bec0>] free_contig_frozen_range+0xc8/0x1e8
[ 173.985940] [<ffffffff803a15a0>] __cma_release_frozen+0x60/0x1c8
[ 173.985959] [<ffffffff803a27c6>] cma_release+0x26/0x40
[ 173.985969] [<ffffffff800fcf96>] dma_free_contiguous+0xde/0x178
[ 173.985981] [<ffffffff800fbba4>] dma_direct_free+0x104/0x198
[ 173.985990] [<ffffffff800fa19e>] dma_free_attrs+0x66/0x1e8
[ 173.986002] [<ffffffff028780ec>] macb_free_consistent+0xbc/0x160 [macb]
[ 173.986132] [<ffffffff02878274>] macb_close+0xe4/0x120 [macb]
[ 173.986169] [<ffffffff80cae9fe>] __dev_close_many+0xa6/0x1d0
[ 173.986184] [<ffffffff80caeb94>] netif_close_many+0x6c/0x118
[ 173.986193] [<ffffffff80caec98>] netif_close+0x58/0x70
[ 173.986201] [<ffffffff80cbab84>] dev_close+0x6c/0x98
[ 173.986216] [<ffffffff02874518>] macb_shutdown+0x48/0x68 [macb]
[ 173.986254] [<ffffffff809bc9c6>] platform_shutdown+0x16/0x30
[ 173.986277] [<ffffffff809b7126>] device_shutdown+0x11e/0x208
[ 173.986287] [<ffffffff80070f26>] kernel_restart+0x36/0x98
[ 173.986304] [<ffffffff80071276>] __do_sys_reboot+0x14e/0x230
[ 173.986314] [<ffffffff8007136e>] __riscv_sys_reboot+0x16/0x28
[ 173.986324] [<ffffffff80f26d6a>] do_trap_ecall_u+0x1aa/0x578
[ 173.986339] [<ffffffff80f37c58>] handle_exception+0x168/0x174
[ 173.986359] Disabling lock debugging due to kernel taint
I've bisected it to commit "mm: cma: add cma_alloc_frozen{_compound}()" 9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f
Here's the bisect log.
git bisect start
# status: waiting for both good and bad commits
# bad: [cee73b1e840c154f64ace682cb477c1ae2e29cc4] Merge tag 'riscv-for-linus-7.0-mw1' of
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
git bisect bad cee73b1e840c154f64ace682cb477c1ae2e29cc4
# status: waiting for good commit(s), bad commit known
# good: [8383522821c6fea6bbb4bc0317056c433a482a95] Merge branch 'net-mscc-ocelot-fix-missing-lock-in-ocelot_port_xmit'
git bisect good 8383522821c6fea6bbb4bc0317056c433a482a95
# good: [37a93dd5c49b5fda807fd204edf2547c3493319c] Merge tag 'net-next-7.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 37a93dd5c49b5fda807fd204edf2547c3493319c
# bad: [4cff5c05e076d2ee4e34122aa956b84a2eaac587] Merge tag 'mm-stable-2026-02-11-19-22' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 4cff5c05e076d2ee4e34122aa956b84a2eaac587
# bad: [fde8353121aa304ee88542f011dd5dc83ced47e4] selftests/mm: report SKIP in pfnmap if a check fails
git bisect bad fde8353121aa304ee88542f011dd5dc83ced47e4
# good: [743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6] Revert "mm/hugetlb: deal with multiple calls to hugetlb_bootmem_alloc"
git bisect good 743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6
# bad: [2f5e576598c915db18b7ccd0003be52458959ce7] powerpc/mm: implement *_user_accessible_page() for ptes
git bisect bad 2f5e576598c915db18b7ccd0003be52458959ce7
# good: [6c08cc64d194dc5cc3dfc785517098d3b161c05f] mm: cma: kill cma_pages_valid()
git bisect good 6c08cc64d194dc5cc3dfc785517098d3b161c05f
# bad: [4bdd692291275eaaabe993e1c4a7b5b01cd6dc37] mm/damon/lru_sort: add monitoring intervals auto-tuning parameter
git bisect bad 4bdd692291275eaaabe993e1c4a7b5b01cd6dc37
# bad: [fbec8a1e4fa4daf2611c9a3e3b29d03a73acbd0c] mm/damon/sysfs-schemes: support DAMOS_QUOTA_[IN]ACTIVE_MEM_BP
git bisect bad fbec8a1e4fa4daf2611c9a3e3b29d03a73acbd0c
# bad: [14f270761d3374db24c84630f2aa7a3c732fed4a] mm: hugetlb: allocate frozen pages for gigantic allocation
git bisect bad 14f270761d3374db24c84630f2aa7a3c732fed4a
# bad: [9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f] mm: cma: add cma_alloc_frozen{_compound}()
git bisect bad 9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f
# good: [e0c1326779cc1b8e3a9e30ae273b89202ed4c82c] mm: page_alloc: add alloc_contig_frozen_{range,pages}()
git bisect good e0c1326779cc1b8e3a9e30ae273b89202ed4c82c
# first bad commit: [9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f] mm: cma: add cma_alloc_frozen{_compound}()
It doesn't revert cleanly, so I couldn't try that.
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: David Hildenbrand (Arm) @ 2026-02-24 11:00 UTC
To: Ron Economos, wangkefeng.wang, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw

On 2/24/26 09:37, Ron Economos wrote:
> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>
> [...]
> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|lastcpupid=0x1ffff) CMA
> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 173.985403] page dumped because: nonzero _refcount

So, we're freeing something from CMA in cma_release().

In cma_release() we iterate all pages to decrement their refcount

	VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));

I would expect that this would fire already if there is still a page
referenced.

Are you running with CONFIG_DEBUG_VM=y?

-- 
Cheers,

David
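A small userspace model of the lifecycle David describes. Every name in it is hypothetical, and none of it is a kernel API. Allocation sets each page's refcount to 1. The release loop drops that reference with a put-and-test helper in the spirit of put_page_testzero(). A free-path check then counts pages whose refcount is still nonzero, which is the condition behind "page dumped because: nonzero _refcount". Passing do_put = false imitates the decrement never running.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page: only the refcount matters here. */
struct fake_page {
	int refcount;
};

/* Like set_page_refcounted() at allocation: each page starts at 1. */
void fake_alloc(struct fake_page *pages, size_t count)
{
	for (size_t i = 0; i < count; i++)
		pages[i].refcount = 1;
}

/* Like put_page_testzero(): drop one reference, report whether it hit 0. */
bool fake_put_testzero(struct fake_page *page)
{
	return --page->refcount == 0;
}

/* Free-path check: count (and print) pages that still hold a reference. */
size_t fake_free_check(const struct fake_page *pages, size_t count)
{
	size_t bad = 0;

	for (size_t i = 0; i < count; i++) {
		if (pages[i].refcount != 0) {
			printf("bad page %zu: refcount:%d\n", i, pages[i].refcount);
			bad++;
		}
	}
	return bad;
}

/* Release: optionally drop every page's reference, then run the free check. */
size_t fake_release(struct fake_page *pages, size_t count, bool do_put)
{
	if (do_put) {
		for (size_t i = 0; i < count; i++)
			(void)fake_put_testzero(&pages[i]);
	}
	return fake_free_check(pages, count);
}
```

Calling fake_release(p, 4, false) on freshly allocated pages reports all four pages with refcount 1. That has the same shape as the report in the dump.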
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Zi Yan @ 2026-02-24 17:14 UTC
To: David Hildenbrand
Cc: David Hildenbrand (Arm), Ron Economos, wangkefeng.wang, linux-mm,
linux-kernel, linux-riscv, jackmanb, jane.chu, hannes, willy,
muchun.song, osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw

On 24 Feb 2026, at 12:07, David Hildenbrand wrote:

>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>
>> On 2/24/26 09:37, Ron Economos wrote:
>>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>>
>>> [...]
>>> [ 173.985403] page dumped because: nonzero _refcount
>>
>> So, we're freeing something from CMA in cma_release().
>>
>> In cma_release() we iterate all pages to decrement their refcount
>>
>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>>
>> I would expect that this would fire already if there is still a page
>> referenced.
>>
>> Are you running with CONFIG_DEBUG_VM=y?
>>
>> -- 
>> Cheers,
>>
>> David
>
> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>
> At least that’s what I remember.

Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
and is_check_pages_enabled(), which leads to free_page_is_bad()’s
“page dumped because: nonzero _refcount”, are disabled.

It seems to me that someone else bumps the page refcount between
VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().

Best Regards,
Yan, Zi
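The point David and Zi make can be reproduced in plain C. The sketch below puts two VM_WARN_ON() stand-ins side by side; their macro names are made up for the example. The non-debug one mirrors the kernel's fallback to BUILD_BUG_ON_INVALID(), which wraps its argument in sizeof. The C standard does not evaluate the operand of sizeof unless it has a variable length array type. As a result, the put hidden inside the condition only runs in the debug variant.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace imitation of BUILD_BUG_ON_INVALID(): type-checks e, never runs it. */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))

/* Debug flavour: the condition is evaluated, and a warning printed if true. */
#define VM_WARN_ON_DEBUG(cond) \
	((void)((cond) ? fprintf(stderr, "WARNING: %s\n", #cond) : 0))

/* Non-debug flavour: only the validity of the expression is checked. */
#define VM_WARN_ON_NODEBUG(cond) BUILD_BUG_ON_INVALID(cond)

int page_refcount;

/* Hypothetical put_page_testzero() stand-in with a visible side effect. */
bool put_testzero(void)
{
	return --page_refcount == 0;
}

/* Returns the refcount left after one "release" with the debug macro. */
int release_debug(void)
{
	page_refcount = 1;
	VM_WARN_ON_DEBUG(!put_testzero());
	return page_refcount;
}

/* Same release with the non-debug macro: the decrement never happens. */
int release_nodebug(void)
{
	page_refcount = 1;
	VM_WARN_ON_NODEBUG(!put_testzero());
	return page_refcount;
}
```

release_debug() leaves the refcount at 0, while release_nodebug() leaves it at 1. That is the refcount:1 in the report. It is also consistent with Ron's observation that enabling CONFIG_DEBUG_VM on its own made the problem go away.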
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Zi Yan @ 2026-02-24 17:17 UTC
To: David Hildenbrand (Arm)
Cc: Ron Economos, wangkefeng.wang, linux-mm, linux-kernel, linux-riscv,
jackmanb, jane.chu, hannes, willy, muchun.song, osalvador,
sidhartha.kumar, vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw

On 24 Feb 2026, at 12:14, Zi Yan wrote:

> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>
>>> [...]
>>
>> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>>
>> At least that’s what I remember.
>
> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
> “page dumped because: nonzero _refcount”, are disabled.
>
> It seems to me that someone else bumps the page refcount between
> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>

Merging Ron’s reply from another thread[1]:

“Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”

Looks like something is racy, since it is reproducible reliably.

[1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/

Best Regards,
Yan, Zi
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Zi Yan @ 2026-02-24 17:29 UTC
To: Ron Economos, David Hildenbrand (Arm)
Cc: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv, jackmanb,
jane.chu, hannes, willy, muchun.song, osalvador, sidhartha.kumar,
vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw

On 24 Feb 2026, at 12:17, Zi Yan wrote:

> On 24 Feb 2026, at 12:14, Zi Yan wrote:
>
>> [...]
>
> Merging Ron’s reply from another thread[1]:
>
> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>
> Looks like something is racy, since it is reproducible reliably.
>
> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/

VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
the validity of the expression is checked and no code is generated.
So that put_page_testzero() call becomes a NOP.

Hi Ron,

Can you check if the patch below fixes the issue without CONFIG_DEBUG_VM?

diff --git a/mm/cma.c b/mm/cma.c
index 94b5da468a7d..96be62eb3713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
 		return false;
 
 	pfn = page_to_pfn(pages);
-	for (i = 0; i < count; i++, pfn++)
-		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
+	for (i = 0; i < count; i++, pfn++) {
+		int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
+
+		VM_WARN_ON(!ret);
+	}
 
 	__cma_release_frozen(cma, cmr, pages, count);
 

Best Regards,
Yan, Zi
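The patch works because it moves the side effect out of the macro argument. The put now runs unconditionally, and only the check depends on the config. Here is a userspace sketch of the patched loop shape, using a sizeof-based stand-in for the non-debug VM_WARN_ON(). The names are hypothetical. MAYBE_UNUSED stands in for the kernel's __maybe_unused and assumes GCC or Clang.

```c
#include <stdbool.h>

/* Same sizeof trick as the kernel's non-debug VM_WARN_ON(): no code emitted. */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))
#define VM_WARN_ON_NODEBUG(cond) BUILD_BUG_ON_INVALID(cond)

/* Stand-in for __maybe_unused (GCC/Clang attribute syntax). */
#define MAYBE_UNUSED __attribute__((__unused__))

struct fake_page {
	int refcount;
};

/* Hypothetical put_page_testzero() stand-in. */
bool fake_put_testzero(struct fake_page *page)
{
	return --page->refcount == 0;
}

/* Patched shape: the put always runs; only the warning is debug-only. */
void release_fixed(struct fake_page *pages, unsigned long count)
{
	for (unsigned long i = 0; i < count; i++) {
		int MAYBE_UNUSED ret = fake_put_testzero(&pages[i]);

		VM_WARN_ON_NODEBUG(!ret);
	}
}
```

The attribute keeps the compiler quiet about ret in configurations where the check compiles away. It has no effect on the generated put.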
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Ron Economos @ 2026-02-24 20:55 UTC
To: Zi Yan, David Hildenbrand (Arm)
Cc: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv, jackmanb,
jane.chu, hannes, willy, muchun.song, osalvador, sidhartha.kumar,
vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw

On 2/24/26 09:29, Zi Yan wrote:
> [...]
> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
> the validity of the expression is checked and no code is generated.
> So that put_page_testzero() call becomes a NOP.
>
> Hi Ron,
>
> Can you check if the patch below fixes the issue without CONFIG_DEBUG_VM?
>
> [...]

Yes, that patch fixes the issue.
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Kefeng Wang @ 2026-02-25 1:58 UTC
To: Zi Yan, Ron Economos, David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, linux-riscv, jackmanb, jane.chu, hannes,
willy, muchun.song, osalvador, sidhartha.kumar, vbabka,
claudiu.beznea.uj, Mark Brown, akpm, pjw

On 2026/2/25 1:29, Zi Yan wrote:
> On 24 Feb 2026, at 12:17, Zi Yan wrote:

...

> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
> the validity of the expression is checked and no code is generated.
> So that put_page_testzero() call becomes a NOP.

Indeed...

>
> Hi Ron,
>
> Can you check if the patch below fixes the issue without CONFIG_DEBUG_VM?
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..96be62eb3713 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
>  		return false;
>  
>  	pfn = page_to_pfn(pages);
> -	for (i = 0; i < count; i++, pfn++)
> -		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> +	for (i = 0; i < count; i++, pfn++) {
> +		int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
> +
> +		VM_WARN_ON(!ret);
> +	}

Maybe we only warn once by adding back the original check?

diff --git a/mm/cma.c b/mm/cma.c
index 94b5da468a7d..a73a22d34232 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -1014,14 +1014,17 @@ bool cma_release(struct cma *cma, const struct page *pages,
 {
 	struct cma_memrange *cmr;
 	unsigned long i, pfn;
+	unsigned long ret = 0;
 
 	cmr = find_cma_memrange(cma, pages, count);
 	if (!cmr)
 		return false;
 
 	pfn = page_to_pfn(pages);
-	for (i = 0; i < count; i++, pfn++)
-		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
+	for (i = 0; i < count; i++, pfn++)
+		ret += !put_page_testzero(pfn_to_page(pfn));
+
+	WARN(ret != 0, "%lu pages are still in use!\n", ret);
 
 	__cma_release_frozen(cma, cmr, pages, count);
 

>
>  	__cma_release_frozen(cma, cmr, pages, count);
>
> [...]
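The warn-once variant can be sketched in userspace the same way, again with hypothetical names. The counter has to accumulate the pages whose put did not reach zero, which is the negation of the put-and-test result. A single warning after the loop then reports how many pages are still in use. The counter is unsigned long so that it matches the %lu format.

```c
#include <stdbool.h>
#include <stdio.h>

struct fake_page {
	int refcount;
};

/* Hypothetical put_page_testzero() stand-in. */
bool fake_put_testzero(struct fake_page *page)
{
	return --page->refcount == 0;
}

/*
 * Drop one reference per page unconditionally, count the pages whose
 * refcount did not reach zero, and warn once for the whole range.
 * Returns the number of pages still in use.
 */
unsigned long release_counting(struct fake_page *pages, unsigned long count)
{
	unsigned long in_use = 0;

	for (unsigned long i = 0; i < count; i++)
		in_use += !fake_put_testzero(&pages[i]);

	if (in_use != 0)
		fprintf(stderr, "%lu pages are still in use!\n", in_use);
	return in_use;
}
```

Compared with a per-page warning, a range with many leaked pages produces one line instead of one warning per page. The put itself still happens for every page.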
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Zi Yan @ 2026-02-25 2:15 UTC
To: Kefeng Wang
Cc: Ron Economos, David Hildenbrand (Arm), linux-mm, linux-kernel,
linux-riscv, jackmanb, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj, Mark Brown,
akpm, pjw

On 24 Feb 2026, at 20:58, Kefeng Wang wrote:

> On 2026/2/25 1:29, Zi Yan wrote:
>> On 24 Feb 2026, at 12:17, Zi Yan wrote:
>
> ...
>
> [...]
>
> Maybe we only warn once by adding back the original check?
>
> [...]

Sounds like a better solution. Let me use this as the v2 fix. Thanks.

Best Regards,
Yan, Zi
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Mark Brown @ 2026-02-24 17:21 UTC
To: David Hildenbrand
Cc: David Hildenbrand (Arm), Ron Economos, wangkefeng.wang, linux-mm,
linux-kernel, linux-riscv, ziy, jackmanb, jane.chu, hannes, willy,
muchun.song, osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
akpm, pjw

On Tue, Feb 24, 2026 at 06:07:07PM +0100, David Hildenbrand wrote:

> <div class="default-style">Thinking again without my computer at hand … isn‘t the call completely optimized out without CONFIG_DEBUG_VM?</div>

Pinged David off list about the mail being HTML - this was the new
comment for anyone else whose mail client just showed them the raw HTML...
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Kefeng Wang @ 2026-02-24 12:58 UTC
To: Ron Economos, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw

On 2026/2/24 16:37, Ron Economos wrote:
> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>
> [...]
> [ 173.985249] BUG: Bad page state in process shutdown pfn:f8850
> [ 173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
> [ 173.985336] flags: 0xffff80000000000(node=0|zone=0|lastcpupid=0x1ffff) CMA
> [ 173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
> [ 173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [ 173.985403] page dumped because: nonzero _refcount

The refcount is set to 1 by set_page_refcounted() in cma_alloc(), and it
is decremented to 0 by put_page_testzero() in cma_release(). Could there
be a problem somewhere else?

Could you enable CONFIG_DEBUG_VM and CONFIG_DEBUG_PAGE_REF, and try to
track down page reference manipulation by tracepoint? Alternatively, for
CMA-related pages, add explicit printk()s for both increments and
decrements of the page reference count to identify the root cause of
the issue.
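One possible shape for the printk() route Kefeng suggests is a wrapper that applies every refcount change and logs it only for pages inside a watched pfn window. This is a userspace sketch with hypothetical names, not kernel code. In the kernel, the tracepoints that CONFIG_DEBUG_PAGE_REF enables serve the same purpose without code changes. The window starts at pfn 0xf8850, the pfn from the report. Its size of 16 pages is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical pfn window to watch; 0xf8850 is the pfn from the report. */
#define TRACE_PFN_START 0xf8850UL
#define TRACE_PFN_COUNT 16UL

struct traced_page {
	unsigned long pfn;
	int refcount;
};

unsigned long trace_events; /* number of log lines emitted so far */

bool pfn_traced(unsigned long pfn)
{
	return pfn >= TRACE_PFN_START && pfn < TRACE_PFN_START + TRACE_PFN_COUNT;
}

/* Apply a refcount change; log it (with the caller) if the page is watched. */
int traced_ref_mod(struct traced_page *page, int delta, const char *caller)
{
	page->refcount += delta;
	if (pfn_traced(page->pfn)) {
		fprintf(stderr, "pfn:%#lx refcount %+d -> %d from %s\n",
			page->pfn, delta, page->refcount, caller);
		trace_events++;
	}
	return page->refcount;
}

/* Call-site helpers record the calling function's name via __func__. */
#define ref_inc(page) traced_ref_mod((page), 1, __func__)
#define ref_dec(page) traced_ref_mod((page), -1, __func__)
```

Routing every get and put through such helpers yields a per-page history. An unmatched increment then shows up as a log line with no corresponding decrement.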
* Re: mm: Regression with v7.0-rc1 on RISC-V
From: Ron Economos @ 2026-02-24 13:25 UTC
To: Kefeng Wang, linux-mm, linux-kernel, linux-riscv
Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
Mark Brown, akpm, pjw

On 2/24/26 04:58, Kefeng Wang wrote:
>
> On 2026/2/24 16:37, Ron Economos wrote:
>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>
>> [...]
>
> [...]
>
> Could you enable CONFIG_DEBUG_VM and CONFIG_DEBUG_PAGE_REF, and try to
> track down page reference manipulation by tracepoint? [...]

Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.
Thread overview: 11+ messages
2026-02-24 8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
2026-02-24 11:00 ` David Hildenbrand (Arm)
[not found] ` <1966378802.577797.1771952827516@app.mailbox.org>
2026-02-24 17:14 ` Zi Yan
2026-02-24 17:17 ` Zi Yan
2026-02-24 17:29 ` Zi Yan
2026-02-24 20:55 ` Ron Economos
2026-02-25 1:58 ` Kefeng Wang
2026-02-25 2:15 ` Zi Yan
2026-02-24 17:21 ` Mark Brown
2026-02-24 12:58 ` Kefeng Wang
2026-02-24 13:25 ` Ron Economos