linux-mm.kvack.org archive mirror
* mm: Regression with v7.0-rc1 on RISC-V
@ 2026-02-24  8:37 Ron Economos
  2026-02-24 11:00 ` David Hildenbrand (Arm)
  2026-02-24 12:58 ` Kefeng Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24  8:37 UTC
  To: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv
  Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
	osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
	Mark Brown, akpm, pjw

I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.

[  OK  ] Reached target shutdown.target - System Shutdown.
[  OK  ] Reached target final.target - Late Shutdown Services.
[  OK  ] Finished systemd-reboot.service - System Reboot.
[  OK  ] Reached target reboot.target - System Reboot.
[  173.985249] BUG: Bad page state in process shutdown  pfn:f8850
[  173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
[  173.985336] flags: 0xffff80000000000(node=0|zone=0|lastcpupid=0x1ffff) CMA
[  173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
[  173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[  173.985403] page dumped because: nonzero _refcount
[  173.985418] Modules linked in: qrtr cfg80211 da9063_onkey binfmt_misc at24 lm90 nls_iso8859_1 pwm_fan sch_fq_codel dm_multipath 
nvme_fabrics efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfnetlink autofs4 btrfs libblake2b raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear mmc_spi crc7 of_mmc_spi rtc_da9063 
crc_itu_t da9063_regulator mscc nvme macsec phy_package nvme_core nvme_keyring nvme_auth hkdf macb phylink spi_sifive i2c_ocores
[  173.985802] CPU: 3 UID: 0 PID: 1 Comm: shutdown Not tainted 7.0.0-rc1 #2 PREEMPT(full)
[  173.985814] Hardware name: SiFive HiFive Unmatched A00 (DT)
[  173.985820] Call Trace:
[  173.985827] [<ffffffff8001bfd4>] dump_backtrace+0x1c/0x30
[  173.985850] [<ffffffff80002522>] show_stack+0x2a/0x3c
[  173.985861] [<ffffffff8001395e>] dump_stack_lvl+0x46/0x6c
[  173.985877] [<ffffffff80013998>] dump_stack+0x14/0x1e
[  173.985886] [<ffffffff80317fe4>] bad_page+0x12c/0x170
[  173.985905] [<ffffffff8031bb14>] __free_frozen_pages+0x634/0x830
[  173.985919] [<ffffffff8031bd88>] __free_contig_frozen_range+0x78/0xe8
[  173.985929] [<ffffffff8031bec0>] free_contig_frozen_range+0xc8/0x1e8
[  173.985940] [<ffffffff803a15a0>] __cma_release_frozen+0x60/0x1c8
[  173.985959] [<ffffffff803a27c6>] cma_release+0x26/0x40
[  173.985969] [<ffffffff800fcf96>] dma_free_contiguous+0xde/0x178
[  173.985981] [<ffffffff800fbba4>] dma_direct_free+0x104/0x198
[  173.985990] [<ffffffff800fa19e>] dma_free_attrs+0x66/0x1e8
[  173.986002] [<ffffffff028780ec>] macb_free_consistent+0xbc/0x160 [macb]
[  173.986132] [<ffffffff02878274>] macb_close+0xe4/0x120 [macb]
[  173.986169] [<ffffffff80cae9fe>] __dev_close_many+0xa6/0x1d0
[  173.986184] [<ffffffff80caeb94>] netif_close_many+0x6c/0x118
[  173.986193] [<ffffffff80caec98>] netif_close+0x58/0x70
[  173.986201] [<ffffffff80cbab84>] dev_close+0x6c/0x98
[  173.986216] [<ffffffff02874518>] macb_shutdown+0x48/0x68 [macb]
[  173.986254] [<ffffffff809bc9c6>] platform_shutdown+0x16/0x30
[  173.986277] [<ffffffff809b7126>] device_shutdown+0x11e/0x208
[  173.986287] [<ffffffff80070f26>] kernel_restart+0x36/0x98
[  173.986304] [<ffffffff80071276>] __do_sys_reboot+0x14e/0x230
[  173.986314] [<ffffffff8007136e>] __riscv_sys_reboot+0x16/0x28
[  173.986324] [<ffffffff80f26d6a>] do_trap_ecall_u+0x1aa/0x578
[  173.986339] [<ffffffff80f37c58>] handle_exception+0x168/0x174
[  173.986359] Disabling lock debugging due to kernel taint

I've bisected it to commit "mm: cma: add cma_alloc_frozen{_compound}()" 9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f

Here's the bisect log.

git bisect start
# status: waiting for both good and bad commits
# bad: [cee73b1e840c154f64ace682cb477c1ae2e29cc4] Merge tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
git bisect bad cee73b1e840c154f64ace682cb477c1ae2e29cc4
# status: waiting for good commit(s), bad commit known
# good: [8383522821c6fea6bbb4bc0317056c433a482a95] Merge branch 'net-mscc-ocelot-fix-missing-lock-in-ocelot_port_xmit'
git bisect good 8383522821c6fea6bbb4bc0317056c433a482a95
# good: [37a93dd5c49b5fda807fd204edf2547c3493319c] Merge tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 37a93dd5c49b5fda807fd204edf2547c3493319c
# bad: [4cff5c05e076d2ee4e34122aa956b84a2eaac587] Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad 4cff5c05e076d2ee4e34122aa956b84a2eaac587
# bad: [fde8353121aa304ee88542f011dd5dc83ced47e4] selftests/mm: report SKIP in pfnmap if a check fails
git bisect bad fde8353121aa304ee88542f011dd5dc83ced47e4
# good: [743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6] Revert "mm/hugetlb: deal with multiple calls to hugetlb_bootmem_alloc"
git bisect good 743758ccf8bede3e7c38f3f7d3f5131aa0a7b4a6
# bad: [2f5e576598c915db18b7ccd0003be52458959ce7] powerpc/mm: implement *_user_accessible_page() for ptes
git bisect bad 2f5e576598c915db18b7ccd0003be52458959ce7
# good: [6c08cc64d194dc5cc3dfc785517098d3b161c05f] mm: cma: kill cma_pages_valid()
git bisect good 6c08cc64d194dc5cc3dfc785517098d3b161c05f
# bad: [4bdd692291275eaaabe993e1c4a7b5b01cd6dc37] mm/damon/lru_sort: add monitoring intervals auto-tuning parameter
git bisect bad 4bdd692291275eaaabe993e1c4a7b5b01cd6dc37
# bad: [fbec8a1e4fa4daf2611c9a3e3b29d03a73acbd0c] mm/damon/sysfs-schemes: support DAMOS_QUOTA_[IN]ACTIVE_MEM_BP
git bisect bad fbec8a1e4fa4daf2611c9a3e3b29d03a73acbd0c
# bad: [14f270761d3374db24c84630f2aa7a3c732fed4a] mm: hugetlb: allocate frozen pages for gigantic allocation
git bisect bad 14f270761d3374db24c84630f2aa7a3c732fed4a
# bad: [9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f] mm: cma: add cma_alloc_frozen{_compound}()
git bisect bad 9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f
# good: [e0c1326779cc1b8e3a9e30ae273b89202ed4c82c] mm: page_alloc: add alloc_contig_frozen_{range,pages}()
git bisect good e0c1326779cc1b8e3a9e30ae273b89202ed4c82c
# first bad commit: [9bda131c6093e9c4a8739e2eeb65ba4d5fbefc2f] mm: cma: add cma_alloc_frozen{_compound}()

It doesn't revert cleanly, so I couldn't try that.




* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24  8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
@ 2026-02-24 11:00 ` David Hildenbrand (Arm)
       [not found]   ` <1966378802.577797.1771952827516@app.mailbox.org>
  2026-02-24 12:58 ` Kefeng Wang
  1 sibling, 1 reply; 11+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-24 11:00 UTC
  To: Ron Economos, wangkefeng.wang, linux-mm, linux-kernel, linux-riscv
  Cc: ziy, jackmanb, jane.chu, hannes, willy, muchun.song, osalvador,
	sidhartha.kumar, vbabka, claudiu.beznea.uj, Mark Brown, akpm,
	pjw

On 2/24/26 09:37, Ron Economos wrote:
> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
> 
> [  OK  ] Reached target shutdown.target - System Shutdown.
> [  OK  ] Reached target final.target - Late Shutdown Services.
> [  OK  ] Finished systemd-reboot.service - System Reboot.
> [  OK  ] Reached target reboot.target - System Reboot.
> [  173.985249] BUG: Bad page state in process shutdown  pfn:f8850
> [  173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000
> index:0x0 pfn:0xf8850
> [  173.985336] flags: 0xffff80000000000(node=0|zone=0|
> lastcpupid=0x1ffff) CMA
> [  173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88
> 0000000000000000
> [  173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff
> 0000000000000000
> [  173.985403] page dumped because: nonzero _refcount

So, we're freeing something from CMA in cma_release().

In cma_release() we iterate all pages to decrement their refcount

	VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));

I would expect that this would fire already if there is still a page
referenced.
	
Are you running with CONFIG_DEBUG_VM=y ?


-- 
Cheers,

David



* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24  8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
  2026-02-24 11:00 ` David Hildenbrand (Arm)
@ 2026-02-24 12:58 ` Kefeng Wang
  2026-02-24 13:25   ` Ron Economos
  1 sibling, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2026-02-24 12:58 UTC
  To: Ron Economos, linux-mm, linux-kernel, linux-riscv
  Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
	osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
	Mark Brown, akpm, pjw



On 2026/2/24 16:37, Ron Economos wrote:
> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
> 
> [  OK  ] Reached target shutdown.target - System Shutdown.
> [  OK  ] Reached target final.target - Late Shutdown Services.
> [  OK  ] Finished systemd-reboot.service - System Reboot.
> [  OK  ] Reached target reboot.target - System Reboot.
> [  173.985249] BUG: Bad page state in process shutdown  pfn:f8850
> [  173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 
> index:0x0 pfn:0xf8850
> [  173.985336] flags: 0xffff80000000000(node=0|zone=0| 
> lastcpupid=0x1ffff) CMA
> [  173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 
> 0000000000000000
> [  173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 
> 0000000000000000
> [  173.985403] page dumped because: nonzero _refcount


The refcount is set to 1 by set_page_refcounted() in cma_alloc(), and it
is decremented to 0 by put_page_testzero() in cma_release(), so could the
problem be somewhere else?
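A minimal userspace model of the refcount lifecycle described above, assuming a simplified page that carries only a plain int refcount. `struct fake_page`, `model_alloc()`, `model_put_testzero()` and `model_free_check()` are hypothetical stand-ins, not kernel APIs. It shows that a page whose put is skipped reaches the free-time check with refcount 1, which is the state in the "refcount:1 ... nonzero _refcount" dump.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the refcount matters here. */
struct fake_page {
	int refcount;
};

/* Models set_page_refcounted() on the cma_alloc() path: refcount becomes 1. */
static void model_alloc(struct fake_page *p)
{
	p->refcount = 1;
}

/* Models put_page_testzero(): drop one reference, report whether it hit 0. */
static bool model_put_testzero(struct fake_page *p)
{
	return --p->refcount == 0;
}

/* Models only the refcount part of the free-time sanity check: returns the
 * reason string the kernel prints, or NULL if the page may be freed. */
static const char *model_free_check(const struct fake_page *p)
{
	return p->refcount != 0 ? "nonzero _refcount" : NULL;
}
```

Calling `model_alloc()` and then `model_free_check()` without the intermediate put reproduces the reported state.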

Could you enable CONFIG_DEBUG_VM and CONFIG_DEBUG_PAGE_REF and try to
track down the page reference changes with the page_ref tracepoints? Or,
for CMA pages, add explicit printk()s on both increments and decrements of
the page reference count to identify the root cause of the issue.
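For the printk() route, one way to structure it is to funnel every get and put through wrappers that record the change, then look for a page whose logged deltas do not sum to zero. This is a hypothetical userspace sketch: `traced_get()`, `traced_put_testzero()`, `net_refs()` and the in-memory log are made-up names, not kernel interfaces. In the kernel, CONFIG_DEBUG_PAGE_REF with the page_ref tracepoints gives similar data without touching the callers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	int refcount;
};

/* One recorded refcount change. */
struct ref_event {
	const struct fake_page *page;
	int delta;        /* +1 for a get, -1 for a put */
	int after;        /* refcount after the change */
	const char *site; /* caller-supplied label, e.g. a function name */
};

#define MAX_REF_EVENTS 64
static struct ref_event ref_log[MAX_REF_EVENTS];
static size_t ref_log_len;

static void log_ref(const struct fake_page *p, int delta, const char *site)
{
	if (ref_log_len < MAX_REF_EVENTS)
		ref_log[ref_log_len++] = (struct ref_event){ p, delta, p->refcount, site };
}

static void traced_get(struct fake_page *p, const char *site)
{
	p->refcount++;
	log_ref(p, +1, site);
}

static bool traced_put_testzero(struct fake_page *p, const char *site)
{
	p->refcount--;
	log_ref(p, -1, site);
	return p->refcount == 0;
}

/* Net logged change for one page; after a full get/put cycle this is 0. */
static int net_refs(const struct fake_page *p)
{
	int sum = 0;

	for (size_t i = 0; i < ref_log_len; i++)
		if (ref_log[i].page == p)
			sum += ref_log[i].delta;
	return sum;
}
```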





* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24 12:58 ` Kefeng Wang
@ 2026-02-24 13:25   ` Ron Economos
  0 siblings, 0 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24 13:25 UTC
  To: Kefeng Wang, linux-mm, linux-kernel, linux-riscv
  Cc: ziy, jackmanb, david, jane.chu, hannes, willy, muchun.song,
	osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
	Mark Brown, akpm, pjw

On 2/24/26 04:58, Kefeng Wang wrote:
>
>
> On 2026/2/24 16:37, Ron Economos wrote:
>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>
>> [  OK  ] Reached target shutdown.target - System Shutdown.
>> [  OK  ] Reached target final.target - Late Shutdown Services.
>> [  OK  ] Finished systemd-reboot.service - System Reboot.
>> [  OK  ] Reached target reboot.target - System Reboot.
>> [  173.985249] BUG: Bad page state in process shutdown pfn:f8850
>> [  173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
>> [  173.985336] flags: 0xffff80000000000(node=0|zone=0| lastcpupid=0x1ffff) CMA
>> [  173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
>> [  173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>> [  173.985403] page dumped because: nonzero _refcount
>
> The refcount set to 1 when cma_alloc() by set_page_refcounted(), and it
> will be dec to 0 in cma_release() by put_page_testzero(), there may be a problem somewhere else?
>
> Could you enable CONFIG_DEBUG_VM and CONFIG_DEBUG_PAGE_REF, and try to track down page reference manipulation by tracepoint? or 
> for CMA-related
> pages, introduce explicit printk both increments and decrements of the page reference count to identify the root cause of the issue.
>
Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.




* Re: mm: Regression with v7.0-rc1 on RISC-V
       [not found]   ` <1966378802.577797.1771952827516@app.mailbox.org>
@ 2026-02-24 17:14     ` Zi Yan
  2026-02-24 17:17       ` Zi Yan
  2026-02-24 17:21     ` Mark Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Zi Yan @ 2026-02-24 17:14 UTC
  To: David Hildenbrand
  Cc: David Hildenbrand (Arm),
	Ron Economos, wangkefeng.wang, linux-mm, linux-kernel,
	linux-riscv, jackmanb, jane.chu, hannes, willy, muchun.song,
	osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
	Mark Brown, akpm, pjw

On 24 Feb 2026, at 12:07, David Hildenbrand wrote:

>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>
>> On 2/24/26 09:37, Ron Economos wrote:
>>> I'm getting a BUG dump during shutdown with Linux v7.0-rc1 on RISC-V.
>>>
>>> [  OK  ] Reached target shutdown.target - System Shutdown.
>>> [  OK  ] Reached target final.target - Late Shutdown Services.
>>> [  OK  ] Finished systemd-reboot.service - System Reboot.
>>> [  OK  ] Reached target reboot.target - System Reboot.
>>> [  173.985249] BUG: Bad page state in process shutdown  pfn:f8850
>>> [  173.985311] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xf8850
>>> [  173.985336] flags: 0xffff80000000000(node=0|zone=0|lastcpupid=0x1ffff) CMA
>>> [  173.985365] raw: 0ffff80000000000 ffffffc501e21448 ffffffc600f2ae88 0000000000000000
>>> [  173.985386] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>> [  173.985403] page dumped because: nonzero _refcount
>>
>> So, we're freeing something from CMA in cma_release().
>>
>> In cma_release() we iterate all pages to decrement their refcount
>>
>> 	VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>>
>> I would expect that this would fire already if there is still a page
>> referenced.
>>
>> Are you running with CONFIG_DEBUG_VM=y ?
>>
>> --
>> Cheers,
>>
>> David
>
> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>
> At least that’s what I remember.

Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
and is_check_pages_enabled(), which leads to free_page_is_bad()’s
“page dumped because: nonzero _refcount”, are disabled.

It seems to me that someone else bumps the page refcount between
VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().


Best Regards,
Yan, Zi



* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24 17:14     ` Zi Yan
@ 2026-02-24 17:17       ` Zi Yan
  2026-02-24 17:29         ` Zi Yan
  0 siblings, 1 reply; 11+ messages in thread
From: Zi Yan @ 2026-02-24 17:17 UTC
  To: David Hildenbrand (Arm)
  Cc: Ron Economos, wangkefeng.wang, linux-mm, linux-kernel,
	linux-riscv, jackmanb, jane.chu, hannes, willy, muchun.song,
	osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj,
	Mark Brown, akpm, pjw

On 24 Feb 2026, at 12:14, Zi Yan wrote:

> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>
>>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>> [...]
>>
>> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>>
>> At least that’s what I remember.
>
> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
> “page dumped because: nonzero _refcount”, are disabled.
>
> It seems to me that someone else bump the page refcount between
> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>

Merging Ron’s reply from another thread[1]:

“Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”

Looks like something is racy, since it is reproducible reliably.

[1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/

Best Regards,
Yan, Zi



* Re: mm: Regression with v7.0-rc1 on RISC-V
       [not found]   ` <1966378802.577797.1771952827516@app.mailbox.org>
  2026-02-24 17:14     ` Zi Yan
@ 2026-02-24 17:21     ` Mark Brown
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Brown @ 2026-02-24 17:21 UTC
  To: David Hildenbrand
  Cc: David Hildenbrand (Arm),
	Ron Economos, wangkefeng.wang, linux-mm, linux-kernel,
	linux-riscv, ziy, jackmanb, jane.chu, hannes, willy, muchun.song,
	osalvador, sidhartha.kumar, vbabka, claudiu.beznea.uj, akpm, pjw


On Tue, Feb 24, 2026 at 06:07:07PM +0100, David Hildenbrand wrote:

> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?

Pinged David off list about the mail being HTML - this was the new
comment, for anyone else whose mail client just showed them the raw
HTML...



* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24 17:17       ` Zi Yan
@ 2026-02-24 17:29         ` Zi Yan
  2026-02-24 20:55           ` Ron Economos
  2026-02-25  1:58           ` Kefeng Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Zi Yan @ 2026-02-24 17:29 UTC
  To: Ron Economos, David Hildenbrand (Arm)
  Cc: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv, jackmanb,
	jane.chu, hannes, willy, muchun.song, osalvador, sidhartha.kumar,
	vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw

On 24 Feb 2026, at 12:17, Zi Yan wrote:

> On 24 Feb 2026, at 12:14, Zi Yan wrote:
>
>> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>>
>>>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>>> [...]
>>>
>>> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>>>
>>> At least that’s what I remember.
>>
>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>> “page dumped because: nonzero _refcount”, are disabled.
>>
>> It seems to me that someone else bump the page refcount between
>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>
>
> Merging Ron’s reply from another thread[1]:
>
> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>
> Looks like something is racy, since it is reproducible reliably.
>
> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/

VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
the validity of the expression is checked and no code is generated, so
put_page_testzero() is never executed and the page refcount stays at 1.
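The effect can be reproduced outside the kernel with simplified copies of the two macros, assuming the !CONFIG_DEBUG_VM definitions (the kernel's BUILD_BUG_ON_INVALID() also carries a __force annotation, omitted here). `struct fake_page` and this `put_page_testzero()` are userspace stand-ins. Because the argument only ever appears inside sizeof, it is type-checked but never evaluated, so no refcount is decremented.

```c
#include <stdbool.h>

/* Simplified userspace copies of the !CONFIG_DEBUG_VM macros (assumed):
 * sizeof type-checks its operand but does not evaluate it. */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))
#define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)

/* Hypothetical stand-in for struct page. */
struct fake_page {
	int refcount;
};

/* Stand-in for put_page_testzero(): drop a reference, true if it hit 0. */
static bool put_page_testzero(struct fake_page *p)
{
	return --p->refcount == 0;
}

/* Buggy pattern: the put is the macro argument, so it is compiled away. */
static void release_buggy(struct fake_page *pages, int count)
{
	for (int i = 0; i < count; i++)
		VM_WARN_ON(!put_page_testzero(&pages[i]));
}
```

After `release_buggy()` every page still has refcount 1, the same value the bad-page dump reports.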

Hi Ron,

Can you check if the patch below fixes the issue without CONFIG_DEBUG_VM?

diff --git a/mm/cma.c b/mm/cma.c
index 94b5da468a7d..96be62eb3713 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
 		return false;

 	pfn = page_to_pfn(pages);
-	for (i = 0; i < count; i++, pfn++)
-		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
+	for (i = 0; i < count; i++, pfn++) {
+		int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
+
+		VM_WARN_ON(!ret);
+	}

 	__cma_release_frozen(cma, cmr, pages, count);
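The same userspace stand-ins illustrate why this shape of fix works: the side effect is hoisted into an ordinary statement that always runs, and only the check stays inside VM_WARN_ON(). The macro definitions are simplified assumptions, `struct fake_page` and `release_fixed()` are hypothetical, and the GCC/Clang `__attribute__((unused))` plays the role of the kernel's `__maybe_unused`.

```c
#include <stdbool.h>

/* Simplified !CONFIG_DEBUG_VM stand-ins (assumed, not the exact kernel text). */
#define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((long)(e))))
#define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond)

/* Hypothetical stand-in for struct page. */
struct fake_page {
	int refcount;
};

static bool put_page_testzero(struct fake_page *p)
{
	return --p->refcount == 0;
}

/* Fixed pattern: the put always runs; only the sanity check is optional. */
static void release_fixed(struct fake_page *pages, int count)
{
	for (int i = 0; i < count; i++) {
		bool ret __attribute__((unused)) = put_page_testzero(&pages[i]);

		VM_WARN_ON(!ret);
	}
}
```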



Best Regards,
Yan, Zi



* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24 17:29         ` Zi Yan
@ 2026-02-24 20:55           ` Ron Economos
  2026-02-25  1:58           ` Kefeng Wang
  1 sibling, 0 replies; 11+ messages in thread
From: Ron Economos @ 2026-02-24 20:55 UTC
  To: Zi Yan, David Hildenbrand (Arm)
  Cc: wangkefeng.wang, linux-mm, linux-kernel, linux-riscv, jackmanb,
	jane.chu, hannes, willy, muchun.song, osalvador, sidhartha.kumar,
	vbabka, claudiu.beznea.uj, Mark Brown, akpm, pjw

On 2/24/26 09:29, Zi Yan wrote:
> On 24 Feb 2026, at 12:17, Zi Yan wrote:
>
>> On 24 Feb 2026, at 12:14, Zi Yan wrote:
>>
>>> On 24 Feb 2026, at 12:07, David Hildenbrand wrote:
>>>
>>>>> David Hildenbrand (Arm) <david@kernel.org> wrote on 24.02.2026 12:00 CET:
>>>>> [...]
>>>> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>>>>
>>>> At least that’s what I remember.
>>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>>> “page dumped because: nonzero _refcount”, are disabled.
>>>
>>> It seems to me that someone else bump the page refcount between
>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>>
>> Merging Ron’s reply from another thread[1]:
>>
>> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
>> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>>
>> Looks like something is racy, since it is reproducible reliably.
>>
>> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
> the validity of the expression is checked and no code is generated.
> So that put_page_testzero() becomes a NOP.
>
> Hi Ron,
>
> Can you check if the patch below fix the issue without CONFIG_DEBUG_VM?
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..96be62eb3713 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
>   		return false;
>
>   	pfn = page_to_pfn(pages);
> -	for (i = 0; i < count; i++, pfn++)
> -		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> +	for (i = 0; i < count; i++, pfn++) {
> +		int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
> +
> +		VM_WARN_ON(!ret);
> +	}
>
>   	__cma_release_frozen(cma, cmr, pages, count);
>
>
>
> Best Regards,
> Yan, Zi

Yes, that patch fixes the issue.




* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-24 17:29         ` Zi Yan
  2026-02-24 20:55           ` Ron Economos
@ 2026-02-25  1:58           ` Kefeng Wang
  2026-02-25  2:15             ` Zi Yan
  1 sibling, 1 reply; 11+ messages in thread
From: Kefeng Wang @ 2026-02-25  1:58 UTC
  To: Zi Yan, Ron Economos, David Hildenbrand (Arm)
  Cc: linux-mm, linux-kernel, linux-riscv, jackmanb, jane.chu, hannes,
	willy, muchun.song, osalvador, sidhartha.kumar, vbabka,
	claudiu.beznea.uj, Mark Brown, akpm, pjw



On 2026/2/25 1:29, Zi Yan wrote:
> On 24 Feb 2026, at 12:17, Zi Yan wrote:

...

>>>>
>>>> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>>>>
>>>> At least that’s what I remember.
>>>
>>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>>> “page dumped because: nonzero _refcount”, are disabled.
>>>
>>> It seems to me that someone else bump the page refcount between
>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>>
>>
>> Merging Ron’s reply from another thread[1]:
>>
>> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
>> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>>
>> Looks like something is racy, since it is reproducible reliably.
>>
>> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
> 
> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
> the validity of the expression is checked and no code is generated.
> So that put_page_testzero() becomes a NOP.

Indeed...

> 
> Hi Ron,
> 
> Can you check if the patch below fix the issue without CONFIG_DEBUG_VM?
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..96be62eb3713 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
>   		return false;
> 
>   	pfn = page_to_pfn(pages);
> -	for (i = 0; i < count; i++, pfn++)
> -		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> +	for (i = 0; i < count; i++, pfn++) {
> +		int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
> +
> +		VM_WARN_ON(!ret);
> +	}

Maybe we could warn only once, by bringing back the original check?

diff --git a/mm/cma.c b/mm/cma.c
index 94b5da468a7d..a73a22d34232 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -1014,14 +1014,17 @@ bool cma_release(struct cma *cma, const struct page *pages,
  {
         struct cma_memrange *cmr;
         unsigned long i, pfn;
+       unsigned long ret = 0;

         cmr = find_cma_memrange(cma, pages, count);
         if (!cmr)
                 return false;

         pfn = page_to_pfn(pages);
-       for (i = 0; i < count; i++, pfn++)
-               VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
+       for (i = 0; i < count; i++, pfn++)
+               ret += !put_page_testzero(pfn_to_page(pfn));
+
+       WARN(ret != 0, "%lu pages are still in use!\n", ret);

         __cma_release_frozen(cma, cmr, pages, count);
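A userspace sketch of the warn-once idea, assuming the intent is to count the pages whose refcount did not drop to zero (hence the `!` in front of the put) and report them in a single message, matching the "%lu pages are still in use!" text in the proposal. `struct fake_page` and `release_and_count()` are hypothetical, and `fprintf()` stands in for WARN().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page. */
struct fake_page {
	int refcount;
};

static bool put_page_testzero(struct fake_page *p)
{
	return --p->refcount == 0;
}

/* Drops one reference on every page, then reports once how many pages are
 * still referenced, instead of warning per page. Returns that count. */
static unsigned long release_and_count(struct fake_page *pages,
				       unsigned long count)
{
	unsigned long i, in_use = 0;

	for (i = 0; i < count; i++)
		in_use += !put_page_testzero(&pages[i]);

	if (in_use)
		fprintf(stderr, "%lu pages are still in use!\n", in_use);
	return in_use;
}
```

Counting with an `unsigned long` keeps the accumulator consistent with the `%lu` conversion in the message.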



> 
>   	__cma_release_frozen(cma, cmr, pages, count);
> 
> 
> 
> Best Regards,
> Yan, Zi
> 




* Re: mm: Regression with v7.0-rc1 on RISC-V
  2026-02-25  1:58           ` Kefeng Wang
@ 2026-02-25  2:15             ` Zi Yan
  0 siblings, 0 replies; 11+ messages in thread
From: Zi Yan @ 2026-02-25  2:15 UTC
  To: Kefeng Wang
  Cc: Ron Economos, David Hildenbrand (Arm),
	linux-mm, linux-kernel, linux-riscv, jackmanb, jane.chu, hannes,
	willy, muchun.song, osalvador, sidhartha.kumar, vbabka,
	claudiu.beznea.uj, Mark Brown, akpm, pjw

On 24 Feb 2026, at 20:58, Kefeng Wang wrote:

> On 2026/2/25 1:29, Zi Yan wrote:
>> On 24 Feb 2026, at 12:17, Zi Yan wrote:
>
> ...
>
>>>>>
>>>>> Thinking again without my computer at hand … isn’t the call completely optimized out without CONFIG_DEBUG_VM?
>>>>>
>>>>> At least that’s what I remember.
>>>>
>>>> Right. Without CONFIG_DEBUG_VM=y, VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)))
>>>> and is_check_pages_enabled(), which leads to free_page_is_bad()’s
>>>> “page dumped because: nonzero _refcount”, are disabled.
>>>>
>>>> It seems to me that someone else bump the page refcount between
>>>> VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn))) and free_page_is_bad().
>>>>
>>>
>>> Merging Ron’s reply from another thread[1]:
>>>
>>> “Something strange is going on. I enabled CONFIG_DEBUG_VM by itself and
>>> the issue went away. Let me try CONFIG_DEBUG_PAGE_REF.”
>>>
>>> Looks like something is racy, since it is reproducible reliably.
>>>
>>> [1] https://lore.kernel.org/all/30dd1efc-9bd9-4664-999e-610d181600f9@w6rz.net/
>>
>> VM_WARN_ON() is BUILD_BUG_ON_INVALID() when CONFIG_DEBUG_VM is off. Only
>> the validity of the expression is checked and no code is generated.
>> So that put_page_testzero() becomes a NOP.
>
> Indeed...
>
>>
>> Hi Ron,
>>
>> Can you check if the patch below fix the issue without CONFIG_DEBUG_VM?
>>
>> diff --git a/mm/cma.c b/mm/cma.c
>> index 94b5da468a7d..96be62eb3713 100644
>> --- a/mm/cma.c
>> +++ b/mm/cma.c
>> @@ -1020,8 +1020,11 @@ bool cma_release(struct cma *cma, const struct page *pages,
>>   		return false;
>>
>>   	pfn = page_to_pfn(pages);
>> -	for (i = 0; i < count; i++, pfn++)
>> -		VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
>> +	for (i = 0; i < count; i++, pfn++) {
>> +		int __maybe_unused ret = put_page_testzero(pfn_to_page(pfn));
>> +
>> +		VM_WARN_ON(!ret);
>> +	}
>
> Maybe we could warn only once, by bringing back the original check?
>
> diff --git a/mm/cma.c b/mm/cma.c
> index 94b5da468a7d..a73a22d34232 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -1014,14 +1014,17 @@ bool cma_release(struct cma *cma, const struct page *pages,
>  {
>         struct cma_memrange *cmr;
>         unsigned long i, pfn;
> +       unsigned long ret = 0;
>
>         cmr = find_cma_memrange(cma, pages, count);
>         if (!cmr)
>                 return false;
>
>         pfn = page_to_pfn(pages);
> -       for (i = 0; i < count; i++, pfn++)
> -               VM_WARN_ON(!put_page_testzero(pfn_to_page(pfn)));
> +       for (i = 0; i < count; i++, pfn++)
> +               ret += !put_page_testzero(pfn_to_page(pfn));
> +
> +       WARN(ret != 0, "%lu pages are still in use!\n", ret);
>
>         __cma_release_frozen(cma, cmr, pages, count);

Sounds like a better solution. Let me use this as the v2 fix.

Thanks.

>
>
>
>>
>>   	__cma_release_frozen(cma, cmr, pages, count);
>>
>>
>>
>> Best Regards,
>> Yan, Zi
>>


Best Regards,
Yan, Zi



end of thread, other threads:[~2026-02-25  2:15 UTC | newest]

Thread overview: 11+ messages
2026-02-24  8:37 mm: Regression with v7.0-rc1 on RISC-V Ron Economos
2026-02-24 11:00 ` David Hildenbrand (Arm)
     [not found]   ` <1966378802.577797.1771952827516@app.mailbox.org>
2026-02-24 17:14     ` Zi Yan
2026-02-24 17:17       ` Zi Yan
2026-02-24 17:29         ` Zi Yan
2026-02-24 20:55           ` Ron Economos
2026-02-25  1:58           ` Kefeng Wang
2026-02-25  2:15             ` Zi Yan
2026-02-24 17:21     ` Mark Brown
2026-02-24 12:58 ` Kefeng Wang
2026-02-24 13:25   ` Ron Economos
