linux-mm.kvack.org archive mirror
* WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
@ 2026-02-04 12:49 是参差
  2026-02-04 17:12 ` David Hildenbrand (arm)
  0 siblings, 1 reply; 20+ messages in thread
From: 是参差 @ 2026-02-04 12:49 UTC (permalink / raw)
  To: linux-mm; +Cc: linmiaohe, akpm, linux-kernel, david

Hi,
I’m reporting a reproducible WARNING triggered in the hwpoison / memory_failure path when injecting a hardware-poison event via madvise(MADV_HWPOISON).

The warning is triggered by a syzkaller C reproducer that:
maps a file-backed region with MAP_FIXED, touches related VMAs, and then
calls madvise() with MADV_HWPOISON over a large range.
The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from memory_failure() and points to include/linux/huge_mm.h:635, suggesting an unexpected folio/page state encountered while handling a poisoned compound/huge folio.

The target page appears to be a compound head page (order:3) already marked hwpoison. memory_failure() seems to reach a branch that unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/huge_mm.h:635), which usually indicates an “unreachable”/unexpected folio type or state transition in the huge/compound folio handling logic during hwpoison processing.

This looks like a kernel-side invariant violation rather than a pure userspace misuse, since the warning is emitted from an unconditional VM_WARN_ON_ONCE_FOLIO(1) site.
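
For reference, the warn site is min_order_for_split(), which in my config
(no CONFIG_TRANSPARENT_HUGEPAGE) appears to reduce to the stub below
(paraphrased from my tree; the exact shape and return value may differ):

static inline int min_order_for_split(struct folio *folio)
{
        /* only reachable if a large folio exists without THP support */
        VM_WARN_ON_ONCE_FOLIO(1, folio);
        return -EINVAL;
}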

Reproducer:
C reproducer: https://pastebin.com/raw/UxennX2B
console output: https://pastebin.com/raw/wrhKRwZY
kernel config: https://pastebin.com/raw/dP93yBLn

Kernel:
HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
git tree: torvalds/linux
kernel version: 6.19.0-rc7 (QEMU Ubuntu 24.10)


head: 0000000000000003 ffffea00003f8a01 0000000800000007 00000000ffffffff
head: ffff88800fe29e00 0000000000000000 0000000000000000 0000000000000008
page dumped because: VM_WARN_ON_ONCE_FOLIO(1)
------------[ cut here ]------------
WARNING: include/linux/huge_mm.h:635 at min_order_for_split include/linux/huge_mm.h:635 [inline], CPU#0: syz.3.7564/25556
WARNING: include/linux/huge_mm.h:635 at min_order_for_split include/linux/huge_mm.h:633 [inline], CPU#0: syz.3.7564/25556
WARNING: include/linux/huge_mm.h:635 at memory_failure+0x22e8/0x2950 mm/memory-failure.c:2434, CPU#0: syz.3.7564/25556
CPU: 0 UID: 0 PID: 25556 Comm: syz.3.7564 Not tainted 6.19.0-rc7 #1 VOLUNTARY 
Hardware name: QEMU Ubuntu 24.10 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:min_order_for_split include/linux/huge_mm.h:635 [inline]
RIP: 0010:min_order_for_split include/linux/huge_mm.h:633 [inline]
RIP: 0010:memory_failure+0x22e8/0x2950 mm/memory-failure.c:2434
Code: ff 84 db 0f 85 f1 f5 ff ff e9 36 fe ff ff e8 3f 55 ce ff 48 c7 c6 e0 f7 ee 85 4c 89 f7 e8 90 b6 ed ff c6 05 13 35 04 06 01 90 <0f> 0b 90 e9 aa ee ff ff e8 eb 15 fe ff e9 65 e1 ff ff e8 a1 15 fe
RSP: 0018:ffff888000b7fa00 EFLAGS: 00010216
RAX: 00000000000066f1 RBX: 0000000000000000 RCX: ffffc90004fbb000
RDX: 0000000000080000 RSI: ffff888029a3c500 RDI: 0000000000000002
RBP: ffffea00003f8a00 R08: fffffbfff0ddc501 R09: ffffffff819105e0
R10: 0000000000000001 R11: ffff888000b7f7e7 R12: 000000000000fe28
R13: ffffea00003f8a00 R14: ffffea00003f8a00 R15: ffffea00003f8a08
FS:  00007f3d36d7f6c0(0000) GS:0000000000000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3d384a3828 CR3: 00000000589a9000 CR4: 0000000000350ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
 <TASK>
 madvise_inject_error mm/madvise.c:1489 [inline]
 madvise_do_behavior.part.0+0x137/0x3c0 mm/madvise.c:1927
 madvise_do_behavior+0x41d/0x5d0 mm/madvise.c:979
 do_madvise+0x134/0x1b0 mm/madvise.c:2030
 __do_sys_madvise mm/madvise.c:2039 [inline]
 __se_sys_madvise mm/madvise.c:2037 [inline]
 __x64_sys_madvise+0xa8/0x110 mm/madvise.c:2037
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xa9/0x320 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f3d3831ebe9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3d36d7f038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f3d38555fa0 RCX: 00007f3d3831ebe9
RDX: 0000000000000064 RSI: 0000000000600000 RDI: 0000200000000000
RBP: 00007f3d383a1e19 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f3d38556038 R14: 00007f3d38555fa0 R15: 00007ffe8f52e9d8
 </TASK>
---[ end trace 0000000000000000 ]---


* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 12:49 WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered 是参差
@ 2026-02-04 17:12 ` David Hildenbrand (arm)
  2026-02-04 17:15   ` David Hildenbrand (arm)
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 17:12 UTC (permalink / raw)
  To: 是参差, linux-mm
  Cc: linmiaohe, akpm, linux-kernel, Zi Yan, Matthew Wilcox

On 2/4/26 13:49, 是参差 wrote:
> Hi,
> I’m reporting a reproducible WARNING triggered in the hwpoison / memory_failure path when injecting a hardware-poison event via madvise(MADV_HWPOISON).
> 
> The warning is triggered by a syzkaller C reproducer that:
> maps a file-backed region with MAP_FIXED, touches related VMAs, and then
> calls madvise() with MADV_HWPOISON over a large range.
> The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from memory_failure() and points to include/linux/huge_mm.h:635, suggesting an unexpected folio/page state encountered while handling a poisoned compound/huge folio.
> 
> The target page appears to be a compound head page (order:3) already marked hwpoison. memory_failure() seems to reach a branch that unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/huge_mm.h:635), which usually indicates an “unreachable”/unexpected folio type or state transition in the huge/compound folio handling logic during hwpoison processing.
> 
> This looks like a kernel-side invariant violation rather than a pure userspace misuse, since the warning is emitted from an unconditional VM_WARN_ON_ONCE_FOLIO(1) site.
> 
> Reproducer:
> C reproducer: https://pastebin.com/raw/UxennX2B
> console output: https://pastebin.com/raw/wrhKRwZY
> kernel config: https://pastebin.com/raw/dP93yBLn
> 
> Kernel:
> 
> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
> 
>   git tree: torvalds/linux
> 
> kernel version: 6.19.0-rc7  (QEMU Ubuntu 24.10)

@Zi Yan, this is weird.

We run into the VM_WARN_ON_ONCE_FOLIO(1, folio); in min_order_for_split(),
which is only active with !CONFIG_TRANSPARENT_HUGEPAGE.

But how do we get a large folio in that case? folio_test_large(folio) succeeded.

I think we ruled out hugetlb earlier in that function.


Looking into the full console output, this is an order-3 folio (fully mapped).

How do we end up with a large folio here? I am only aware of that happening when something would
allocate an order-3 compound page (not a folio) and map it into the page tables. Yes, that
is nasty and can still happen, not sure yet though whether that is really what the reproducer
triggers.
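
Purely as an illustration of that pattern (made-up demo code, not from any
specific driver; error handling and refcounting elided):

static int demo_mmap(struct file *file, struct vm_area_struct *vma)
{
        /* order-3 compound allocation: prep_compound_page() initializes
         * the folio fields, so folio_test_large() will succeed later */
        struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP, 3);
        unsigned long i;

        if (!page)
                return -ENOMEM;
        for (i = 0; i < 8; i++) {
                /* map each subpage of the compound page into userspace */
                int err = vm_insert_page(vma, vma->vm_start + i * PAGE_SIZE,
                                         page + i);
                if (err)
                        return err;
        }
        return 0;
}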


[  451.810860] Injecting memory failure for pfn 0xfe28 at process virtual address 0x200000000000
[  451.812878] page: refcount:10 mapcount:1 mapping:0000000000000000 index:0xffff88800fe2e600 pfn:0xfe28
[  451.814740] head: order:3 mapcount:8 entire_mapcount:0 nr_pages_mapped:8 pincount:0
[  451.816263] flags: 0x200044(referenced|head|hwpoison|zone=0)
[  451.817414] raw: 0000000000200044 0000000000000000 dead000000000122 0000000000000000
[  451.818924] raw: ffff88800fe2e600 0000000000000000 0000000a00000000 0000000000000000
[  451.820422] head: 0000000000200044 0000000000000000 dead000000000122 0000000000000000
[  451.821835] head: ffff88800fe2e600 0000000000000000 0000000a00000000 0000000000000000
[  451.823276] head: 0000000000000003 ffffea00003f8a01 0000000800000007 00000000ffffffff
[  451.824701] head: ffff88800fe29e00 0000000000000000 0000000000000000 0000000000000008
[  451.826113] page dumped because: VM_WARN_ON_ONCE_FOLIO(1)

> 
> 
> head: 0000000000000003 ffffea00003f8a01 0000000800000007 00000000ffffffff
> head: ffff88800fe29e00 0000000000000000 0000000000000000 0000000000000008
> page dumped because: VM_WARN_ON_ONCE_FOLIO(1)
> ------------[ cut here ]------------
> WARNING: include/linux/huge_mm.h:635 at min_order_for_split include/linux/huge_mm.h:635 [inline], CPU#0: syz.3.7564/25556
> WARNING: include/linux/huge_mm.h:635 at min_order_for_split include/linux/huge_mm.h:633 [inline], CPU#0: syz.3.7564/25556
> WARNING: include/linux/huge_mm.h:635 at memory_failure+0x22e8/0x2950 mm/memory-failure.c:2434, CPU#0: syz.3.7564/25556
> CPU: 0 UID: 0 PID: 25556 Comm: syz.3.7564 Not tainted 6.19.0-rc7 #1 VOLUNTARY
> Hardware name: QEMU Ubuntu 24.10 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> RIP: 0010:min_order_for_split include/linux/huge_mm.h:635 [inline]
> RIP: 0010:min_order_for_split include/linux/huge_mm.h:633 [inline]
> RIP: 0010:memory_failure+0x22e8/0x2950 mm/memory-failure.c:2434
> Code: ff 84 db 0f 85 f1 f5 ff ff e9 36 fe ff ff e8 3f 55 ce ff 48 c7 c6 e0 f7 ee 85 4c 89 f7 e8 90 b6 ed ff c6 05 13 35 04 06 01 90 <0f> 0b 90 e9 aa ee ff ff e8 eb 15 fe ff e9 65 e1 ff ff e8 a1 15 fe
> RSP: 0018:ffff888000b7fa00 EFLAGS: 00010216
> RAX: 00000000000066f1 RBX: 0000000000000000 RCX: ffffc90004fbb000
> RDX: 0000000000080000 RSI: ffff888029a3c500 RDI: 0000000000000002
> RBP: ffffea00003f8a00 R08: fffffbfff0ddc501 R09: ffffffff819105e0
> R10: 0000000000000001 R11: ffff888000b7f7e7 R12: 000000000000fe28
> R13: ffffea00003f8a00 R14: ffffea00003f8a00 R15: ffffea00003f8a08
> FS:  00007f3d36d7f6c0(0000) GS:0000000000000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f3d384a3828 CR3: 00000000589a9000 CR4: 0000000000350ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
> Call Trace:
>   <TASK>
>   madvise_inject_error mm/madvise.c:1489 [inline]
>   madvise_do_behavior.part.0+0x137/0x3c0 mm/madvise.c:1927
>   madvise_do_behavior+0x41d/0x5d0 mm/madvise.c:979
>   do_madvise+0x134/0x1b0 mm/madvise.c:2030
>   __do_sys_madvise mm/madvise.c:2039 [inline]
>   __se_sys_madvise mm/madvise.c:2037 [inline]
>   __x64_sys_madvise+0xa8/0x110 mm/madvise.c:2037
>   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>   do_syscall_64+0xa9/0x320 arch/x86/entry/syscall_64.c:94
>   entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f3d3831ebe9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f3d36d7f038 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
> RAX: ffffffffffffffda RBX: 00007f3d38555fa0 RCX: 00007f3d3831ebe9
> RDX: 0000000000000064 RSI: 0000000000600000 RDI: 0000200000000000
> RBP: 00007f3d383a1e19 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f3d38556038 R14: 00007f3d38555fa0 R15: 00007ffe8f52e9d8
>   </TASK>
> ---[ end trace 0000000000000000 ]---


-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 17:12 ` David Hildenbrand (arm)
@ 2026-02-04 17:15   ` David Hildenbrand (arm)
  2026-02-04 17:23     ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 17:15 UTC (permalink / raw)
  To: 是参差, linux-mm
  Cc: linmiaohe, akpm, linux-kernel, Zi Yan, Matthew Wilcox

On 2/4/26 18:12, David Hildenbrand (arm) wrote:
> On 2/4/26 13:49, 是参差 wrote:
>> Hi,
>> I’m reporting a reproducible WARNING triggered in the hwpoison / 
>> memory_failure path when injecting a hardware-poison event via 
>> madvise(MADV_HWPOISON).
>>
>> The warning is triggered by a syzkaller C reproducer that:
>> maps a file-backed region with MAP_FIXED, touches related VMAs, and then
>> calls madvise() with MADV_HWPOISON over a large range.
>> The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from 
>> memory_failure() and points to include/linux/huge_mm.h:635, suggesting 
>> an unexpected folio/page state encountered while handling a poisoned 
>> compound/huge folio.
>>
>> The target page appears to be a compound head page (order:3) already 
>> marked hwpoison. memory_failure() seems to reach a branch that 
>> unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/ 
>> huge_mm.h:635), which usually indicates an “unreachable”/unexpected 
>> folio type or state transition in the huge/compound folio handling 
>> logic during hwpoison processing.
>>
>> This looks like a kernel-side invariant violation rather than a pure 
>> userspace misuse, since the warning is emitted from an unconditional 
>> VM_WARN_ON_ONCE_FOLIO(1) site.
>>
>> Reproducer:
>> C reproducer: https://pastebin.com/raw/UxennX2B
>> console output: https://pastebin.com/raw/wrhKRwZY
>> kernel config: https://pastebin.com/raw/dP93yBLn
>>
>> Kernel:
>>
>> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>>
>>   git tree: torvalds/linux
>>
>> kernel version: 6.19.0-rc7  (QEMU Ubuntu 24.10)
> 
> @Zi Yan, this is weird.
> 
> We run into the VM_WARN_ON_ONCE_FOLIO(1, folio); in min_order_for_split(),
> which is only active with !CONFIG_TRANSPARENT_HUGEPAGE.
> 
> But how do we get a large folio in that case? folio_test_large(folio) 
> succeeded.
> 
> I think we ruled out hugetlb earlier in that function.
> 
> 
> Looking into the full console output, this is an order-3 folio (fully 
> mapped).
> 
> How do we end up with a large folio here? I am only aware of that 
> happening when something would
> allocate an order-3 compound page (not a folio) and map it into the page 
> tables. Yes, that
> is nasty and can still happen, not sure yet though whether that is 
> really what the reproducer
> triggers.

Looking again,

mapping:0000000000000000 index:0xffff88800fe2e600

At least mapping==0 could indicate a non-folio thing.

-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 17:15   ` David Hildenbrand (arm)
@ 2026-02-04 17:23     ` Zi Yan
  2026-02-04 17:34       ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 17:23 UTC (permalink / raw)
  To: David Hildenbrand (arm)
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 12:15, David Hildenbrand (arm) wrote:

> On 2/4/26 18:12, David Hildenbrand (arm) wrote:
>> On 2/4/26 13:49, 是参差 wrote:
>>> Hi,
>>> I’m reporting a reproducible WARNING triggered in the hwpoison / memory_failure path when injecting a hardware-poison event via madvise(MADV_HWPOISON).
>>>
>>> The warning is triggered by a syzkaller C reproducer that:
>>> maps a file-backed region with MAP_FIXED, touches related VMAs, and then
>>> calls madvise() with MADV_HWPOISON over a large range.
>>> The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from memory_failure() and points to include/linux/huge_mm.h:635, suggesting an unexpected folio/page state encountered while handling a poisoned compound/huge folio.
>>>
>>> The target page appears to be a compound head page (order:3) already marked hwpoison. memory_failure() seems to reach a branch that unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/ huge_mm.h:635), which usually indicates an “unreachable”/unexpected folio type or state transition in the huge/compound folio handling logic during hwpoison processing.
>>>
>>> This looks like a kernel-side invariant violation rather than a pure userspace misuse, since the warning is emitted from an unconditional VM_WARN_ON_ONCE_FOLIO(1) site.
>>>
>>> Reproducer:
>>> C reproducer: https://pastebin.com/raw/UxennX2B
>>> console output: https://pastebin.com/raw/wrhKRwZY
>>> kernel config: https://pastebin.com/raw/dP93yBLn
>>>
>>> Kernel:
>>>
>>> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>>>
>>>   git tree: torvalds/linux
>>>
>>> kernel version: 6.19.0-rc7  (QEMU Ubuntu 24.10)
>>
>> @Zi Yan, this is weird.
>>
>> We run into the VM_WARN_ON_ONCE_FOLIO(1, folio); in min_order_for_split(),
>> which is only active with !CONFIG_TRANSPARENT_HUGEPAGE.
>>
>> But how do we get a large folio in that case? folio_test_large(folio) succeeded.
>>
>> I think we ruled out hugetlb earlier in that function.
>>
>>
>> Looking into the full console output, this is an order-3 folio (fully mapped).
>>
>> How do we end up with a large folio here? I am only aware of that happening when something would
>> allocate an order-3 compound page (not a folio) and map it into the page tables. Yes, that
>> is nasty and can still happen, not sure yet though whether that is really what the reproducer
>> triggers.
>
> Looking again,
>
> mapping:0000000000000000 index:0xffff88800fe2e600
>
> At least mapping==0 could indicate a non-folio thing.

From the C repro above, syzbot opened a dev "/dev/sg#" and did mmap on it.
Is it a device driver issue?


--
Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 17:23     ` Zi Yan
@ 2026-02-04 17:34       ` Zi Yan
  2026-02-04 17:41         ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 17:34 UTC (permalink / raw)
  To: David Hildenbrand (arm)
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 12:23, Zi Yan wrote:

> On 4 Feb 2026, at 12:15, David Hildenbrand (arm) wrote:
>
>> On 2/4/26 18:12, David Hildenbrand (arm) wrote:
>>> On 2/4/26 13:49, 是参差 wrote:
>>>> Hi,
>>>> I’m reporting a reproducible WARNING triggered in the hwpoison / memory_failure path when injecting a hardware-poison event via madvise(MADV_HWPOISON).
>>>>
>>>> The warning is triggered by a syzkaller C reproducer that:
>>>> maps a file-backed region with MAP_FIXED, touches related VMAs, and then
>>>> calls madvise() with MADV_HWPOISON over a large range.
>>>> The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from memory_failure() and points to include/linux/huge_mm.h:635, suggesting an unexpected folio/page state encountered while handling a poisoned compound/huge folio.
>>>>
>>>> The target page appears to be a compound head page (order:3) already marked hwpoison. memory_failure() seems to reach a branch that unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/ huge_mm.h:635), which usually indicates an “unreachable”/unexpected folio type or state transition in the huge/compound folio handling logic during hwpoison processing.
>>>>
>>>> This looks like a kernel-side invariant violation rather than a pure userspace misuse, since the warning is emitted from an unconditional VM_WARN_ON_ONCE_FOLIO(1) site.
>>>>
>>>> Reproducer:
>>>> C reproducer: https://pastebin.com/raw/UxennX2B
>>>> console output: https://pastebin.com/raw/wrhKRwZY
>>>> kernel config: https://pastebin.com/raw/dP93yBLn
>>>>
>>>> Kernel:
>>>>
>>>> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>>>>
>>>>   git tree: torvalds/linux
>>>>
>>>> kernel version: 6.19.0-rc7  (QEMU Ubuntu 24.10)
>>>
>>> @Zi Yan, this is weird.
>>>
>>> We run into the VM_WARN_ON_ONCE_FOLIO(1, folio); in min_order_for_split(),
>>> which is only active with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>
>>> But how do we get a large folio in that case? folio_test_large(folio) succeeded.
>>>
>>> I think we ruled out hugetlb earlier in that function.
>>>
>>>
>>> Looking into the full console output, this is an order-3 folio (fully mapped).
>>>
>>> How do we end up with a large folio here? I am only aware of that happening when something would
>>> allocate an order-3 compound page (not a folio) and map it into the page tables. Yes, that
>>> is nasty and can still happen, not sure yet though whether that is really what the reproducer
>>> triggers.
>>
>> Looking again,
>>
>> mapping:0000000000000000 index:0xffff88800fe2e600
>>
>> At least mapping==0 could indicate a non-folio thing.
>
> From the C repro above, syzbot opened a dev "/dev/sg#" and did mmap on it.
> Is it a device driver issue?

OK, it is CONFIG_CHR_DEV_SG. And the driver allocates a compound page at [1].
Since we initialize folio fields in prep_compound_page(), it becomes a folio
when it is inserted into a VMA.

It seems that my compound page and folio code separation patchset comes right
on time [2]. Basically, a compound page should not be a folio.
With !CONFIG_TRANSPARENT_HUGEPAGE, using __GFP_COMP to allocate a compound
page that is then used as a folio should be rejected.

[1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/scsi/sg.c#L1868
[2] https://lore.kernel.org/all/20260130034818.472804-1-ziy@nvidia.com/
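
For context, prep_compound_head() initializes roughly the following
large-folio fields (paraphrased, details vary by tree):

        folio_set_order(folio, order);
        atomic_set(&folio->_entire_mapcount, -1);
        atomic_set(&folio->_nr_pages_mapped, 0);
        atomic_set(&folio->_pincount, 0);

That is why the page dump in the report shows entire_mapcount,
nr_pages_mapped and pincount values for something that was never meant to
be a folio.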

--
Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 17:34       ` Zi Yan
@ 2026-02-04 17:41         ` Zi Yan
  2026-02-04 19:18           ` David Hildenbrand (arm)
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 17:41 UTC (permalink / raw)
  To: David Hildenbrand (arm)
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 12:34, Zi Yan wrote:

> On 4 Feb 2026, at 12:23, Zi Yan wrote:
>
>> On 4 Feb 2026, at 12:15, David Hildenbrand (arm) wrote:
>>
>>> On 2/4/26 18:12, David Hildenbrand (arm) wrote:
>>>> On 2/4/26 13:49, 是参差 wrote:
>>>>> Hi,
>>>>> I’m reporting a reproducible WARNING triggered in the hwpoison / memory_failure path when injecting a hardware-poison event via madvise(MADV_HWPOISON).
>>>>>
>>>>> The warning is triggered by a syzkaller C reproducer that:
>>>>> maps a file-backed region with MAP_FIXED, touches related VMAs, and then
>>>>> calls madvise() with MADV_HWPOISON over a large range.
>>>>> The kernel reports a VM_WARN_ON_ONCE_FOLIO(1) from memory_failure() and points to include/linux/huge_mm.h:635, suggesting an unexpected folio/page state encountered while handling a poisoned compound/huge folio.
>>>>>
>>>>> The target page appears to be a compound head page (order:3) already marked hwpoison. memory_failure() seems to reach a branch that unconditionally warns (VM_WARN_ON_ONCE_FOLIO(1) at include/linux/ huge_mm.h:635), which usually indicates an “unreachable”/unexpected folio type or state transition in the huge/compound folio handling logic during hwpoison processing.
>>>>>
>>>>> This looks like a kernel-side invariant violation rather than a pure userspace misuse, since the warning is emitted from an unconditional VM_WARN_ON_ONCE_FOLIO(1) site.
>>>>>
>>>>> Reproducer:
>>>>> C reproducer: https://pastebin.com/raw/UxennX2B
>>>>> console output: https://pastebin.com/raw/wrhKRwZY
>>>>> kernel config: https://pastebin.com/raw/dP93yBLn
>>>>>
>>>>> Kernel:
>>>>>
>>>>> HEAD commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>>>>>
>>>>>   git tree: torvalds/linux
>>>>>
>>>>> kernel version: 6.19.0-rc7  (QEMU Ubuntu 24.10)
>>>>
>>>> @Zi Yan, this is weird.
>>>>
>>>> We run into the VM_WARN_ON_ONCE_FOLIO(1, folio); in min_order_for_split(),
>>>> which is only active with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>>
>>>> But how do we get a large folio in that case? folio_test_large(folio) succeeded.
>>>>
>>>> I think we ruled out hugetlb earlier in that function.
>>>>
>>>>
>>>> Looking into the full console output, this is an order-3 folio (fully mapped).
>>>>
>>>> How do we end up with a large folio here? I am only aware of that happening when something would
>>>> allocate an order-3 compound page (not a folio) and map it into the page tables. Yes, that
>>>> is nasty and can still happen, not sure yet though whether that is really what the reproducer
>>>> triggers.
>>>
>>> Looking again,
>>>
>>> mapping:0000000000000000 index:0xffff88800fe2e600
>>>
>>> At least mapping==0 could indicate a non-folio thing.
>>
>> From the C repro above, syzbot opened a dev "/dev/sg#" and did mmap on it.
>> Is it a device driver issue?
>
> OK, it is CONFIG_CHR_DEV_SG. And the driver allocates a compound page at[1].
> Since we initialize folio fields in prep_compound_page(), it becomes a folio
> when it is inserted into a VMA.

More details:
later at sg_vma_fault(), the driver just handles a page fault by supplying
a subpage from a pre-allocated compound page [3]. We then get a large folio
even with !CONFIG_TRANSPARENT_HUGEPAGE.

[3] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/scsi/sg.c#L1241
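
I.e., roughly this shape (a simplified, made-up sketch of such a ->fault
handler; names are invented, this is not the literal sg.c code):

struct demo_state {
        struct page *compound_page; /* pre-allocated with __GFP_COMP */
        unsigned long nr_pages;
};

static vm_fault_t demo_vma_fault(struct vm_fault *vmf)
{
        struct demo_state *st = vmf->vma->vm_private_data;
        struct page *page;

        if (vmf->pgoff >= st->nr_pages)
                return VM_FAULT_SIGBUS;

        /* hand out a subpage of the pre-allocated compound page; the
         * fault path maps it, and page_folio() on it then looks like
         * a large folio */
        page = st->compound_page + vmf->pgoff;
        get_page(page);
        vmf->page = page;
        return 0;
}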

>
> It seems that my compound page and folio code separation patchset comes right
> on time[2]. Basically, compound page should not be a folio.
> With !CONFIG_TRANSPARENT_HUGEPAGE, __GFP_COMP for allocating a compound page
> that is used as a folio should be rejected.
>
> [1] https://elixir.bootlin.com/linux/v6.18.6/source/drivers/scsi/sg.c#L1868
> [2] https://lore.kernel.org/all/20260130034818.472804-1-ziy@nvidia.com/
>
> --
> Best Regards,
> Yan, Zi


--
Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 17:41         ` Zi Yan
@ 2026-02-04 19:18           ` David Hildenbrand (arm)
  2026-02-04 19:48             ` Zi Yan
  2026-02-04 21:08             ` Zi Yan
  0 siblings, 2 replies; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 19:18 UTC (permalink / raw)
  To: Zi Yan
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 2/4/26 18:41, Zi Yan wrote:
> On 4 Feb 2026, at 12:34, Zi Yan wrote:
> 
>> On 4 Feb 2026, at 12:23, Zi Yan wrote:
>>
>>>
>>>
>>>  From the C repro above, syzbot opened a dev "/dev/sg#" and did mmap on it.
>>> Is it a device driver issue?
>>
>> OK, it is CONFIG_CHR_DEV_SG. And the driver allocates a compound page at[1].
>> Since we initialize folio fields in prep_compound_page(), it becomes a folio
>> when it is inserted into a VMA.
> 
> More details:
> later at sg_vma_fault(), the driver just handles a page fault by supplying
> a subpage from a pre-allocated compound page[3]. We then get a large folio
> even with !CONFIG_TRANSPARENT_HUGEPAGE.

We can identify such non-folio (but compound) things by looking at 
PG_large_rmappable IIRC.

-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 19:18           ` David Hildenbrand (arm)
@ 2026-02-04 19:48             ` Zi Yan
  2026-02-04 19:55               ` David Hildenbrand (arm)
  2026-02-04 21:08             ` Zi Yan
  1 sibling, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 19:48 UTC (permalink / raw)
  To: David Hildenbrand (arm)
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:

> On 2/4/26 18:41, Zi Yan wrote:
>> On 4 Feb 2026, at 12:34, Zi Yan wrote:
>>
>>> On 4 Feb 2026, at 12:23, Zi Yan wrote:
>>>
>>>>
>>>>
>>>>  From the C repro above, syzbot opened a dev "/dev/sg#" and did mmap on it.
>>>> Is it a device driver issue?
>>>
>>> OK, it is CONFIG_CHR_DEV_SG. And the driver allocates a compound page at[1].
>>> Since we initialize folio fields in prep_compound_page(), it becomes a folio
>>> when it is inserted into a VMA.
>>
>> More details:
>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>
> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.

What do you mean? Changing memory failure code to only handle large_rmappable?
large_rmappable is a folio flag, memory failure code should see such
non-folio but compound things to begin with, IMHO.

I think we need to be able to tell between raw page (compound or not),
mappable page (compound or not, especially for those used with vm_insert_*),
and folio.

Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 19:48             ` Zi Yan
@ 2026-02-04 19:55               ` David Hildenbrand (arm)
  2026-02-04 20:13                 ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 19:55 UTC (permalink / raw)
  To: Zi Yan
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 2/4/26 20:48, Zi Yan wrote:
> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
> 
>> On 2/4/26 18:41, Zi Yan wrote:
>>>
>>>
>>> More details:
>>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>
>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
> 
> What do you mean? Changing memory failure code to only handle large_rmappable?
> large_rmappable is a folio flag, memory failure code should see such

Did you mean "should not" ? :)

> non-folio but compound things to begin with, IMHO.

I would say that we could right now reject in memory failure code any 
compound pages that do not have PG_large_rmappable set.

I have the faint recollection that we don't set PG_large_rmappable on 
hugetlb folios yet, so they have to be identified as well.

> 
> I think we need to be able to tell between raw page (compound or not),
> mappable page (compound or not, especially for those used with vm_insert_*),
> and folio.

We can't identify (small) folios just yet. We'd need another page flag 
for that (just like PG_large_rmappable), and we all know how that ends ;)

With Willy's work we'll be able to identify folios reliably.

How to deal with that vm_insert_* crap, especially for non-folio pages, 
is also future work based on that.

-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 19:55               ` David Hildenbrand (arm)
@ 2026-02-04 20:13                 ` Zi Yan
  2026-02-04 20:31                   ` David Hildenbrand (arm)
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 20:13 UTC (permalink / raw)
  To: David Hildenbrand (arm)
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 14:55, David Hildenbrand (arm) wrote:

> On 2/4/26 20:48, Zi Yan wrote:
>> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
>>
>>> On 2/4/26 18:41, Zi Yan wrote:
>>>>
>>>>
>>>> More details:
>>>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>>>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>>>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>
>>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
>>
>> What do you mean? Changing memory failure code to only handle large_rmappable?
>> large_rmappable is a folio flag, memory failure code should see such
>
> Did you mean "should not" ? :)

Yes.

>
>> non-folio but compound things to begin with, IMHO.
>
> I would say that we could right now reject in memory failure code any compound pages that do not have PG_large_rmappable set.
>
> I have the faint recollection that we don't set PG_large_rmappable on hugetlb folios yet, so they have to be identified as well.

Right. My patchset[1] is trying to add it, since hugetlb is used as a folio
in most places and large_rmappable is a folio flag.


[1] https://lore.kernel.org/all/20260130034818.472804-1-ziy@nvidia.com/
>>
>> I think we need to be able to tell between raw page (compound or not),
>> mappable page (compound or not, especially for those used with vm_insert_*),
>> and folio.
>
> We can't identify (small) folios just yet. We'd need another page flag for that (just like PG_large_rmappable), and we all know how that ends ;)

Yes, I am thinking about removing mapcount in struct page to achieve that.
And only pages used for vm_insert_*() and folios need mapcount. Code that
uses vm_insert_*() on pages would probably have a struct mappable_page
with mapcount.

>
> With Willy's work we'll be able to identify folios reliably.
>
> How to deal with that vm_insert_* crap, especially for non-folio pages, is also future work based on that.

I think it might be the other way around. A memdesc does not have a mapcount;
if we do not have a separate struct for these mappable pages now,
what do we use at memdesc time? folio?


Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 20:13                 ` Zi Yan
@ 2026-02-04 20:31                   ` David Hildenbrand (arm)
  2026-02-04 20:45                     ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 20:31 UTC (permalink / raw)
  To: Zi Yan
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 2/4/26 21:13, Zi Yan wrote:
> On 4 Feb 2026, at 14:55, David Hildenbrand (arm) wrote:
> 
>> On 2/4/26 20:48, Zi Yan wrote:
>>>
>>>
>>> What do you mean? Changing memory failure code to only handle large_rmappable?
>>> large_rmappable is a folio flag, memory failure code should see such
>>
>> Did you mean "should not" ? :)
> 
> Yes.
> 
>>
>>> non-folio but compound things to begin with, IMHO.
>>
>> I would say that we could right now reject in memory failure code any compound pages that do not have PG_large_rmappable set.
>>
>> I have the faint recollection that we don't set PG_large_rmappable on hugetlb folios yet, so they have to be identified as well.
> 
> Right. My patchset[1] is trying to add it, since hugetlb is used as a folio
> in most places and large_rmappable is a folio flag.
> 
> 
> [1] https://lore.kernel.org/all/20260130034818.472804-1-ziy@nvidia.com/

Still on my todo list :)

>>>
>>> I think we need to be able to tell between raw page (compound or not),
>>> mappable page (compound or not, especially for those used with vm_insert_*),
>>> and folio.
>>
>> We can't identify (small) folios just yet. We'd need another page flag for that (just like PG_large_rmappable), and we all know how that ends ;)
> 
> Yes, I am thinking about removing mapcount in struct page to achieve that.

On my todo list :) Stupid CONFIG_PAGE_MAPCOUNT that is still around and 
stupid partial-mapping handling.

I worked on this after LPC but was distracted by PTO :D

> And only pages used for vm_insert_*() and folios need mapcount. 

vm_insert_*() won't need it for non-folio things. Only folios. We just 
have to teach the zap code to also leave the mapcount of these non-folio 
things alone. IOW, identify them when we map/unmap as "not folio" and 
not touch the mapcount.
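
Conceptually something like this in the zap path (pseudo-C; page_is_folio()
is a made-up placeholder for whatever check we end up with):

        if (page_is_folio(page))
                folio_remove_rmap_pte(page_folio(page), page, vma);
        /* else: non-folio thing mapped via vm_insert_*(), no mapcount
         * to maintain -- only the reference gets dropped */
        put_page(page);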

> Code that
> uses vm_insert_*() on pages would probably have a struct mappable_page
> with mapcount.

I don't think we'll need a mapcount for them. Only for folios.

> 
>>
>> With Willy's work we'll be able to identify folios reliably.
>>
>> How to deal with that vm_insert_* crap, especially for non-folio pages, is also future work based on that.
> 
> I think it might be the other way around. A memdesc does not have a mapcount;
> if we do not have a separate struct for these mappable pages now,
> what do we use at memdesc time? folio?

Folios will have mapcount related information, yes. Long term, memdescs 
will for sure not have any.

Real fun begins once we start messing with refcounts. vm_insert_*() will 
be "fun" on non-folio things.

-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 20:31                   ` David Hildenbrand (arm)
@ 2026-02-04 20:45                     ` Zi Yan
  2026-02-04 21:14                       ` David Hildenbrand (arm)
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 20:45 UTC (permalink / raw)
  To: David Hildenbrand (arm)
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 15:31, David Hildenbrand (arm) wrote:

> On 2/4/26 21:13, Zi Yan wrote:
>> On 4 Feb 2026, at 14:55, David Hildenbrand (arm) wrote:
>>
>>> On 2/4/26 20:48, Zi Yan wrote:
>>>>
>>>>
>>>> What do you mean? Changing memory failure code to only handle large_rmappable?
>>>> large_rmappable is a folio flag, memory failure code should see such
>>>
>>> Did you mean "should not" ? :)
>>
>> Yes.
>>
>>>
>>>> non-folio but compound things to begin with, IMHO.
>>>
>>> I would say that we could right now reject in memory failure code any compound pages that do not have PG_large_rmappable set.
>>>
>>> I have the faint recollection that we don't set PG_large_rmappable on hugetlb folios yet, so they have to be identified as well.
>>
>> Right. My patchset[1] is trying to add it, since hugetlb is used as a folio
>> in most places and large_rmappable is a folio flag.
>>
>>
>> [1] https://lore.kernel.org/all/20260130034818.472804-1-ziy@nvidia.com/
>
> Still on my todo list :)

Sure. Waiting for your input there. :)

>
>>>>
>>>> I think we need to be able to tell between raw page (compound or not),
>>>> mappable page (compound or not, especially for those used with vm_insert_*),
>>>> and folio.
>>>
>>> We can't identify (small) folios just yet. We'd need another page flag for that (just like PG_large_rmappable), and we all know how that ends ;)
>>
>> Yes, I am thinking about removing mapcount in struct page to achieve that.
>
> On my todo list :) Stupid CONFIG_PAGE_MAPCOUNT that is still around and stupid partial-mapping handling.
>
> I worked on this after LPC but was distracted by PTO :D
>
>> And only pages used for vm_insert_*() and folios need mapcount.
>
> vm_insert_*() won't need it for non-folio things. Only folios. We just have to teach the zap code to also leave the mapcount of these non-folio things alone. IOW, identify them when we map/unmap as "not folio" and not touch the mapcount.

Oh, that sounds great. I thought I would need to convert all driver code
that does vm_insert_*() to use folio. Basically, I hit
__folio_large_mapcount_sanity_checks() on _mm_id_mapcount when I moved
_mm_id_mapcount and friends from prep_compound_page() to page_rmappable_folio().
IIUC, __folio_add_file_rmap() can just return if a non-folio compound page
is encountered. Of course, the remove part needs to do the same.


>
>> Code that
>> uses vm_insert_*() on pages would probably have a struct mappable_page
>> with mapcount.
>
> I don't think we'll need a mapcount for them. Only for folios.
>
>>
>>>
>>> With Willy's work we'll be able to identify folios reliably.
>>>
>>> How to deal with that vm_insert_* crap, especially for non-folio pages, is also future work based on that.
>>
>> I think it might be the other way around. A memdesc does not have a mapcount;
>> if we do not have a separate struct for these mappable pages now,
>> what do we use at memdesc time? folio?
>
> Folios will have mapcount related information, yes. Long term, memdescs will for sure not have any.
>
> Real fun begins once we start messing with refcounts. vm_insert_*() will be "fun" on non-folio things.

Yeah, maybe we will need refcounts for every used memdesc. But who knows.

Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 19:18           ` David Hildenbrand (arm)
  2026-02-04 19:48             ` Zi Yan
@ 2026-02-04 21:08             ` Zi Yan
  2026-02-04 21:37               ` David Hildenbrand (arm)
  1 sibling, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 21:08 UTC (permalink / raw)
  To: David Hildenbrand (arm), 是参差
  Cc: linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:

> On 2/4/26 18:41, Zi Yan wrote:
>> On 4 Feb 2026, at 12:34, Zi Yan wrote:
>>
>>> On 4 Feb 2026, at 12:23, Zi Yan wrote:
>>>
>>>>
>>>>
>>>>  From the C repro above, syzbot opened a dev "/dev/sg#" and did mmap on it.
>>>> Is it a device driver issue?
>>>
>>> OK, it is CONFIG_CHR_DEV_SG. And the driver allocates a compound page at[1].
>>> Since we initialize folio fields in prep_compound_page(), it becomes a folio
>>> when it is inserted into a VMA.
>>
>> More details:
>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>
> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.

OK, back to the issue. The patch below should fix the issue?

Hi 是参差,

Can you test it?

From fa4a900e027a0c7365a9f786840943990d5da971 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Wed, 4 Feb 2026 16:04:19 -0500
Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/memory-failure.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 825c706ac576..4ed903de9a0e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2440,9 +2440,12 @@ int memory_failure(unsigned long pfn, int flags)

 	folio = page_folio(p);

-	/* filter pages that are protected from hwpoison test by users */
+	/*
+	 * filter pages that are protected from hwpoison test by users
+	 * or unsupported non folio compound pages
+	 */
 	folio_lock(folio);
-	if (hwpoison_filter(p)) {
+	if (hwpoison_filter(p) || !folio_test_large_rmappable(folio)) {
 		ClearPageHWPoison(p);
 		folio_unlock(folio);
 		folio_put(folio);
-- 
2.51.0



Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 20:45                     ` Zi Yan
@ 2026-02-04 21:14                       ` David Hildenbrand (arm)
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 21:14 UTC (permalink / raw)
  To: Zi Yan
  Cc: 是参差,
	linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 2/4/26 21:45, Zi Yan wrote:
> On 4 Feb 2026, at 15:31, David Hildenbrand (arm) wrote:
> 
>> On 2/4/26 21:13, Zi Yan wrote:
>>>
>>>
>>> Yes.
>>>
>>>
>>> Right. My patchset[1] is trying to add it, since hugetlb is used as a folio
>>> in most places and large_rmappable is a folio flag.
>>>
>>>
>>> [1] https://lore.kernel.org/all/20260130034818.472804-1-ziy@nvidia.com/
>>
>> Still on my todo list :)
> 
> Sure. Waiting for your input there. :)

Hoping I'll be able to dig through that pile in my inbox fairly quickly
throughout the next days.

Thanks for all the review you did while I was not paying a lot of attention.

> 
>>
>>>
>>> Yes, I am thinking about removing mapcount in struct page to achieve that.
>>
>> On my todo list :) Stupid CONFIG_PAGE_MAPCOUNT that is still around and stupid partial-mapping handling.
>>
>> I worked on this after LPC but was distracted by PTO :D
>>
>>> And only pages used for vm_insert_*() and folios need mapcount.
>>
>> vm_insert_*() won't need it for non-folio things. Only folios. We just have to teach the zap code to also leave the mapcount of these non-folio things alone. IOW, identify them when we map/unmap as "not folio" and not touch the mapcount.
> 
> Oh, that sounds great. I thought I would need to convert all driver code
> that does vm_insert_*() to use folio.

Heh, no. We really just have to identify them when mapping and unmapping them.
"these are not folios".

> Basically, I hit
> __folio_large_mapcount_sanity_checks() on _mm_id_mapcount when I moved
> _mm_id_mapcount and friends from prep_compound_page() to page_rmappable_folio().

Yes, exact same issue.

I ran into something similar in the past and documented it in __folio_rmap_sanity_checks():

	/*
	 * TODO: we get driver-allocated folios that have nothing to do with
	 * the rmap using vm_insert_page(); therefore, we cannot assume that
	 * folio_test_large_rmappable() holds for large folios. We should
	 * handle any desired mapcount+stats accounting for these folios in
	 * VM_MIXEDMAP VMAs separately, and then sanity-check here that
	 * we really only get rmappable folios.
	 */

should have been "for these pages" now that it's clear that not all pages
are/will-be folios.

I added an rmappable check in there back then but found out about the other
compound pages.


> IIUC, __folio_add_file_rmap() can just return if a non-folio compound page
> is encountered. Of course, the remove part needs to do the same.

We should never call that code, because ... we won't really have a folio :)

With Willy's changes, page_folio() will return NULL for things that are not a folio
IIRC.


> 
> 
>>
>>> Code that
>>> uses vm_insert_*() on pages would probably have a struct mappable_page
>>> with mapcount.
>>
>> I don't think we'll need a mapcount for them. Only for folios.
>>
>>>
>>>
>>> I think it might be the other way around. A memdesc does not have a mapcount;
>>> if we do not have a separate struct for these mappable pages now,
>>> what do we use at memdesc time? folio?
>>
>> Folios will have mapcount related information, yes. Long term, memdescs will for sure not have any.
>>
>> Real fun begins once we start messing with refcounts. vm_insert_*() will be "fun" on non-folio things.
> 
> Yeah, maybe we will need refcounts for every used memdesc. But who knows.

Some of these things should probably be frozen pages and use a different interface
then. A bunch of hard nuts to crack.

-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 21:08             ` Zi Yan
@ 2026-02-04 21:37               ` David Hildenbrand (arm)
  2026-02-04 21:41                 ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand (arm) @ 2026-02-04 21:37 UTC (permalink / raw)
  To: Zi Yan, 是参差
  Cc: linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 2/4/26 22:08, Zi Yan wrote:
> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
> 
>> On 2/4/26 18:41, Zi Yan wrote:
>>>
>>>
>>> More details:
>>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>
>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
> 
> OK, back to the issue. The patch below should fix the issue?
> 
> Hi 是参差,
> 
> Can you test it?
> 
>  From fa4a900e027a0c7365a9f786840943990d5da971 Mon Sep 17 00:00:00 2001
> From: Zi Yan <ziy@nvidia.com>
> Date: Wed, 4 Feb 2026 16:04:19 -0500
> Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/memory-failure.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 825c706ac576..4ed903de9a0e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2440,9 +2440,12 @@ int memory_failure(unsigned long pfn, int flags)
> 
>   	folio = page_folio(p);
> 
> -	/* filter pages that are protected from hwpoison test by users */
> +	/*
> +	 * filter pages that are protected from hwpoison test by users
> +	 * or unsupported non folio compound pages
> +	 */
>   	folio_lock(folio);
> -	if (hwpoison_filter(p)) {
> +	if (hwpoison_filter(p) || !folio_test_large_rmappable(folio)) {

I think you have to test for folio_test_large() before testing 
folio_test_large_rmappable().

-- 
Cheers,

David



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 21:37               ` David Hildenbrand (arm)
@ 2026-02-04 21:41                 ` Zi Yan
  2026-02-05  2:00                   ` jane.chu
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-04 21:41 UTC (permalink / raw)
  To: David Hildenbrand (arm), 是参差
  Cc: linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 16:37, David Hildenbrand (arm) wrote:

> On 2/4/26 22:08, Zi Yan wrote:
>> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
>>
>>> On 2/4/26 18:41, Zi Yan wrote:
>>>>
>>>>
>>>> More details:
>>>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>>>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>>>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>
>>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
>>
>> OK, back to the issue. The patch below should fix the issue?
>>
>> Hi 是参差,
>>
>> Can you test it?
>>

<snip>
> I think you have to test for folio_test_large() before testing folio_test_large_rmappable().

Oh, forgot that. Thanks.


From 8dda4bba9964890462eca3ef3cce57bb4fab8313 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Wed, 4 Feb 2026 16:04:19 -0500
Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/memory-failure.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 825c706ac576..137c67fda57e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2440,9 +2440,13 @@ int memory_failure(unsigned long pfn, int flags)

 	folio = page_folio(p);

-	/* filter pages that are protected from hwpoison test by users */
+	/*
+	 * filter pages that are protected from hwpoison test by users
+	 * or unsupported non folio compound pages
+	 */
 	folio_lock(folio);
-	if (hwpoison_filter(p)) {
+	if (hwpoison_filter(p) ||
+	    (folio_test_large(folio) && !folio_test_large_rmappable(folio))) {
 		ClearPageHWPoison(p);
 		folio_unlock(folio);
 		folio_put(folio);
-- 
2.51.0



Best Regards,
Yan, Zi



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-04 21:41                 ` Zi Yan
@ 2026-02-05  2:00                   ` jane.chu
  2026-02-05  3:21                     ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: jane.chu @ 2026-02-05  2:00 UTC (permalink / raw)
  To: Zi Yan, David Hildenbrand (arm), 是参差
  Cc: linux-mm, linmiaohe, akpm, linux-kernel, Matthew Wilcox



On 2/4/2026 1:41 PM, Zi Yan wrote:
> On 4 Feb 2026, at 16:37, David Hildenbrand (arm) wrote:
> 
>> On 2/4/26 22:08, Zi Yan wrote:
>>> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
>>>
>>>> On 2/4/26 18:41, Zi Yan wrote:
>>>>>
>>>>>
>>>>> More details:
>>>>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>>>>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>>>>> even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>>
>>>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
>>>
>>> OK, back to the issue. The patch below should fix the issue?
>>>
>>> Hi 是参差,
>>>
>>> Can you test it?
>>>
> 
> <snip>
>> I think you have to test for folio_test_large() before testing folio_test_large_rmappable().
> 
> Oh, forgot that. Thanks.
> 
> 
>  From 8dda4bba9964890462eca3ef3cce57bb4fab8313 Mon Sep 17 00:00:00 2001
> From: Zi Yan <ziy@nvidia.com>
> Date: Wed, 4 Feb 2026 16:04:19 -0500
> Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/memory-failure.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 825c706ac576..137c67fda57e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2440,9 +2440,13 @@ int memory_failure(unsigned long pfn, int flags)
> 
>   	folio = page_folio(p);
> 
> -	/* filter pages that are protected from hwpoison test by users */
> +	/*
> +	 * filter pages that are protected from hwpoison test by users
> +	 * or unsupported non folio compound pages
> +	 */
>   	folio_lock(folio);
> -	if (hwpoison_filter(p)) {
> +	if (hwpoison_filter(p) ||
> +	    (folio_test_large(folio) && !folio_test_large_rmappable(folio))) {

Just curious, would this filter out pte-mapped THP/mTHP folios?

>   		ClearPageHWPoison(p);
>   		folio_unlock(folio);
>   		folio_put(folio);

thanks,
-jane



* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-05  2:00                   ` jane.chu
@ 2026-02-05  3:21                     ` Miaohe Lin
  2026-02-05  3:53                       ` Zi Yan
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2026-02-05  3:21 UTC (permalink / raw)
  To: jane.chu, Zi Yan, David Hildenbrand (arm), 是参差
  Cc: linux-mm, akpm, linux-kernel, Matthew Wilcox

On 2026/2/5 10:00, jane.chu@oracle.com wrote:
> 
> 
> On 2/4/2026 1:41 PM, Zi Yan wrote:
>> On 4 Feb 2026, at 16:37, David Hildenbrand (arm) wrote:
>>
>>> On 2/4/26 22:08, Zi Yan wrote:
>>>> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
>>>>
>>>>> On 2/4/26 18:41, Zi Yan wrote:
>>>>>>
>>>>>>
>>>>>> More details:
>>>>>> later at sg_vma_fault(), the driver just handles a page fault by supplying
>>>>>> a subpage from a pre-allocated compound page[3]. We then get a large folio
>>>>>> without !CONFIG_TRANSPARENT_HUGEPAGE.
>>>>>
>>>>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
>>>>
>>>> OK, back to the issue. The patch below should fix the issue?
>>>>
>>>> Hi 是参差,
>>>>
>>>> Can you test it?
>>>>
>>
>> <snip>
>>> I think you have to test for folio_test_large() before testing folio_test_large_rmappable().
>>
>> Oh, forgot that. Thanks.
>>
>>
>>  From 8dda4bba9964890462eca3ef3cce57bb4fab8313 Mon Sep 17 00:00:00 2001
>> From: Zi Yan <ziy@nvidia.com>
>> Date: Wed, 4 Feb 2026 16:04:19 -0500
>> Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>   mm/memory-failure.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 825c706ac576..137c67fda57e 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2440,9 +2440,13 @@ int memory_failure(unsigned long pfn, int flags)
>>
>>       folio = page_folio(p);
>>
>> -    /* filter pages that are protected from hwpoison test by users */
>> +    /*
>> +     * filter pages that are protected from hwpoison test by users
>> +     * or unsupported non folio compound pages
>> +     */
>>       folio_lock(folio);
>> -    if (hwpoison_filter(p)) {
>> +    if (hwpoison_filter(p) ||
>> +        (folio_test_large(folio) && !folio_test_large_rmappable(folio))) {
> 
> Just curious, would this filter out pte-mapped THP/mTHP folios?

Thanks all.

memory_failure() can meet various types of folios. So in get_hwpoison_page(),
HWPoisonHandlable() and PageHuge() are used to check whether the folio can
be handled. But in the madvise(MADV_HWPOISON) case, MF_COUNT_INCREASED is set
in flags, so this check is skipped and the warning is triggered. Could the
HWPoisonHandlable() check always be used to make sure the folio is of a sane
type? Something like below (i.e. remove the MF_COUNT_INCREASED check before
calling get_hwpoison_page()):

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 825c706ac576..ba4231858a36 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2411,31 +2411,29 @@ int memory_failure(unsigned long pfn, int flags)
         * In fact it's dangerous to directly bump up page count from 0,
         * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
         */
-       if (!(flags & MF_COUNT_INCREASED)) {
-               res = get_hwpoison_page(p, flags);
-               if (!res) {
-                       if (is_free_buddy_page(p)) {
-                               if (take_page_off_buddy(p)) {
-                                       page_ref_inc(p);
-                                       res = MF_RECOVERED;
-                               } else {
-                                       /* We lost the race, try again */
-                                       if (retry) {
-                                               ClearPageHWPoison(p);
-                                               retry = false;
-                                               goto try_again;
-                                       }
-                                       res = MF_FAILED;
-                               }
-                               res = action_result(pfn, MF_MSG_BUDDY, res);
+       res = get_hwpoison_page(p, flags);
+       if (!res) {
+               if (is_free_buddy_page(p)) {
+                       if (take_page_off_buddy(p)) {
+                               page_ref_inc(p);
+                               res = MF_RECOVERED;
                        } else {
-                               res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
+                               /* We lost the race, try again */
+                               if (retry) {
+                                       ClearPageHWPoison(p);
+                                       retry = false;
+                                       goto try_again;
+                               }
+                               res = MF_FAILED;
                        }
-                       goto unlock_mutex;
-               } else if (res < 0) {
-                       res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
-                       goto unlock_mutex;
+                       res = action_result(pfn, MF_MSG_BUDDY, res);
+               } else {
+                       res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
                }
+               goto unlock_mutex;
+       } else if (res < 0) {
+               res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
+               goto unlock_mutex;
        }

        folio = page_folio(p);
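
For reference, this is roughly the sanity check that get_hwpoison_page()
applies via HWPoisonHandlable(). The sketch below is modelled on older
kernels and is illustrative only; the exact body on this tree may differ:

static bool HWPoisonHandlable(struct page *page, unsigned long flags)
{
	/* slab pages are never recoverable */
	if (PageSlab(page))
		return false;

	/* soft offline can migrate non-LRU movable pages */
	if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
		return true;

	/* otherwise only LRU and free buddy pages are handlable */
	return PageLRU(page) || is_free_buddy_page(page);
}

A raw driver compound page is neither slab, LRU, nor a free buddy page, so
once this check actually runs it fails, and the page is rejected before the
split logic gets a chance to warn.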

Thanks.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-05  3:21                     ` Miaohe Lin
@ 2026-02-05  3:53                       ` Zi Yan
  2026-02-05  7:18                         ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Zi Yan @ 2026-02-05  3:53 UTC (permalink / raw)
  To: Miaohe Lin, jane.chu
  Cc: David Hildenbrand (arm), 是参差,
	linux-mm, akpm, linux-kernel, Matthew Wilcox

On 4 Feb 2026, at 22:21, Miaohe Lin wrote:

> On 2026/2/5 10:00, jane.chu@oracle.com wrote:
>>
>>
>> On 2/4/2026 1:41 PM, Zi Yan wrote:
>>> On 4 Feb 2026, at 16:37, David Hildenbrand (arm) wrote:
>>>
>>>> On 2/4/26 22:08, Zi Yan wrote:
>>>>> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
>>>>>
>>>>>> On 2/4/26 18:41, Zi Yan wrote:
>>>>>>>
>>>>>>>
>>>>>>> More details:
>>>>>>> Later, at sg_vma_fault(), the driver simply handles the page fault by
>>>>>>> supplying a subpage from a pre-allocated compound page[3]. We then get
>>>>>>> a large folio even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>>>>
>>>>>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
>>>>>
>>>>> OK, back to the issue. The patch below should fix it.
>>>>>
>>>>> Hi 是参差,
>>>>>
>>>>> Can you test it?
>>>>>
>>>
>>> <snip>
>>>> I think you have to test for folio_test_large() before testing folio_test_large_rmappable().
>>>
>>> Oh, forgot that. Thanks.
>>>
>>>
>>>  From 8dda4bba9964890462eca3ef3cce57bb4fab8313 Mon Sep 17 00:00:00 2001
>>> From: Zi Yan <ziy@nvidia.com>
>>> Date: Wed, 4 Feb 2026 16:04:19 -0500
>>> Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>>   mm/memory-failure.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index 825c706ac576..137c67fda57e 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -2440,9 +2440,13 @@ int memory_failure(unsigned long pfn, int flags)
>>>
>>>       folio = page_folio(p);
>>>
>>> -    /* filter pages that are protected from hwpoison test by users */
>>> +    /*
>>> +     * filter pages that are protected from hwpoison test by users
>>> +     * or unsupported non-folio compound pages
>>> +     */
>>>       folio_lock(folio);
>>> -    if (hwpoison_filter(p)) {
>>> +    if (hwpoison_filter(p) ||
>>> +        (folio_test_large(folio) && !folio_test_large_rmappable(folio))) {
>>
>> Just curious, would this filter out pte-mapped THP/mTHP folios?

No. All folios (including pte-mapped THP/mTHP ones) are large_rmappable.
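
To make that concrete, the test from the patch above can be read as a
predicate like the following (the helper name is mine and purely
illustrative; it is not part of the patch):

/*
 * Illustrative only: a compound page that was never prepared as an
 * rmappable folio (e.g. a raw driver allocation, as in the sg case
 * discussed upthread) is large but not large_rmappable.
 */
static inline bool is_unsupported_compound(struct folio *folio)
{
	/*
	 * folio_test_large() must come first: the large_rmappable flag
	 * lives in a tail page, so testing it on an order-0 folio reads
	 * garbage; see David's review comment above.
	 */
	return folio_test_large(folio) && !folio_test_large_rmappable(folio);
}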

>
> Thanks all.
>
> memory_failure() can encounter various types of folios. So in get_hwpoison_page(),
> HWPoisonHandlable() and PageHuge() are used to check whether the folio can
> be handled. But in the madvise(MADV_HWPOISON) case, MF_COUNT_INCREASED is set
> in flags, so this check is skipped and the warning is triggered. Could the
> HWPoisonHandlable() check always be applied, to make sure the folio is of a
> sane type? Something like below (i.e. remove the MF_COUNT_INCREASED check
> before calling get_hwpoison_page()):
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 825c706ac576..ba4231858a36 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2411,31 +2411,29 @@ int memory_failure(unsigned long pfn, int flags)
>          * In fact it's dangerous to directly bump up page count from 0,
>          * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>          */
> -       if (!(flags & MF_COUNT_INCREASED)) {
> -               res = get_hwpoison_page(p, flags);
> -               if (!res) {
> -                       if (is_free_buddy_page(p)) {
> -                               if (take_page_off_buddy(p)) {
> -                                       page_ref_inc(p);
> -                                       res = MF_RECOVERED;
> -                               } else {
> -                                       /* We lost the race, try again */
> -                                       if (retry) {
> -                                               ClearPageHWPoison(p);
> -                                               retry = false;
> -                                               goto try_again;
> -                                       }
> -                                       res = MF_FAILED;
> -                               }
> -                               res = action_result(pfn, MF_MSG_BUDDY, res);
> +       res = get_hwpoison_page(p, flags);
> +       if (!res) {
> +               if (is_free_buddy_page(p)) {
> +                       if (take_page_off_buddy(p)) {
> +                               page_ref_inc(p);
> +                               res = MF_RECOVERED;
>                         } else {
> -                               res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
> +                               /* We lost the race, try again */
> +                               if (retry) {
> +                                       ClearPageHWPoison(p);
> +                                       retry = false;
> +                                       goto try_again;
> +                               }
> +                               res = MF_FAILED;
>                         }
> -                       goto unlock_mutex;
> -               } else if (res < 0) {
> -                       res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
> -                       goto unlock_mutex;
> +                       res = action_result(pfn, MF_MSG_BUDDY, res);
> +               } else {
> +                       res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>                 }
> +               goto unlock_mutex;
> +       } else if (res < 0) {
> +               res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
> +               goto unlock_mutex;
>         }
>
>         folio = page_folio(p);
>
> Thanks.

This makes sense to me. And it gets rid of the warning as well.

Can you send a proper patch for this?

Feel free to add

Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>

to it. Thanks.
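
For anyone retesting: the simplest way to drive this path from userspace is
madvise(MADV_HWPOISON), which requires CAP_SYS_ADMIN and
CONFIG_MEMORY_FAILURE. The sketch below is illustrative only; it exercises
the MF_COUNT_INCREASED path but does not by itself reproduce the original
report, which also needs a driver-supplied compound page mapping such as
the sg one discussed upthread.

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2 * 1024 * 1024;	/* 2 MiB; the size is arbitrary */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 1, len);		/* fault the pages in first */
	/*
	 * Each poisoned page goes through memory_failure() with
	 * MF_COUNT_INCREASED set, i.e. the path changed by the diff.
	 */
	return madvise(buf, len, MADV_HWPOISON);
}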

--
Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered
  2026-02-05  3:53                       ` Zi Yan
@ 2026-02-05  7:18                         ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2026-02-05  7:18 UTC (permalink / raw)
  To: Zi Yan, jane.chu
  Cc: David Hildenbrand (arm), 是参差,
	linux-mm, akpm, linux-kernel, Matthew Wilcox

On 2026/2/5 11:53, Zi Yan wrote:
> On 4 Feb 2026, at 22:21, Miaohe Lin wrote:
> 
>> On 2026/2/5 10:00, jane.chu@oracle.com wrote:
>>>
>>>
>>> On 2/4/2026 1:41 PM, Zi Yan wrote:
>>>> On 4 Feb 2026, at 16:37, David Hildenbrand (arm) wrote:
>>>>
>>>>> On 2/4/26 22:08, Zi Yan wrote:
>>>>>> On 4 Feb 2026, at 14:18, David Hildenbrand (arm) wrote:
>>>>>>
>>>>>>> On 2/4/26 18:41, Zi Yan wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> More details:
>>>>>>>> Later, at sg_vma_fault(), the driver simply handles the page fault by
>>>>>>>> supplying a subpage from a pre-allocated compound page[3]. We then get
>>>>>>>> a large folio even with !CONFIG_TRANSPARENT_HUGEPAGE.
>>>>>>>
>>>>>>> We can identify such non-folio (but compound) things by looking at PG_large_rmappable IIRC.
>>>>>>
>>>>>> OK, back to the issue. The patch below should fix it.
>>>>>>
>>>>>> Hi 是参差,
>>>>>>
>>>>>> Can you test it?
>>>>>>
>>>>
>>>> <snip>
>>>>> I think you have to test for folio_test_large() before testing folio_test_large_rmappable().
>>>>
>>>> Oh, forgot that. Thanks.
>>>>
>>>>
>>>>  From 8dda4bba9964890462eca3ef3cce57bb4fab8313 Mon Sep 17 00:00:00 2001
>>>> From: Zi Yan <ziy@nvidia.com>
>>>> Date: Wed, 4 Feb 2026 16:04:19 -0500
>>>> Subject: [PATCH] mm/memory_failure: reject unsupported non-folio compound page
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>>   mm/memory-failure.c | 8 ++++++--
>>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>> index 825c706ac576..137c67fda57e 100644
>>>> --- a/mm/memory-failure.c
>>>> +++ b/mm/memory-failure.c
>>>> @@ -2440,9 +2440,13 @@ int memory_failure(unsigned long pfn, int flags)
>>>>
>>>>       folio = page_folio(p);
>>>>
>>>> -    /* filter pages that are protected from hwpoison test by users */
>>>> +    /*
>>>> +     * filter pages that are protected from hwpoison test by users
>>>> +     * or unsupported non-folio compound pages
>>>> +     */
>>>>       folio_lock(folio);
>>>> -    if (hwpoison_filter(p)) {
>>>> +    if (hwpoison_filter(p) ||
>>>> +        (folio_test_large(folio) && !folio_test_large_rmappable(folio))) {
>>>
>>> Just curious, would this filter out pte-mapped THP/mTHP folios?
> 
> No. All folios (including pte-mapped THP/mTHP ones) are large_rmappable.
> 
>>
>> Thanks all.
>>
>> memory_failure() can encounter various types of folios. So in get_hwpoison_page(),
>> HWPoisonHandlable() and PageHuge() are used to check whether the folio can
>> be handled. But in the madvise(MADV_HWPOISON) case, MF_COUNT_INCREASED is set
>> in flags, so this check is skipped and the warning is triggered. Could the
>> HWPoisonHandlable() check always be applied, to make sure the folio is of a
>> sane type? Something like below (i.e. remove the MF_COUNT_INCREASED check
>> before calling get_hwpoison_page()):
>>
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 825c706ac576..ba4231858a36 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -2411,31 +2411,29 @@ int memory_failure(unsigned long pfn, int flags)
>>          * In fact it's dangerous to directly bump up page count from 0,
>>          * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
>>          */
>> -       if (!(flags & MF_COUNT_INCREASED)) {
>> -               res = get_hwpoison_page(p, flags);
>> -               if (!res) {
>> -                       if (is_free_buddy_page(p)) {
>> -                               if (take_page_off_buddy(p)) {
>> -                                       page_ref_inc(p);
>> -                                       res = MF_RECOVERED;
>> -                               } else {
>> -                                       /* We lost the race, try again */
>> -                                       if (retry) {
>> -                                               ClearPageHWPoison(p);
>> -                                               retry = false;
>> -                                               goto try_again;
>> -                                       }
>> -                                       res = MF_FAILED;
>> -                               }
>> -                               res = action_result(pfn, MF_MSG_BUDDY, res);
>> +       res = get_hwpoison_page(p, flags);
>> +       if (!res) {
>> +               if (is_free_buddy_page(p)) {
>> +                       if (take_page_off_buddy(p)) {
>> +                               page_ref_inc(p);
>> +                               res = MF_RECOVERED;
>>                         } else {
>> -                               res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>> +                               /* We lost the race, try again */
>> +                               if (retry) {
>> +                                       ClearPageHWPoison(p);
>> +                                       retry = false;
>> +                                       goto try_again;
>> +                               }
>> +                               res = MF_FAILED;
>>                         }
>> -                       goto unlock_mutex;
>> -               } else if (res < 0) {
>> -                       res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>> -                       goto unlock_mutex;
>> +                       res = action_result(pfn, MF_MSG_BUDDY, res);
>> +               } else {
>> +                       res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
>>                 }
>> +               goto unlock_mutex;
>> +       } else if (res < 0) {
>> +               res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
>> +               goto unlock_mutex;
>>         }
>>
>>         folio = page_folio(p);
>>
>> Thanks.
> 
> This makes sense to me. And it gets rid of the warning as well.
> 
> Can you send a proper patch of this?
> 
> Feel free to add
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Tested-by: Zi Yan <ziy@nvidia.com>

Sure. Thanks for all of your work. :)


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2026-02-05  7:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-04 12:49 WARNING in memory_failure() at include/linux/huge_mm.h:635 triggered 是参差
2026-02-04 17:12 ` David Hildenbrand (arm)
2026-02-04 17:15   ` David Hildenbrand (arm)
2026-02-04 17:23     ` Zi Yan
2026-02-04 17:34       ` Zi Yan
2026-02-04 17:41         ` Zi Yan
2026-02-04 19:18           ` David Hildenbrand (arm)
2026-02-04 19:48             ` Zi Yan
2026-02-04 19:55               ` David Hildenbrand (arm)
2026-02-04 20:13                 ` Zi Yan
2026-02-04 20:31                   ` David Hildenbrand (arm)
2026-02-04 20:45                     ` Zi Yan
2026-02-04 21:14                       ` David Hildenbrand (arm)
2026-02-04 21:08             ` Zi Yan
2026-02-04 21:37               ` David Hildenbrand (arm)
2026-02-04 21:41                 ` Zi Yan
2026-02-05  2:00                   ` jane.chu
2026-02-05  3:21                     ` Miaohe Lin
2026-02-05  3:53                       ` Zi Yan
2026-02-05  7:18                         ` Miaohe Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox