* VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
@ 2026-02-25 13:43 Sasha Levin
From: Sasha Levin @ 2026-02-25 13:43 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand, Hugh Dickins,
Zi Yan, Gavin Guo
Hi,
I've been playing around with improvements to syzkaller locally, and hit the
following crash on v7.0-rc1:
vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm ffff8881048e1780
prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
refcnt 1
flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2999!
Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
FS: 0000000000000000(0000) GS:ffff88816f701000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
PKRU: 80000000
Call Trace:
<TASK>
__split_huge_pmd+0x201/0x350
unmap_page_range+0xa6a/0x3db0
unmap_single_vma+0x14b/0x230
unmap_vmas+0x28f/0x580
exit_mmap+0x203/0xa80
__mmput+0x11b/0x540
mmput+0x81/0xa0
do_exit+0x7b9/0x2c60
do_group_exit+0xd5/0x2a0
get_signal+0x1fdc/0x2340
arch_do_signal_or_restart+0x93/0x790
exit_to_user_mode_loop+0x84/0x480
do_syscall_64+0x4df/0x700
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Kernel panic - not syncing: Fatal exception
The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
a 136KB region starting 768KB past the PMD base.
---
The following analysis was performed with the help of an LLM:
The crash path is:
exit_mmap -> unmap_vmas -> unmap_page_range -> zap_pmd_range
-> sees pmd_is_huge(*pmd) is true
-> range doesn't cover full HPAGE_PMD_SIZE
-> calls __split_huge_pmd()
-> haddr = address & HPAGE_PMD_MASK = 0x555580c00000
-> __split_huge_pmd_locked()
-> VM_BUG_ON_VMA(vma->vm_start > haddr) fires
because 0x555580cc0000 > 0x555580c00000
The root cause appears to be remove_migration_pmd() (mm/huge_memory.c:4906).
This function reinstalls a huge PMD via set_pmd_at() after migration
completes, but it never checks whether the VMA still covers the full
PMD-aligned 2MB range.
Every other code path that installs a huge PMD validates VMA boundaries:
- do_huge_pmd_anonymous_page(): thp_vma_suitable_order()
- collapse_huge_page(): hugepage_vma_revalidate()
- MADV_COLLAPSE: hugepage_vma_revalidate()
- do_set_pmd() (shmem/tmpfs): thp_vma_suitable_order()
remove_migration_pmd() checks none of these.
The suspected race window is:
1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
entry.
2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
migration entry into 512 PTE migration entries, and releases the PMD
lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).
3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
(only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
lock. If it wins the lock BEFORE step 2's split, it finds the PMD
migration entry still intact and returns with pvmw->pte == NULL.
4. remove_migration_pmd() then reinstalls the huge PMD via set_pmd_at()
without checking that the VMA (whose boundaries may have already been
modified in step 2) still covers the full PMD range.
5. Later, exit_mmap -> unmap_page_range -> zap_pmd_range encounters the
huge PMD, calls __split_huge_pmd(), and the VM_BUG_ON_VMA fires
because vma->vm_start now lies above the PMD-aligned base address haddr.
The fix should add a VMA boundary check in remove_migration_pmd(). If
haddr < vma->vm_start or haddr + HPAGE_PMD_SIZE > vma->vm_end, the
function should split the PMD migration entry into PTE-level migration
entries instead of reinstalling the huge PMD, allowing PTE-level removal
to handle each page individually.
--
Thanks,
Sasha
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
From: David Hildenbrand (Arm) @ 2026-02-25 13:50 UTC (permalink / raw)
To: Sasha Levin, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo
On 2/25/26 14:43, Sasha Levin wrote:
> The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
> mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
> 0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
> a 136KB region starting 768KB past the PMD base.
Do you have a reproducer, and would this trigger before v7.0-rc1?
Lorenzo made some changes around anon_vma locking recently; maybe this is
related.
--
Cheers,
David
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
From: Sasha Levin @ 2026-02-25 18:12 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, Lorenzo Stoakes, Andrew Morton,
Hugh Dickins, Zi Yan, Gavin Guo
On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
>Do you have a reproducer and would this trigger before v7.0-rc1?
No reproducer. I saw it exactly once yesterday; syzkaller wasn't able to come
up with a reproducer, and pointing the LLM at that task hasn't produced anything
useful either :(
--
Thanks,
Sasha
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
From: David Hildenbrand (Arm) @ 2026-02-25 20:30 UTC (permalink / raw)
To: Sasha Levin, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo
> The suspected race window is:
>
> 1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
> entry.
>
> 2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
> vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
> migration entry into 512 PTE migration entries, and releases the PMD
> lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).
split_huge_pmd_locked() will call __split_huge_pmd_locked() for either:
* pmd_trans_huge(): Present PMD
* pmd_is_valid_softleaf(): Migration PMDs (for our purpose)
__split_huge_pmd_locked() will remap either, resulting in a PTE table.
>
> 3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
> (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
> lock. If it wins the lock BEFORE step 2's split, it finds the PMD
> migration entry still intact and returns with pvmw->pte == NULL.
If 2. runs after 3., 2. would simply split the (present) PMD.
If 2. runs before 3., 2. would simply split the migration PMD.
Both should synchronize on the PMD table lock.
exit() cannot run before 2. has completed.
I'm wondering whether it's instead some missed vma_adjust_trans_huge() case. I
see a call from commit_merge(), so maybe there are some odd corner cases with
VMA merging.
--
Cheers,
David