* VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
@ 2026-02-25 13:43 Sasha Levin
2026-02-25 13:50 ` David Hildenbrand (Arm)
2026-02-25 20:30 ` VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range David Hildenbrand (Arm)
0 siblings, 2 replies; 8+ messages in thread
From: Sasha Levin @ 2026-02-25 13:43 UTC (permalink / raw)
To: linux-mm, linux-kernel
Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand, Hugh Dickins,
Zi Yan, Gavin Guo
Hi,
I've been playing around with improvements to syzkaller locally, and hit the
following crash on v7.0-rc1:
vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm ffff8881048e1780
prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
refcnt 1
flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2999!
Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
FS: 0000000000000000(0000) GS:ffff88816f701000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
PKRU: 80000000
Call Trace:
<TASK>
__split_huge_pmd+0x201/0x350
unmap_page_range+0xa6a/0x3db0
unmap_single_vma+0x14b/0x230
unmap_vmas+0x28f/0x580
exit_mmap+0x203/0xa80
__mmput+0x11b/0x540
mmput+0x81/0xa0
do_exit+0x7b9/0x2c60
do_group_exit+0xd5/0x2a0
get_signal+0x1fdc/0x2340
arch_do_signal_or_restart+0x93/0x790
exit_to_user_mode_loop+0x84/0x480
do_syscall_64+0x4df/0x700
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Kernel panic - not syncing: Fatal exception
The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
a 136KB region starting 768KB past the PMD base.
---
The following analysis was performed with the help of an LLM:
The crash path is:
exit_mmap -> unmap_vmas -> unmap_page_range -> zap_pmd_range
-> sees pmd_is_huge(*pmd) is true
-> range doesn't cover full HPAGE_PMD_SIZE
-> calls __split_huge_pmd()
-> haddr = address & HPAGE_PMD_MASK = 0x555580c00000
-> __split_huge_pmd_locked()
-> VM_BUG_ON_VMA(vma->vm_start > haddr) fires
because 0x555580cc0000 > 0x555580c00000
The root cause appears to be remove_migration_pmd() (mm/huge_memory.c:4906).
This function reinstalls a huge PMD via set_pmd_at() after migration
completes, but it never checks whether the VMA still covers the full
PMD-aligned 2MB range.
Every other code path that installs a huge PMD validates VMA boundaries:
- do_huge_pmd_anonymous_page(): thp_vma_suitable_order()
- collapse_huge_page(): hugepage_vma_revalidate()
- MADV_COLLAPSE: hugepage_vma_revalidate()
- do_set_pmd() (shmem/tmpfs): thp_vma_suitable_order()
remove_migration_pmd() performs no such check.
The suspected race window is:
1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
entry.
2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
migration entry into 512 PTE migration entries, and releases the PMD
lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).
3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
(only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
lock. If it wins the lock BEFORE step 2's split, it finds the PMD
migration entry still intact and returns with pvmw->pte == NULL.
4. remove_migration_pmd() then reinstalls the huge PMD via set_pmd_at()
without checking that the VMA (whose boundaries may have already been
modified in step 2) still covers the full PMD range.
5. Later, exit_mmap -> unmap_page_range -> zap_pmd_range encounters the
huge PMD, calls __split_huge_pmd(), and the VM_BUG_ON_VMA fires
because vma->vm_start is now greater than the PMD base (haddr).
The fix should add a VMA boundary check in remove_migration_pmd(). If
haddr < vma->vm_start or haddr + HPAGE_PMD_SIZE > vma->vm_end, the
function should split the PMD migration entry into PTE-level migration
entries instead of reinstalling the huge PMD, allowing PTE-level removal
to handle each page individually.
--
Thanks,
Sasha
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
2026-02-25 13:43 VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range Sasha Levin
@ 2026-02-25 13:50 ` David Hildenbrand (Arm)
2026-02-25 18:12 ` Sasha Levin
2026-03-02 10:57 ` Lorenzo Stoakes
2026-02-25 20:30 ` VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range David Hildenbrand (Arm)
1 sibling, 2 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-25 13:50 UTC (permalink / raw)
To: Sasha Levin, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo
On 2/25/26 14:43, Sasha Levin wrote:
> Hi,
>
> I've been playing around with improvements to syzkaller locally, and hit
> the
> following crash on v7.0-rc1:
>
> vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm
> ffff8881048e1780
> prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
> pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
> refcnt 1
> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> ------------[ cut here ]------------
> kernel BUG at mm/huge_memory.c:2999!
> Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G
> N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-
> debian-1.17.0-1 04/01/2014
> RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
> RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
> RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
> RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
> RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
> R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
> R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
> FS: 0000000000000000(0000) GS:ffff88816f701000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
> PKRU: 80000000
> Call Trace:
> <TASK>
> __split_huge_pmd+0x201/0x350
> unmap_page_range+0xa6a/0x3db0
> unmap_single_vma+0x14b/0x230
> unmap_vmas+0x28f/0x580
> exit_mmap+0x203/0xa80
> __mmput+0x11b/0x540
> mmput+0x81/0xa0
> do_exit+0x7b9/0x2c60
> do_group_exit+0xd5/0x2a0
> get_signal+0x1fdc/0x2340
> arch_do_signal_or_restart+0x93/0x790
> exit_to_user_mode_loop+0x84/0x480
> do_syscall_64+0x4df/0x700
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> </TASK>
> Kernel panic - not syncing: Fatal exception
>
> The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
> mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
> 0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
> a 136KB region starting 768KB past the PMD base.
Do you have a reproducer and would this trigger before v7.0-rc1?
Lorenzo did some changes around anon_vma locking recently, maybe related
to that.
--
Cheers,
David
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
2026-02-25 13:50 ` David Hildenbrand (Arm)
@ 2026-02-25 18:12 ` Sasha Levin
2026-03-02 10:57 ` Lorenzo Stoakes
1 sibling, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-02-25 18:12 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, linux-kernel, Lorenzo Stoakes, Andrew Morton,
Hugh Dickins, Zi Yan, Gavin Guo
On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
>On 2/25/26 14:43, Sasha Levin wrote:
>> Hi,
>>
>> I've been playing around with improvements to syzkaller locally, and hit
>> the
>> following crash on v7.0-rc1:
>>
>> vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm
>> ffff8881048e1780
>> prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
>> pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
>> refcnt 1
>> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
>> ------------[ cut here ]------------
>> kernel BUG at mm/huge_memory.c:2999!
>> Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G
>> N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-
>> debian-1.17.0-1 04/01/2014
>> RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
>> RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
>> RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
>> RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
>> RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
>> R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
>> R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
>> FS: 0000000000000000(0000) GS:ffff88816f701000(0000)
>> knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
>> PKRU: 80000000
>> Call Trace:
>> <TASK>
>> __split_huge_pmd+0x201/0x350
>> unmap_page_range+0xa6a/0x3db0
>> unmap_single_vma+0x14b/0x230
>> unmap_vmas+0x28f/0x580
>> exit_mmap+0x203/0xa80
>> __mmput+0x11b/0x540
>> mmput+0x81/0xa0
>> do_exit+0x7b9/0x2c60
>> do_group_exit+0xd5/0x2a0
>> get_signal+0x1fdc/0x2340
>> arch_do_signal_or_restart+0x93/0x790
>> exit_to_user_mode_loop+0x84/0x480
>> do_syscall_64+0x4df/0x700
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> </TASK>
>> Kernel panic - not syncing: Fatal exception
>>
>> The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
>> mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
>> 0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
>> a 136KB region starting 768KB past the PMD base.
>
>Do you have a reproducer and would this trigger before v7.0-rc1?
No reproducer. I saw it exactly once yesterday; syzkaller wasn't able to come
up with a reproducer, and pointing the LLM at that task hasn't produced anything
useful either :(
--
Thanks,
Sasha
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
2026-02-25 13:43 VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range Sasha Levin
2026-02-25 13:50 ` David Hildenbrand (Arm)
@ 2026-02-25 20:30 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-25 20:30 UTC (permalink / raw)
To: Sasha Levin, linux-mm, linux-kernel
Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo
> The suspected race window is:
>
> 1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
> entry.
>
> 2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
> vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
> migration entry into 512 PTE migration entries, and releases the PMD
> lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).
split_huge_pmd_locked() will call __split_huge_pmd_locked() either for
* pmd_trans_huge(): Present PMD
* pmd_is_valid_softleaf(): Migration PMDs (for our purpose)
__split_huge_pmd_locked() will remap either, resulting in a PTE table.
>
> 3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
> (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
> lock. If it wins the lock BEFORE step 2's split, it finds the PMD
> migration entry still intact and returns with pvmw->pte == NULL.
If 2. runs after 3, 2. would simply split the (present) PMD.
If 2. runs before 3, 2. would simply split the migration PMD.
And both should sync on the PMD table lock.
exit() cannot run before 2. completed.
Wondering if it's instead some missed vma_adjust_trans_huge() case. I
see a call from commit_merge(), so maybe some odd corner cases with VMA
merging.
--
Cheers,
David
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
2026-02-25 13:50 ` David Hildenbrand (Arm)
2026-02-25 18:12 ` Sasha Levin
@ 2026-03-02 10:57 ` Lorenzo Stoakes
2026-03-02 15:13 ` Sasha Levin
2026-03-02 15:15 ` [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma() Sasha Levin
1 sibling, 2 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2026-03-02 10:57 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Sasha Levin, linux-mm, linux-kernel, Andrew Morton, Hugh Dickins,
Zi Yan, Gavin Guo
On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
> On 2/25/26 14:43, Sasha Levin wrote:
> > Hi,
> >
> > I've been playing around with improvements to syzkaller locally, and hit
> > the
> > following crash on v7.0-rc1:
> >
> > vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm
> > ffff8881048e1780
> > prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
> > pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
> > refcnt 1
> > flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> > ------------[ cut here ]------------
> > kernel BUG at mm/huge_memory.c:2999!
> > Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> > CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G
> > N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-
> > debian-1.17.0-1 04/01/2014
> > RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
> > RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
> > RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
> > RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
> > RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
> > R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
> > R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
> > FS: 0000000000000000(0000) GS:ffff88816f701000(0000)
> > knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
> > PKRU: 80000000
> > Call Trace:
> > <TASK>
> > __split_huge_pmd+0x201/0x350
> > unmap_page_range+0xa6a/0x3db0
> > unmap_single_vma+0x14b/0x230
> > unmap_vmas+0x28f/0x580
> > exit_mmap+0x203/0xa80
> > __mmput+0x11b/0x540
> > mmput+0x81/0xa0
> > do_exit+0x7b9/0x2c60
> > do_group_exit+0xd5/0x2a0
> > get_signal+0x1fdc/0x2340
> > arch_do_signal_or_restart+0x93/0x790
> > exit_to_user_mode_loop+0x84/0x480
> > do_syscall_64+0x4df/0x700
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > </TASK>
> > Kernel panic - not syncing: Fatal exception
> >
> > The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
> > mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
> > 0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
> > a 136KB region starting 768KB past the PMD base.
>
> Do you have a reproducer and would this trigger before v7.0-rc1?
>
> Lorenzo did some changes around anon_vma locking recently, maybe related
> to that.
A quick glance doesn't suggest any changes I made should have had an impact
here.
_Should have_ :)
I think without a reproducer this is going to be hard to pinpoint. Hopefully
syzbot proper should figure one out eventually?
>
> --
> Cheers,
>
> David
Thanks, Lorenzo
* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
2026-03-02 10:57 ` Lorenzo Stoakes
@ 2026-03-02 15:13 ` Sasha Levin
2026-03-02 15:15 ` [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma() Sasha Levin
1 sibling, 0 replies; 8+ messages in thread
From: Sasha Levin @ 2026-03-02 15:13 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: David Hildenbrand (Arm),
linux-mm, linux-kernel, Andrew Morton, Hugh Dickins, Zi Yan,
Gavin Guo
On Mon, Mar 02, 2026 at 10:57:47AM +0000, Lorenzo Stoakes wrote:
>On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
>> On 2/25/26 14:43, Sasha Levin wrote:
>> > Hi,
>> >
>> > I've been playing around with improvements to syzkaller locally, and hit
>> > the
>> > following crash on v7.0-rc1:
>> >
>> > vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm
>> > ffff8881048e1780
>> > prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
>> > pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
>> > refcnt 1
>> > flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
>> > ------------[ cut here ]------------
>> > kernel BUG at mm/huge_memory.c:2999!
>> > Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> > CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G
>> > N 7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-
>> > debian-1.17.0-1 04/01/2014
>> > RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
>> > RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
>> > RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
>> > RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
>> > RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
>> > R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
>> > R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
>> > FS: 0000000000000000(0000) GS:ffff88816f701000(0000)
>> > knlGS:0000000000000000
>> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
>> > PKRU: 80000000
>> > Call Trace:
>> > <TASK>
>> > __split_huge_pmd+0x201/0x350
>> > unmap_page_range+0xa6a/0x3db0
>> > unmap_single_vma+0x14b/0x230
>> > unmap_vmas+0x28f/0x580
>> > exit_mmap+0x203/0xa80
>> > __mmput+0x11b/0x540
>> > mmput+0x81/0xa0
>> > do_exit+0x7b9/0x2c60
>> > do_group_exit+0xd5/0x2a0
>> > get_signal+0x1fdc/0x2340
>> > arch_do_signal_or_restart+0x93/0x790
>> > exit_to_user_mode_loop+0x84/0x480
>> > do_syscall_64+0x4df/0x700
>> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> > </TASK>
>> > Kernel panic - not syncing: Fatal exception
>> >
>> > The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
>> > mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
>> > 0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
>> > a 136KB region starting 768KB past the PMD base.
>>
>> Do you have a reproducer and would this trigger before v7.0-rc1?
>>
>> Lorenzo did some changes around anon_vma locking recently, maybe related
>> to that.
>
>A quick glance doesn't suggest any changes I made should have had an impact
>here.
>
>_Should have_ :)
>
>I think without a reproducer this is going to be hard to pinpoint. Hopefully
>syzbot proper should figure one out eventually?
So no luck just yet.
I did hit a different issue, which the LLM was able to triage, and I'm running
with the patch right now to make sure that the issue doesn't reproduce.
I'm not sure if it's related or not, but I'll send the WIP patch as a reply.
--
Thanks,
Sasha
* [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma()
2026-03-02 10:57 ` Lorenzo Stoakes
2026-03-02 15:13 ` Sasha Levin
@ 2026-03-02 15:15 ` Sasha Levin
2026-03-02 17:48 ` Lorenzo Stoakes
1 sibling, 1 reply; 8+ messages in thread
From: Sasha Levin @ 2026-03-02 15:15 UTC (permalink / raw)
To: lorenzo.stoakes
Cc: akpm, david, gavinguo, hughd, linux-kernel, linux-mm, sashal, ziy
When dup_anon_vma() calls anon_vma_clone() and it fails with -ENOMEM,
dst->anon_vma is left pointing at src->anon_vma without a corresponding
num_active_vmas increment (which only happens on the success path).
The internal cleanup_partial_anon_vmas() correctly frees partially-
allocated AVCs but does not clear dst->anon_vma. Later, when the VMA is
torn down during process exit, unlink_anon_vmas() sees a non-NULL
vma->anon_vma and decrements num_active_vmas without a prior matching
increment, causing an underflow. This eventually triggers:
WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900 mm/rmap.c:528
First, fault injection in the mlock2 syscall path:
FAULT_INJECTION: forcing a failure.
name failslab, interval 1, probability 0, space 0, times 0
CPU: 3 PID: 4261 Comm: syz.6.96
Call Trace:
should_fail_ex.cold+0xd8/0x15d
should_failslab+0xd4/0x150
kmem_cache_alloc_noprof+0x60/0x630
anon_vma_clone+0x2ed/0xcf0
dup_anon_vma+0x1cb/0x320
vma_modify+0x16dd/0x2230
vma_modify_flags+0x1f9/0x350
mlock_fixup+0x225/0xe10
apply_vma_lock_flags+0x249/0x360
do_mlock+0x269/0x7f0
__x64_sys_mlock2+0xc0/0x100
Followed by the WARNING on the same task during exit:
WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900
CPU: 3 PID: 4261 Comm: syz.6.96
Call Trace:
free_pgtables+0x312/0x950
exit_mmap+0x487/0xa80
__mmput+0x11b/0x540
exit_mm
do_exit+0x7b9/0x2c60
Fix this by clearing dst->anon_vma on clone failure, restoring the VMA
to its original unfaulted state. This ensures unlink_anon_vmas() will
correctly bail out early at the !active_anon_vma check.
Other callers of anon_vma_clone() are unaffected: VMA_OP_SPLIT/REMAP
free the dst VMA on error, and VMA_OP_FORK explicitly sets anon_vma to
NULL before cloning.
Fixes: 542eda1a83294 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/vma.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/vma.c b/mm/vma.c
index be64f781a3aa7..4cf6a2a05c10a 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -629,8 +629,10 @@ static int dup_anon_vma(struct vm_area_struct *dst,
vma_assert_write_locked(dst);
dst->anon_vma = src->anon_vma;
ret = anon_vma_clone(dst, src, VMA_OP_MERGE_UNFAULTED);
- if (ret)
+ if (ret) {
+ dst->anon_vma = NULL;
return ret;
+ }
*dup = dst;
}
--
2.51.0
* Re: [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma()
2026-03-02 15:15 ` [WIP] mm/vma: clear dst->anon_vma on anon_vma_clone() failure in dup_anon_vma() Sasha Levin
@ 2026-03-02 17:48 ` Lorenzo Stoakes
0 siblings, 0 replies; 8+ messages in thread
From: Lorenzo Stoakes @ 2026-03-02 17:48 UTC (permalink / raw)
To: Sasha Levin; +Cc: akpm, david, gavinguo, hughd, linux-kernel, linux-mm, ziy
On Mon, Mar 02, 2026 at 10:15:47AM -0500, Sasha Levin wrote:
> When dup_anon_vma() calls anon_vma_clone() and it fails with -ENOMEM,
> dst->anon_vma is left pointing at src->anon_vma without a corresponding
> num_active_vmas increment (which only happens on the success path).
>
> The internal cleanup_partial_anon_vmas() correctly frees partially-
> allocated AVCs but does not clear dst->anon_vma. Later, when the VMA is
> torn down during process exit, unlink_anon_vmas() sees a non-NULL
> vma->anon_vma and decrements num_active_vmas without a prior matching
> increment, causing an underflow. This eventually triggers:
Yikes!
>
> WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900 mm/rmap.c:528
>
> First, fault injection in the mlock2 syscall path:
>
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
> CPU: 3 PID: 4261 Comm: syz.6.96
> Call Trace:
> should_fail_ex.cold+0xd8/0x15d
> should_failslab+0xd4/0x150
> kmem_cache_alloc_noprof+0x60/0x630
> anon_vma_clone+0x2ed/0xcf0
> dup_anon_vma+0x1cb/0x320
> vma_modify+0x16dd/0x2230
> vma_modify_flags+0x1f9/0x350
> mlock_fixup+0x225/0xe10
> apply_vma_lock_flags+0x249/0x360
> do_mlock+0x269/0x7f0
> __x64_sys_mlock2+0xc0/0x100
>
> Followed by the WARNING on the same task during exit:
>
> WARNING: mm/rmap.c:528 at unlink_anon_vmas+0x68e/0x900
> CPU: 3 PID: 4261 Comm: syz.6.96
> Call Trace:
> free_pgtables+0x312/0x950
> exit_mmap+0x487/0xa80
> __mmput+0x11b/0x540
> exit_mm
> do_exit+0x7b9/0x2c60
>
> Fix this by clearing dst->anon_vma on clone failure, restoring the VMA
> to its original unfaulted state. This ensures unlink_anon_vmas() will
> correctly bail out early at the !active_anon_vma check.
>
> Other callers of anon_vma_clone() are unaffected: VMA_OP_SPLIT/REMAP
> free the dst VMA on error, and VMA_OP_FORK explicitly sets anon_vma to
> NULL before cloning.
>
> Fixes: 542eda1a83294 ("mm/rmap: improve anon_vma_clone(), unlink_anon_vmas() comments, add asserts")
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> mm/vma.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vma.c b/mm/vma.c
> index be64f781a3aa7..4cf6a2a05c10a 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -629,8 +629,10 @@ static int dup_anon_vma(struct vm_area_struct *dst,
> vma_assert_write_locked(dst);
> dst->anon_vma = src->anon_vma;
> ret = anon_vma_clone(dst, src, VMA_OP_MERGE_UNFAULTED);
> - if (ret)
> + if (ret) {
> + dst->anon_vma = NULL;
> return ret;
> + }
Hm, I think I'd rather we tackle this at the source to be honest.
I think it makes sense to do this in cleanup_partial_anon_vmas() since that's
handling the rest of the cleanup, and this is what the anon_vma_clone() error
path previously did.
Something like:
static void cleanup_partial_anon_vmas(struct vm_area_struct *vma)
{
struct anon_vma_chain *avc, *next;
list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) {
list_del(&avc->same_vma);
anon_vma_chain_free(avc);
}
+ vma->anon_vma = NULL;
}
>
> *dup = dst;
> }
> --
> 2.51.0
>
Thanks for looking at this, this definitely needs fixing. Luckily, real-world
OOMs like this are probably near-impossible to trigger because these are 'too
small to fail' allocations, but we do absolutely need to ensure these code
paths are correctly handled.
Thanks, Lorenzo