VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range

linux-mm.kvack.org archive mirror

From: Sasha Levin @ 2026-02-25 13:43 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand, Hugh Dickins,
	Zi Yan, Gavin Guo

Hi,

I've been playing around with improvements to syzkaller locally, and hit the
following crash on v7.0-rc1:

   vma ffff888109f988c0 start 0000555580cc0000 end 0000555580ce2000 mm ffff8881048e1780
   prot 8000000000000025 anon_vma ffff88810b20f100 vm_ops 0000000000000000
   pgoff 555580cc0 file 0000000000000000 private_data 0000000000000000
   refcnt 1
   flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
   ------------[ cut here ]------------
   kernel BUG at mm/huge_memory.c:2999!
   Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
   CPU: 3 UID: 0 PID: 15162 Comm: syz.7.3120 Tainted: G                 N  7.0.0-rc1-00001-gc5447a46efed #51 PREEMPT(full)
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
   RIP: 0010:split_huge_pmd_locked+0x11a0/0x2f80
   RSP: 0018:ffff888053cc7338 EFLAGS: 00010282
   RAX: 0000000000000126 RBX: ffff888109f988d0 RCX: 0000000000000000
   RDX: 0000000000000126 RSI: 0000000000000000 RDI: ffffed100a798e43
   RBP: 0000555580cc0000 R08: ffffffffa3e62775 R09: 0000000000000001
   R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000000080
   R13: 0000000000000000 R14: 0000555580c00000 R15: ffff888109f988c0
   FS:  0000000000000000(0000) GS:ffff88816f701000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 00007fe2ac1907a0 CR3: 0000000021c91000 CR4: 0000000000750ef0
   PKRU: 80000000
   Call Trace:
    <TASK>
    __split_huge_pmd+0x201/0x350
    unmap_page_range+0xa6a/0x3db0
    unmap_single_vma+0x14b/0x230
    unmap_vmas+0x28f/0x580
    exit_mmap+0x203/0xa80
    __mmput+0x11b/0x540
    mmput+0x81/0xa0
    do_exit+0x7b9/0x2c60
    do_group_exit+0xd5/0x2a0
    get_signal+0x1fdc/0x2340
    arch_do_signal_or_restart+0x93/0x790
    exit_to_user_mode_loop+0x84/0x480
    do_syscall_64+0x4df/0x700
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
    </TASK>
   Kernel panic - not syncing: Fatal exception

The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
a 136KB region starting 768KB past the PMD base.
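
Here is a quick userspace check of that arithmetic, as a minimal sketch. It assumes x86-64 with 4 KiB base pages, so HPAGE_PMD_SIZE is 2 MiB. The macros reuse the kernel's names but are local redefinitions, and the helper functions are hypothetical. The offset from the PMD base is 0x555580cc0000 - 0x555580c00000 = 0xc0000 bytes (768 KiB), and the VMA length is 0x22000 bytes (136 KiB).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86-64 with 4 KiB pages, so one PMD maps 2 MiB.
 * Local redefinitions that mirror the kernel macro names. */
#define HPAGE_PMD_SHIFT	21
#define HPAGE_PMD_SIZE	(UINT64_C(1) << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))

/* PMD-aligned base (haddr) of the 2 MiB region containing addr. */
static uint64_t pmd_base(uint64_t addr)
{
	return addr & HPAGE_PMD_MASK;
}

/* Hypothetical helper: distance in KiB from the PMD base to vm_start. */
static uint64_t start_offset_kib(uint64_t vm_start)
{
	uint64_t haddr = pmd_base(vm_start);

	assert(haddr <= vm_start);
	return (vm_start - haddr) / 1024;
}

/* Hypothetical helper: length in KiB of the half-open VMA [vm_start, vm_end). */
static uint64_t vma_len_kib(uint64_t vm_start, uint64_t vm_end)
{
	assert(vm_end > vm_start);
	return (vm_end - vm_start) / 1024;
}
```

With the values from the report, pmd_base(0x555580cc0000) yields 0x555580c00000, the same value that appears in R14 of the register dump.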

---

The following analysis was performed with the help of an LLM:

The crash path is:

   exit_mmap -> unmap_vmas -> unmap_page_range -> zap_pmd_range
     -> sees pmd_is_huge(*pmd) is true
     -> range doesn't cover full HPAGE_PMD_SIZE
     -> calls __split_huge_pmd()
       -> haddr = address & HPAGE_PMD_MASK = 0x555580c00000
       -> __split_huge_pmd_locked()
         -> VM_BUG_ON_VMA(vma->vm_start > haddr) fires
            because 0x555580cc0000 > 0x555580c00000
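
The decision points along that path can be modelled in userspace as below. This is a simplified sketch, not kernel code, and it assumes 2 MiB PMDs: pmd_addr_end_model() approximates the kernel's pmd_addr_end(), needs_split() stands in for the partial-range test that sends zap_pmd_range() to __split_huge_pmd(), and split_would_bug() mirrors the VM_BUG_ON_VMA(vma->vm_start > haddr, vma) condition. All three helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HPAGE_PMD_SIZE	(UINT64_C(1) << 21)	/* assumes 2 MiB PMDs */
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))

/* End of the current PMD-sized step: next 2 MiB boundary, capped at end. */
static uint64_t pmd_addr_end_model(uint64_t addr, uint64_t end)
{
	uint64_t boundary = (addr + HPAGE_PMD_SIZE) & HPAGE_PMD_MASK;

	assert(addr < end);
	return boundary < end ? boundary : end;
}

/* A huge PMD is split instead of zapped whole when the step is short. */
static bool needs_split(uint64_t addr, uint64_t next)
{
	return next - addr != HPAGE_PMD_SIZE;
}

/* Mirrors VM_BUG_ON_VMA(vma->vm_start > haddr, vma) after the split. */
static bool split_would_bug(uint64_t vm_start, uint64_t addr)
{
	uint64_t haddr = addr & HPAGE_PMD_MASK;

	return vm_start > haddr;
}
```

For the crashing VMA, the walk starts at 0x555580cc0000 and the step ends at vm_end (0x555580ce2000), only 136 KiB later, so the huge PMD is split; the PMD base 0x555580c00000 then lies below vm_start and the assertion trips.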

The root cause appears to be remove_migration_pmd() (mm/huge_memory.c:4906).
This function reinstalls a huge PMD via set_pmd_at() after migration
completes, but it never checks whether the VMA still covers the full
PMD-aligned 2MB range.

Every other code path that installs a huge PMD validates VMA boundaries:

   - do_huge_pmd_anonymous_page(): thp_vma_suitable_order()
   - collapse_huge_page():         hugepage_vma_revalidate()
   - MADV_COLLAPSE:                hugepage_vma_revalidate()
   - do_set_pmd() (shmem/tmpfs):   thp_vma_suitable_order()

remove_migration_pmd() checks none of these.
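
The range condition those helpers enforce can be sketched as a userspace predicate. This is a hypothetical model for illustration, loosely based on thp_vma_suitable_order() rather than copied from it; struct vma_model and pmd_range_suitable() are made-up names, and 4 KiB pages with 2 MiB PMDs are assumed. The pgoff alignment test for non-anonymous VMAs is included as an assumption about the in-tree helper's behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12			/* assumes 4 KiB pages */
#define HPAGE_PMD_SIZE	(UINT64_C(1) << 21)	/* assumes 2 MiB PMDs */

/* Hypothetical stand-in for the few vm_area_struct fields used here. */
struct vma_model {
	uint64_t vm_start;	/* inclusive */
	uint64_t vm_end;	/* exclusive */
	uint64_t vm_pgoff;	/* offset in pages */
	bool anonymous;
};

/* True if a PMD-sized mapping covering addr fits entirely inside the VMA. */
static bool pmd_range_suitable(const struct vma_model *vma, uint64_t addr)
{
	uint64_t haddr = addr & ~(HPAGE_PMD_SIZE - 1);

	assert(vma->vm_start < vma->vm_end);

	/* File-backed: vm_start and vm_pgoff must agree modulo 2 MiB. */
	if (!vma->anonymous) {
		uint64_t delta = (vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff;

		if (delta % (HPAGE_PMD_SIZE >> PAGE_SHIFT) != 0)
			return false;
	}

	/* The whole 2 MiB range [haddr, haddr + 2 MiB) must lie in the VMA. */
	return haddr >= vma->vm_start && haddr + HPAGE_PMD_SIZE <= vma->vm_end;
}
```

With vm_start = 0x555580cc0000 and vm_end = 0x555580ce2000, the predicate is false for any address in the VMA, which is the condition the report argues remove_migration_pmd() never evaluates.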

The suspected race window is:

   1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
      entry.

   2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
      vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
      migration entry into 512 PTE migration entries, and releases the PMD
      lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).

   3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
      (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
      lock. If it wins the lock BEFORE step 2's split, it finds the PMD
      migration entry still intact and returns with pvmw->pte == NULL.

   4. remove_migration_pmd() then reinstalls the huge PMD via set_pmd_at()
      without checking that the VMA (whose boundaries may have already been
      modified in step 2) still covers the full PMD range.

   5. Later, exit_mmap -> unmap_page_range -> zap_pmd_range encounters the
      huge PMD, calls __split_huge_pmd(), and the VM_BUG_ON_VMA fires
      because vma->vm_start now lies above the PMD base (haddr).

The fix should add a VMA boundary check in remove_migration_pmd(). If
haddr < vma->vm_start or haddr + HPAGE_PMD_SIZE > vma->vm_end, the
function should split the PMD migration entry into PTE-level migration
entries instead of reinstalling the huge PMD, allowing PTE-level removal
to handle each page individually.
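
The proposed check could take roughly the following shape. This is a userspace sketch of the decision only, not a patch: enum restore_action and choose_restore() are hypothetical names, 2 MiB PMDs are assumed, and how a kernel version would actually split the PMD migration entry into PTE migration entries (and under which locks) is left open.

```c
#include <assert.h>
#include <stdint.h>

#define HPAGE_PMD_SIZE	(UINT64_C(1) << 21)	/* assumes 2 MiB PMDs */
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))

/* Hypothetical outcomes when restoring a PMD migration entry. */
enum restore_action {
	RESTORE_HUGE_PMD,	/* VMA still spans the whole 2 MiB range */
	RESTORE_SPLIT_TO_PTES,	/* VMA shrank: fall back to PTE-level entries */
};

/* Sketch of the suggested boundary check for remove_migration_pmd(). */
static enum restore_action choose_restore(uint64_t vm_start, uint64_t vm_end,
					  uint64_t address)
{
	uint64_t haddr = address & HPAGE_PMD_MASK;

	assert(vm_start < vm_end);
	if (haddr < vm_start || haddr + HPAGE_PMD_SIZE > vm_end)
		return RESTORE_SPLIT_TO_PTES;
	return RESTORE_HUGE_PMD;
}
```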

-- 
Thanks,
Sasha



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
From: David Hildenbrand (Arm) @ 2026-02-25 13:50 UTC (permalink / raw)
  To: Sasha Levin, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo

On 2/25/26 14:43, Sasha Levin wrote:
> The assertion VM_BUG_ON_VMA(vma->vm_start > haddr, vma) fires at
> mm/huge_memory.c:2999 because a huge PMD exists at PMD-aligned address
> 0x555580c00000 but the VMA only covers [0x555580cc0000, 0x555580ce2000):
> a 136KB region starting 768KB past the PMD base.

Do you have a reproducer and would this trigger before v7.0-rc1?

Lorenzo made some changes around anon_vma locking recently; maybe this is
related to that.

-- 
Cheers,

David



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
From: Sasha Levin @ 2026-02-25 18:12 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: linux-mm, linux-kernel, Lorenzo Stoakes, Andrew Morton,
	Hugh Dickins, Zi Yan, Gavin Guo

On Wed, Feb 25, 2026 at 02:50:16PM +0100, David Hildenbrand (Arm) wrote:
>Do you have a reproducer and would this trigger before v7.0-rc1?

No reproducer. I saw it exactly once yesterday; syzkaller wasn't able to come
up with a reproducer, and pointing the LLM at that task hasn't produced
anything useful either :(

-- 
Thanks,
Sasha



* Re: VM_BUG_ON_VMA in split_huge_pmd_locked: huge PMD doesn't cover full VMA range
From: David Hildenbrand (Arm) @ 2026-02-25 20:30 UTC (permalink / raw)
  To: Sasha Levin, linux-mm, linux-kernel
  Cc: Lorenzo Stoakes, Andrew Morton, Hugh Dickins, Zi Yan, Gavin Guo

> The suspected race window is:
> 
>   1. VMA [A, A+2MB) has a THP. Migration starts, PMD becomes a migration
>      entry.
> 
>   2. Concurrently, __split_vma() runs under mmap_write_lock. It calls
>      vma_adjust_trans_huge() which acquires the PMD lock, splits the PMD
>      migration entry into 512 PTE migration entries, and releases the PMD
>      lock. Then VMA boundaries are modified (e.g., vma->vm_start = A+X).

split_huge_pmd_locked() will call __split_huge_pmd_locked() either for
* pmd_trans_huge(): Present PMD
* pmd_is_valid_softleaf(): Migration PMDs (for our purpose)

__split_huge_pmd_locked() will remap either, resulting in a PTE table.
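
A toy model of that remapping, assuming 4 KiB pages and 2 MiB PMDs (512 PTEs per table). The entry kinds, struct entry and split_pmd_model() are hypothetical simplifications that ignore flags, dirty/young bits and rmap accounting; the model only captures that the resulting PTE table keeps the PMD's kind (present or migration) and maps consecutive pfns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE	512	/* assumes 4 KiB pages and 2 MiB PMDs */

/* Hypothetical, simplified entry kinds. */
enum entry_kind { ENTRY_NONE, ENTRY_PRESENT, ENTRY_MIGRATION };

struct entry {
	enum entry_kind kind;
	uint64_t pfn;	/* first pfn for a PMD, own pfn for a PTE */
};

/*
 * Remap one PMD-level entry into a PTE table: entry i maps pfn + i and
 * keeps the PMD's kind, so a migration PMD becomes migration PTEs.
 */
static void split_pmd_model(struct entry pmd, struct entry pte[PTRS_PER_PTE])
{
	assert(pmd.kind != ENTRY_NONE);
	for (size_t i = 0; i < PTRS_PER_PTE; i++) {
		pte[i].kind = pmd.kind;
		pte[i].pfn = pmd.pfn + i;
	}
}
```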


> 
>   3. remove_migration_ptes() runs via rmap_walk_anon() WITHOUT mmap_lock
>      (only the anon_vma lock). page_vma_mapped_walk() acquires the PMD
>      lock. If it wins the lock BEFORE step 2's split, it finds the PMD
>      migration entry still intact and returns with pvmw->pte == NULL.

If step 2 runs after step 3, it would simply split the (present) PMD.
If step 2 runs before step 3, it would simply split the migration PMD.

And both should sync on the PMD table lock.

exit() cannot run before step 2 has completed.
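
The ordering argument can be illustrated with a small sequential model. Serialization on the PMD table lock is represented by running each step to completion before the other starts; the state names and functions are hypothetical, and the model tracks a single PMD slot with no rmap walk. In both orders the slot ends up as a PTE table with present entries, so no huge PMD is left behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one PMD slot. */
enum pmd_state {
	PMD_MIGRATION,		/* PMD migration entry */
	PMD_HUGE_PRESENT,	/* present huge PMD */
	PTE_TABLE_MIGRATION,	/* PTE table of migration entries */
	PTE_TABLE_PRESENT,	/* PTE table of present entries */
};

/* Step 2: the VMA split path splits whatever PMD-level entry it finds. */
static void split_step(enum pmd_state *s)
{
	if (*s == PMD_MIGRATION)
		*s = PTE_TABLE_MIGRATION;
	else if (*s == PMD_HUGE_PRESENT)
		*s = PTE_TABLE_PRESENT;
}

/* Step 3: migration entries are restored at whatever level they now live. */
static void restore_step(enum pmd_state *s)
{
	if (*s == PMD_MIGRATION)
		*s = PMD_HUGE_PRESENT;		/* PMD-level restore */
	else if (*s == PTE_TABLE_MIGRATION)
		*s = PTE_TABLE_PRESENT;		/* PTE-level restore */
}

/* Run both steps, each to completion, in the given order. */
static enum pmd_state run_order(bool split_first)
{
	enum pmd_state s = PMD_MIGRATION;

	if (split_first) {
		split_step(&s);
		restore_step(&s);
	} else {
		restore_step(&s);
		split_step(&s);
	}
	/* No migration entry should survive both steps. */
	assert(s != PMD_MIGRATION && s != PTE_TABLE_MIGRATION);
	return s;
}
```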


Wondering if it's instead some missed vma_adjust_trans_huge() case. I
see a call from commit_merge(), so maybe some odd corner cases with VMA
merging.
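
One way to look for such a corner case is to check every VMA boundary that a split, merge or shrink moves. The sketch below models only the arithmetic: boundary_needs_pmd_split() is a hypothetical helper, and the condition it encodes (an unaligned boundary whose surrounding 2 MiB range lies inside the VMA before the change) is an assumption about what vma_adjust_trans_huge() has to cover, not a description of its implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HPAGE_PMD_SIZE	(UINT64_C(1) << 21)	/* assumes 2 MiB PMDs */

/*
 * Hypothetical helper: a new VMA boundary at addr can only cut through a
 * huge PMD if addr is not 2 MiB aligned and the whole 2 MiB range around
 * it lies inside the VMA [vm_start, vm_end) as it was before the change.
 */
static bool boundary_needs_pmd_split(uint64_t vm_start, uint64_t vm_end,
				     uint64_t addr)
{
	uint64_t down = addr & ~(HPAGE_PMD_SIZE - 1);
	uint64_t up = down + HPAGE_PMD_SIZE;

	assert(vm_start <= addr && addr <= vm_end);
	if (addr == down)
		return false;	/* aligned: no PMD straddles this boundary */
	return down >= vm_start && up <= vm_end;
}
```

Under this model, moving vm_start from 0x555580c00000 to 0x555580cc0000 in a VMA that previously spanned the whole PMD is a boundary that must have been split beforehand; a merge path that skipped that split could, in principle, leave a huge PMD like the one in the report.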

-- 
Cheers,

David

